Approaches for linking form instance changes to individuals

martijnr · June 12, 2019, 10:03pm

The control for a 'comment' (called discrepancy note in OpenClinica) is defined in the XLSForm/XForm as I posted above (so just the for attribute and an appearance). The RFC functionality is built on top of that discrepancy note question and has no associated XForm syntax (it's defined by the view that the backend UI launches for that user) but it shares the data structure (stringified JSON for OpenClinica) of the discrepancy note and is shown in its history (which the user can view within the form). RFC is always required for them.

Not sure how helpful that all is though, because that was designed long before any audit functionality was added to the spec, and I'm not advocating for it. The only advantage of their approach is that you can query that comment data (e.g. it has different statuses and they have a custom comment-status() XPath function that can inspect the status of that JSON data). This means they can use it inside constraint and required expressions in XPath. E.g. a question can be required only if it doesn't have a comment or a constraint can include the clause that a value can exceed a limit if it has an 'updated' or 'new' comment.
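To make the constraint/required idea concrete, here is a minimal Python sketch of the logic described above. It is not OpenClinica's implementation: the JSON layout (a "comments" list whose entries carry a "status") and the helper names are assumptions made purely for illustration.

```python
import json


def comment_status(raw: str) -> str:
    """Status of the latest entry in a stringified-JSON comment, or '' if there is none.

    The {"comments": [{"status": ...}]} layout is hypothetical.
    """
    if not raw:
        return ""
    entries = json.loads(raw).get("comments", [])
    return entries[-1].get("status", "") if entries else ""


def is_required(comment_raw: str) -> bool:
    # "A question can be required only if it doesn't have a comment."
    return comment_status(comment_raw) == ""


def constraint_ok(value: float, limit: float, comment_raw: str) -> bool:
    # "A value can exceed a limit if it has an 'updated' or 'new' comment."
    return value <= limit or comment_status(comment_raw) in {"new", "updated"}
```

For example, a value of 120 against a limit of 100 would pass only when the accompanying comment has the status 'new' or 'updated'.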

That said, the use by OpenClinica may be useful to see how far clinical trial requirements could go (I think it's way beyond what a generic client such as Enketo or ODK Collect should handle, for sure). Hence none of that specialized stuff has made it into the core Enketo. We just made the core Enketo very extensible to facilitate such domain-specific customizations.

LN · June 12, 2019, 10:26pm

As I understand it, the workflow that @dr_michaelmarks and @chrissyhroberts have described is entirely offline and so there is no entity to authenticate against. The idea would be that someone fills out data on their device and then either reviews it and makes edits later or hands it to someone else for review and editing. It's the same as the paper case where initials have to be relied on. This would be a bit better because you could at least know exactly whose device submitted the data.

In an ideal world, I think Enketo could be used for online edits to extend that workflow. I'm imagining it could have a way to use server auth for this feature. For example, when a user is logged in to Central and launches a submission for edit, Central would pass on some kind of client token/hash that would automatically be used as the user identifier for this feature. I'd see that as a future extension on this spec -- something like if the session (virtual secondary instance) has a user identifier, use that.

You mean for all questions, right? My evolving sense of option 2 is that it would be something like a single audit attribute (e.g. odk:track-change-reasons) that lets the client know to prompt for user identifier and change reason every single time a field is changed from a blank value to a non-blank value. The audit log would get two new columns (e.g. editor-id and change-reason) that would get populated. Editing or saving of the form would be blocked until editor identification and change reason were populated.
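As a rough sketch of what that could look like, here are audit rows with the two extra columns and a check that blocks saving until both are filled in. The column names editor-id and change-reason are the examples from the paragraph above. The remaining columns and all Python names are assumptions for illustration, not an agreed format.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Existing-style columns plus the two proposed ones (layout assumed).
COLUMNS = ["event", "node", "start", "end", "editor-id", "change-reason"]


@dataclass
class ChangeRow:
    event: str
    node: str
    start: int
    end: int
    editor_id: str = ""
    change_reason: str = ""


def can_save(pending_changes: List[ChangeRow]) -> bool:
    """Saving is blocked until every tracked change has an editor and a reason."""
    return all(r.editor_id.strip() and r.change_reason.strip() for r in pending_changes)


def to_csv(rows: List[ChangeRow]) -> str:
    """Serialize rows with a header line, one change per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for r in rows:
        writer.writerow([r.event, r.node, r.start, r.end, r.editor_id, r.change_reason])
    return buf.getvalue()
```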

Basically this.

That would be very convenient and can be client-specific.

Xiphware · June 12, 2019, 10:50pm

Will there be daycare services at the convening?...

aurdipas · June 13, 2019, 7:09am

Well, the first time we could use the login information (username); then, if someone edits the form, they have to enter their username before editing. (Of course no authentication, all offline as with paper.)
A bit more restriction could be added by validating the editing user against a CSV media file with the list of users "authorized" for editing (?).
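A minimal sketch of that CSV check in Python, assuming the media file has a header row with a "username" column (the column name and function names are made up for illustration). As noted, this is not authentication, just a lookup.

```python
import csv
import io
from typing import Set


def load_authorized(csv_text: str, column: str = "username") -> Set[str]:
    """Read the list of users allowed to edit from the CSV media file's text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().lower() for row in reader if row.get(column)}


def may_edit(user: str, authorized: Set[str]) -> bool:
    """Case-insensitive membership test; whitespace around the entry is ignored."""
    return user.strip().lower() in authorized
```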

Yes. And I like your approach

paul_macharia · June 13, 2019, 9:10am

Great progress people!

It's all about enhancing electronic data collection by making it verifiable and giving it an effective audit trail, and thus credible/authentic data.

@yanokwa, see how to make a signature feature work. In my view, it could be the most tamper-proof.

Paul

dr_michaelmarks · June 13, 2019, 10:31pm

Just a minor point, but we don't per se need an actual "signature" - an alphanumeric identifier would be fine (that's what Redcap records, for example), and the file size of saving a JPG, or in the case of an audit trail multiple JPGs each time a change is made, would be problematic in many places where we use ODK.

Grzesiek2010 · June 14, 2019, 12:28pm

The second option seems like the best one to me as well, but I'm not sure I understand everything...
We have many comments here but no single agreed approach (most of them build on option 2).

We should ask for the user id every time a form is started/edited and write it to the audit.csv file, and that seems fine, but what about those comments... I saw the sample @LN attached https://docs.google.com/spreadsheets/d/1DqRHPKxrB1qZPp8rQtwky-Lm4WASTK33iVUQEKSJXtw/edit#gid=0

It assumes that a user should also add a comment just when the form is opened and that it's a general comment. Does it make sense to ask for such a general comment before editing anything? I think a user might not know what he is going to change at that point.
If we want to have a general comment for the entire form, it should be displayed when the form is being exited.

Asking for a comment for each edited question (if a question requires it) seems fine, but in that case it should ask only for a comment, not the user id, because it doesn't make sense to duplicate it, right?

I think we can implement it in two separate PRs (two features):
First of all, we should add an option to ask for the user id and a general comment.
Then we can add an option to ask for comments for each separate question.

dr_michaelmarks · June 18, 2019, 10:23pm

Just to add (although we have incentive enough as is) that we are getting fairly serious interest from major players in other serious epidemic diseases to use ODK if we can get the GCP compliance issues sorted - of which this audit trail thing is the key step.

Xiphware · June 19, 2019, 2:13am

FYI I think this would be inadequate; I can certainly see any change to a response - eg from "Fail" to "Pass" - as requiring both a timestamp and an optional (or mandatory?) comment. And in the general case potentially a signature too (@dr_michaelmarks?)

I think a fully general-purpose design may well need to fulfill:

allow for comment for every change (vs a single comment at beginning/end, or comment only for null-to-nonnull). And we probably want something to flag whether a comment is mandatory or not.
timestamp every change
allow for 'signature' (or otherwise initial) for every change (vs just a single signature at beginning). And likewise flag if mandatory or not.

Question is, can we scope it back initially? E.g. only require a signature at the beginning (which might determine whether this is 1 PR or more...)

yanokwa · June 24, 2019, 11:16pm

@LN and I spent some time iterating on a proposal built around Option 2 and we've published it at https://docs.google.com/document/d/1NyBFhASCOAqOSmz70diQObKpymVE59_MuwjOUPCKlR8.

We've put it in a Google Doc so you can easily leave comments. We'd love your comments or objections over the next 5 days. @TAB please also review!

Xiphware · June 27, 2019, 10:39pm

Some thoughts/straw man (google doc comments getting rather long...)

when first starting up or reopening a saved form, the user has to pseudo-'login' by providing their name/initials in a popup (if odk:track-user is set)
if odk:track-change-reason (per form or per question?) is set, then any user change to an existing non-null value - including initially non-null defaults in the instance XML, or going back and changing a response - brings up a mandatory popup to enter a reason (this would have to occur after constraint checking). By implication, this change is tagged against the current 'logged in' user, since otherwise Collect has no notion of internal sessions. Note, this (also) occurs when first filling in the form [why? because I don't see a compelling reason to introduce and track a distinct state to distinguish between before and after saved form state... but perhaps that's open to debate]
best practice is that you must 'save' forms before handing the device to another person, who will then have to 'login' using their id/initials should they reopen the form and edit it. Realistically I don't think there's a lot we can do to enforce a re-login when you hand the device to someone else, so it'll just have to be best practice...
any and all changes are timestamped, with the initial answering of a question (i.e. null to non-null) suitably tagged as such, since these won't have a change reason (as described above)
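The straw man above can be sketched roughly as follows in Python. The class and field names are invented for illustration and nothing here reflects how Collect or Enketo actually work: a pseudo-'login' when the form is opened, a mandatory reason for any change to an existing non-null value (defaults included), and a timestamp on every change, with first answers tagged as such.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Change:
    node: str
    old: Optional[str]
    new: Optional[str]
    user: str                 # whoever 'logged in' when the form was opened
    timestamp: float
    initial: bool             # True for a first answer (null to non-null)
    reason: Optional[str]     # None for initial answers


class EditSession:
    def __init__(self, user: str, values: Optional[Dict[str, str]] = None,
                 clock: Callable[[], float] = time.time):
        if not user.strip():
            raise ValueError("a name or initials are required to open the form")
        self.user = user.strip()
        self.values: Dict[str, str] = dict(values or {})  # defaults / saved answers
        self.clock = clock
        self.log: List[Change] = []

    def set_value(self, node: str, new: Optional[str],
                  reason: Optional[str] = None) -> Optional[Change]:
        old = self.values.get(node) or None   # treat '' like null
        new = new or None
        if old == new:
            return None                        # nothing changed, nothing logged
        initial = old is None
        if not initial and not (reason and reason.strip()):
            raise ValueError("a reason is required to change an existing value")
        change = Change(node, old, new, self.user, self.clock(), initial,
                        None if initial else reason.strip())
        if new is None:
            self.values.pop(node, None)
        else:
            self.values[node] = new
        self.log.append(change)
        return change
```

Opening the form as "AB" and answering a blank question logs an initial change with no reason, while changing a pre-filled "Fail" to "Pass" is rejected until a reason is supplied.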

LN · July 12, 2019, 10:33pm

Thanks so much for all the thoughtful responses, everyone. @yanokwa caught me up on the TSC discussions.

The spec document has been edited to reflect the various feedback received. It's very similar to @Xiphware's strawman above.

You can reach the diff from the last version from File > Version History.

Some things to note:

asking for user ID and comments have been fully disaggregated as @Grzesiek2010, @Xiphware and others recommended. They are also completely independent from change tracking. This enables various scenarios such as auditing the identity of users navigating the form without including potentially sensitive form data in the log or using reasons for change as "notes to self" in a one-person data collection context.
the ODK XForms spec is deliberately agnostic to what the user identifier is and how it is obtained.

yanokwa · August 12, 2019, 3:49am

We've gotten feedback on the spec from the research facilitation team at LSHTM and the takeaway is:

The implementation we've specified, paired with appropriate procedures, could be made compliant with good clinical practice (and thus pass internal review boards). LSHTM will put together some procedures about how ODK could be used in a way that achieves all of this, but that shouldn't be a blocker.

All this to say, I think this should settle @TSC-1's concerns. I would request that TSC members (especially @aurdipas, @Xiphware, @martijnr) please review the updated specification at https://docs.google.com/document/d/1NyBFhASCOAqOSmz70diQObKpymVE59_MuwjOUPCKlR8 and leave comments. It'd be great to get approval on this in the next two weeks!

Xiphware · August 12, 2019, 4:22am

Or slightly less... I'll put it on the agenda for next TSC call to discuss and (hopefully) approve!

Xiphware · September 2, 2019, 10:00pm

Some additional thoughts after last week's TSC discussion...

One of the issues raised was around when to start tracking these changes; specifically, you may not want to have to enter change reasons when initially filling in a form - which will mostly be the "blank to non-blank" scenario. Instead, you mostly want to track changes after initially filling in the form, and passing it off to someone else, or re-visiting it at a later time, or after a 'save' operation [although strictly speaking what constitutes 'saving' a form isn't particularly well defined in the spec...]

But then perhaps some questions are prefilled with defaults, in which case perhaps you do in fact want to track these changes in the initial pass? @tomsmyth suggested perhaps we therefore need additional flags to indicate when this feature should kick in: "always", "after-save", or "after-finalize", etc...
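For illustration only, those flags might translate into something like the Python below. The flag values are the ones quoted above; the enum, the function and the idea that a finalized form also counts as saved are assumptions.

```python
from enum import Enum


class TrackWhen(Enum):
    ALWAYS = "always"
    AFTER_SAVE = "after-save"
    AFTER_FINALIZE = "after-finalize"


def should_track(policy: TrackWhen, saved: bool, finalized: bool) -> bool:
    """Decide whether change reasons should be requested in the form's current state."""
    if policy is TrackWhen.ALWAYS:
        return True
    if policy is TrackWhen.AFTER_SAVE:
        # Assumption: a finalized form has necessarily been saved.
        return saved or finalized
    return finalized
```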

My thought was that, for an initial pass of this feature, we could simply do the following: if odk:track-changes-reason = “true”, then whenever the form is opened - either for the first time or re-opened - pop up a prompt like "Do you wish to track all changes?" and, if so, have the user enter their userid (to be used to tag these changes in the audit log). This won't enforce change tracking; it defers the decision to the user. But without a formal 'login'/authorization procedure there isn't a particularly reliable means to determine that a form has actually been passed on to someone else (hence the need to start logging their changes) that doesn't effectively rely on the user voluntarily specifying it!

[aside, this would also handle the case when there is no save/validate/finalize step, as in my particular situation]

LN · September 3, 2019, 3:47am

I'm not strongly opposed to this but I do think it adds undesirable overhead to the user experience. I do feel pretty strongly that if we go this direction, the order should be flipped. That is, the user identifier should always be requested on open no matter what. Then the answer to this question can also be logged and associated with the user who made the decision.

I still think that always tracking changes from non-blank values (including defaults) and never tracking changes from blank values (including values that were set, cleared, then set again), is easy to explain and sufficient but I certainly could be convinced otherwise.
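Stated as code, the rule is about this small. This is a Python sketch with made-up names, where "blank" is assumed to mean null, empty or whitespace-only.

```python
from typing import Optional


def is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def needs_reason(old: Optional[str], new: Optional[str]) -> bool:
    """Ask for a reason whenever a non-blank value (default or not) is changed."""
    if old == new:
        return False          # no change at all
    return not is_blank(old)  # changes starting from blank are never tracked
```

With this rule, setting a value, clearing it and setting it again prompts once, at the point where the value is cleared.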

I commented on the doc but it may be easier to respond to different parts of my reasoning here.

Perhaps it would be slightly nicer to have a "pristine state" concept as @ggalmazor alluded to. That way a change from non-pristine blank to non-blank would get a reason for change. But I really do think this situation is rare. The added complexity of keeping a pristine flag for each field seems too high for the benefit it brings.

For non-pristine blank to non-blank to happen, the user would have to blank out a value, navigate away from the field (e.g. by tabbing in Enketo or by swiping in Collect) and then come back to the field to write a new value. This is different from selecting a field with a value in it and changing its value without navigating away (this would be tracked).

The paper equivalent would be crossing out a previously-entered value, doing other things, and then entering a value again. Generally I think that would be given a single explanation. For those who have used such protocols on paper, what do you think?

There is long-press to clear in Collect and the possibility of clearing certain question types by clicking a trash icon in Enketo. In those cases, I think it's reasonable to expect that the reason for change when the value is blanked would be an explanation for why the value is being changed to a new one. I think it would be strange for the user to be prompted again when setting the value.

For those of you who think this is common, could you say more about when it happens?

Xiphware · September 3, 2019, 5:31am

Some of this may depend on when the particular widget in question 'loses focus' and initiates a refresh. For example, with a slider widget, pretty much every time you move the slider it can (should? and does in my case...) update the instance XML and generate a re-calc cycle, even though the user is still technically 'focused' on that widget/question (you don't really know when the user has stopped moving the slider... they may pause and wait to see what happens). Whereas, primarily for efficiency, you don't really want to fire a re-calc on every keystroke while text is being typed, so you probably don't want to update the instance XML/re-calc till they've explicitly "navigated away" from your text question. More specifically, with a text entry widget, by virtue of how it's implemented, yes you can delete the existing value entirely (non-blank to blank) and enter something else (blank to non-blank), and from the outside it's all just a single operation. But potentially for other widgets - e.g. those that update continuously - it may be more problematic to determine when the user is 'done' (by which to determine a value change vs blank to non-blank vs non-blank to blank...). It feels like this behavior needs to be intimately (and explicitly?) tied to when precisely the actual instance XML gets updated; i.e. the xforms-value-changed event.

One of the (seemingly?) unresolved questions is WHEN exactly this audit change is logged. It seems like this has to be scoped within WHEN exactly (and under what exact circumstances) an xforms-value-changed event takes place. I'm not trying to assert anything one way or the other, but it does feel like that is the case... I'd appreciate others' thoughts!

Xiphware · September 3, 2019, 8:21am

That sounds reasonable. If (and that's a big IF which still needs to be decided upon...) we ultimately - and effectively - defer turning audit change tracking On/Off to the user, then requiring them to enter their id whenever odk:track-changes-reason is enabled seems appropriate, simply to be able to accurately log them disabling it.

LN · September 3, 2019, 7:37pm

Thanks so much to all who have contributed to improving this spec. @yanokwa has invited me to be on the TSC call tomorrow. I can be there about 15 minutes in.

I took a moment to listen to the last call. I should have done that before responding. Luckily what I wrote in my last comment is still relevant but I realize it may need more context. First, thanks to @aurdipas for bringing up the question about the non-blank to blank to non-blank case and to @adam.butler for the explanation of the intended behavior in the current spec draft.

Why not ask for change reasons only on form re-open

An earlier draft of the spec proposed asking for change reasons on form re-open but as mentioned by @tomsmyth in the call, it’s common to partially fill a form and to complete it on re-open. It doesn’t make sense to ask for change reasons then.

Also, asking for change reasons any time a non-blank value is changed better matches paper. For example, if I write down a child’s age as 12 and then I need to cross it out to write 13, I would need to initial and explain this because it was written in ink. This is the case even if I immediately catch the mistake and make the fix.

Possible strategies to overcome issues with asking for change reasons only on form re-open

The big downside of letting the form designer choose when to ask for change reasons (always, after save, after finalize…) as @tomsmyth suggested or leaving it up to the enumerator as @Xiphware described is that both require more training and leave more room for enumerator error. For example, if the form designer can choose when change reasons are requested, enumerators have to be trained on when to save vs. when to finalize and have to be given instructions on what to do if they do the wrong one.

In the case of the enumerator deciding, another downside is that it leaves the door open for both accidental and deliberate opt-out.

Why tracking pristine status doesn’t seem worth it

On the call, @martijnr described requesting change reasons when non-blank values are changed as a crude stand-in for requesting change reasons when a previously-set value is changed. That’s correct, though it’s not clear to me that requesting reasons for changes to previously-set values is much better.

Tracking whether each field has ever been set (“pristine status”) is possible. Then explanations for changes could be requested when changes are made to non-pristine values regardless of whether the new value is blank or not.

I believe this would lead to the following two differences when compared to what is currently written in the spec:

Changes to default values would not require explanations because they are pristine
In a case where the user blanks a value and then sets it again, explanations would be requested both when blanking the value and when it is set again (even if immediately after).

I’m not entirely sure that those are improvements. In the default case, I can see it either way. In the blanked to non-blank case, requesting two reasons (one for clearing the value and one for re-entering a value) seems like it would generally be redundant.
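To make the two differences concrete, here is a small Python sketch that replays a series of edits to one field under each rule and reports which edits would trigger a prompt. The function names are invented, every edit is assumed to actually change the value, and defaults are assumed to count as pristine.

```python
from typing import List, Optional


def is_blank(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def prompts_nonblank_rule(initial: Optional[str], edits: List[str]) -> List[bool]:
    """Current draft: prompt whenever the value being replaced is non-blank."""
    prompts, current = [], initial
    for new in edits:
        prompts.append(not is_blank(current))
        current = new
    return prompts


def prompts_pristine_rule(initial: Optional[str], edits: List[str]) -> List[bool]:
    """Alternative: prompt whenever the field has already been set by a user.

    The initial value is ignored because a default leaves the field pristine.
    """
    prompts, pristine = [], True
    for _ in edits:
        prompts.append(not pristine)
        pristine = False
    return prompts
```

Changing a default of "10" to "12" prompts under the current draft but not under the pristine rule. Entering "12", blanking it and entering "13" prompts once under the current draft (when blanking) and twice under the pristine rule (when blanking and when re-entering).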

Tracking pristine status would add complexity and possibly have performance implications. I think it would need to provide clear user benefit for it to be worthwhile.

Am I underestimating the importance of the blanked to non-blank case?

See my previous post for more on why I think the blanked to non-blank case is uncommon and above for why I think suggested ways to address it are not much better or have other downsides. But am I underestimating its importance and/or overblowing the challenges with the alternatives? @aurdipas, is it possible you reacted really strongly to this because the wording in the limitation section made it sound like no change reason was asked for at all in that case?

When to log events when there are multiple questions on a screen

As @Xiphware has pointed out in his recent post and @martijnr highlighted in the doc, it’s not as clear when question events should be logged when there are several questions on a page and especially when some question types can be updated continuously.

This affects all audit features and not this one specifically. Currently, Collect defines the start and end times of question events from within field lists as the times when the field list is entered and exited, respectively. This definitely needs to be improved, but I don’t see a reason for it to block this spec.

I propose we make separate decisions on this, possibly for each question type separately, as other clients get ready to implement the audit log features.

Study design standards

On the call, @martijnr also asked about study design standards. See bullet 2 in the original feature description for an example. One thing I’ve understood from @dr_michaelmarks (correct me if I’m getting this wrong) is that protocols and training are more important than tech.

That is, the technology has to make it possible to collect things like user identifiers and reasons for change but beyond that, a robust protocol with things like independent oversight go a long way in determining whether an approach is standards compliant. It’s possible to design a bad protocol with great tech or to design a great protocol with limited tech (e.g. paper).

dr_michaelmarks · September 3, 2019, 7:52pm

100% correct @LN

In reality many minor changes don't have an explanation given either; i.e. if I cross out age 8 and put 9 (or even say 20 and put 40) I might well just initial & date-time.

That is to say that ideally there needs to be an option to record reasons for changes, but it's not compulsory to do so (it is, however, compulsory to date & identify who made the change).