Approaches for linking form instance changes to individuals

Option 2 is nice because it's going into the audit log where we've started to put these things. Also, it's relatively fast to do.

I like it because you can imagine getting this dialog at the question level instead of the form level, and that would better satisfy the requirement. That workflow would be something like this:

  • On first launch, you enter the username/ID. Maybe we auto-fill this if the form has a username question.
  • On second launch, if you change a question, you confirm your username and enter a short reason for the change. The timestamp and the change itself are already in the log, and the dialog keeps the previously entered information to speed up entry.
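
A rough sketch of that dialog behaviour, assuming a hypothetical ChangeDialog that pre-fills the last user ID and reason so a repeat edit only needs a confirmation. The class and method names are illustrative only, not how Collect actually implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeDialog:
    """Hypothetical dialog state that remembers the last user ID and reason."""
    last_user: Optional[str] = None
    last_reason: str = ""

    def first_launch(self, username_answer: Optional[str], typed_user: str = "") -> str:
        # Auto-fill from a username question if the form has one.
        self.last_user = username_answer or typed_user
        return self.last_user

    def confirm_change(self, user: Optional[str] = None, reason: Optional[str] = None) -> tuple:
        # Blank input means "keep what was entered last time".
        self.last_user = user or self.last_user
        self.last_reason = reason or self.last_reason
        if not self.last_user or not self.last_reason:
            raise ValueError("user ID and reason for change are both required")
        return self.last_user, self.last_reason


dialog = ChangeDialog()
dialog.first_launch("jdoe")
dialog.confirm_change(reason="transcription error")
dialog.confirm_change()  # second edit: both values are pre-filled
```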

Yes, this narrows the type of data you can collect, but:

  • Systems with CRF support get this same narrow set of data and everyone is fine with it.
  • We can't even use fingerprints in Collect outside the audit. There isn't widely available hardware, so why let this be a blocker?
  • Signatures are pretty large images. This could be fixed, I suppose.

To me, the big negative is that if we go down this road, Enketo won't have this feature, but Enketo (and thus Central at some point) could support it with relatively little work. Of course, we have to convince @martijnr, but he's less fanatical than he's ever been :laughing:!

This is an important point, which may preclude certain options (e.g. I think it could be difficult to accomplish using a parallel form, a secondary instance, or per-XML-element metadata). If each and every change must be logged, even undoing an edit and setting a field's value back to its original during the same session, this rather implies a continuous change event log, i.e. an audit log.
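
To illustrate why a snapshot comparison can't satisfy this, here is a minimal sketch (hypothetical names, not ODK code) contrasting an append-only event log with a before/after diff of the kind a parallel instance would give you. An edit that is later reverted leaves two events in the log but no net difference.

```python
from datetime import datetime, timezone


class AuditLog:
    """Append-only change log: every edit becomes a new event, so an edit
    that is later reverted still leaves a trace."""

    def __init__(self):
        self.events = []

    def record_change(self, node, old, new):
        if old == new:
            return  # re-selecting the same value is not a change
        self.events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "node": node,
            "old": old,
            "new": new,
        })


def snapshot_diff(original: dict, final: dict) -> dict:
    """What a before/after comparison (e.g. a secondary instance holding the
    original values) can see: only the net differences."""
    return {k: (original.get(k), v) for k, v in final.items()
            if original.get(k) != v}


log = AuditLog()
log.record_change("/data/result", "Fail", "Pass")
log.record_change("/data/result", "Pass", "Fail")  # undo
```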

I'm starting to warm up more to a specialized feature linked to the audit log (option 2) with all the arguments made here. It sounds like "for each change made on form re-entry, gather an alphanumeric user identifier and free text comment" is a very common requirement and narrowly addressing that case in a way that is easy for form designers would provide a lot of value. We could write a spec that describes what pieces of information need to be gathered from the user, when the user is required to enter them, and where in the audit log they get written to. Clients could choose an appropriate presentation.

If we had existing full-featured XForms engines to work with, there would be lots of interesting options, but we don't, and the XForms-y solutions I've come up with are either a ton of work for little marginal user benefit, not a great user experience, a bit of a hack, or some combination of these.

Agreed, the specs for the "for" attribute and the comment appearance look good. When those comments are used to track reasons for change, is there something that forces the comments to be filled in? Is user identity tracked based on who is logged in to a server?

I do think that for our users, separating the form data and the audit metadata is valuable. It's not always the same people who analyze the two, and working with XML is generally a higher technical bar. All that to say, if something like the "for" bind attribute were used for this purpose, I think the servers would have to do additional steps to extract the audit information. Given server fragmentation, I would say that's less than ideal.

Similarly, regarding the form design side, @martijnr wrote:

That doesn't feel particularly more flexible than a single attribute that magically requires comments on all fields.

What I'm most nervous about are follow-on requirements like coded (rather than free-form) reasons for change.

@Xiphware asked "If each and every change must [be] logged - even undoing an edit and setting a field's value back to its original during the same session"

Several levels of answer here.

  1. The audit function @LN and @yanokwa built already does this (for example: answer a question, swipe forward, swipe back, change the answer).

This is actually a level of detail BEYOND what REDCap does.
On the other hand, it is an accurate reflection of what should ideally happen (i.e. REDCap misses some timepoints where ideally you would have audit data).

  2. The key here, therefore, is not tracking the changes (which the new audit log does) or when they were made (again, already done) but the WHY and the WHO.

  3. In general, what the main user wants to export is the FINAL data values.
    The audit log exists so that one can go back if requested and see what changed, but it is not routinely looked at as part of analysis.
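
A small sketch of points 2 and 3: the analysis export only needs the last value per question, while the WHO and WHY stay in the log. The CSV columns here (including user and change-reason) are hypothetical and loosely modelled on an audit.csv. They are not the actual ODK format.

```python
import csv
import io

# Hypothetical audit rows; column names are illustrative only.
AUDIT_CSV = """node,old-value,new-value,user,change-reason
/data/age,,34,jdoe,
/data/age,34,43,jdoe,digits transposed
/data/sex,,F,jdoe,
"""


def final_values(audit_csv: str) -> dict:
    # The last new-value per node wins; this is what analysts export.
    values = {}
    for row in csv.DictReader(io.StringIO(audit_csv)):
        values[row["node"]] = row["new-value"]
    return values


def change_reasons(audit_csv: str) -> list:
    # The WHO and WHY, kept alongside but separate from the analysis data.
    return [(r["node"], r["user"], r["change-reason"])
            for r in csv.DictReader(io.StringIO(audit_csv))
            if r["change-reason"]]
```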

Second piece of feedback:

"Option 2 looks most promising. In addition to that, for clinical trials I would like to add that:

  • It is absolutely key that there is user verification with a password when entering data (whether initial entry or changing data). Otherwise you cannot trust the name that is assigned to the action in the audit trail. The audit trail is used in clinical trials to prove that no one who should not have had access to the data had access to it, that only trained and qualified personnel have entered data, and to check whether data has been entered according to a logical time frame (done to detect errors and/or fraud).

  • It would be ideal if you could set reason for change as mandatory at a question level when building the CRF. Because if it is optional, you will still need to check whether reasons for change are being given, which leads to extra work."
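
A minimal sketch of the question-level "reason required" check described in the second bullet, assuming a hypothetical per-question flag set when the CRF is built. Saving would be blocked while the returned list is non-empty.

```python
from dataclasses import dataclass


@dataclass
class Question:
    name: str
    reason_required: bool  # hypothetical flag set per question in the CRF


def missing_reasons(questions, changes):
    """changes maps question name -> reason given ('' or None if none).
    Returns the changed questions that still need a reason."""
    required = {q.name for q in questions if q.reason_required}
    return sorted(name for name, reason in changes.items()
                  if name in required and not (reason or "").strip())
```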

I could easily see standard responses (i.e. select one) becoming a high priority fairly quickly. Having to type the same change reason repeatedly gets annoying. That is, @dr_michaelmarks' "...but the WHY".
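
A sketch of what coded reasons could look like: a select-one list plus an "other" option that falls back to free text. The codes and labels are made up for illustration. A real form would define its own choices.

```python
# Hypothetical coded reasons; a real form would define its own list.
STANDARD_REASONS = {
    "1": "Transcription error",
    "2": "Source document updated",
    "3": "Query resolution",
    "other": None,  # free text required
}


def resolve_reason(code: str, free_text: str = "") -> str:
    if code not in STANDARD_REASONS:
        raise ValueError(f"unknown reason code: {code}")
    label = STANDARD_REASONS[code]
    if label is None:
        if not free_text.strip():
            raise ValueError("free text is required when 'other' is chosen")
        return free_text.strip()
    return label
```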

I agree with @aurdipas, that user information ideally comes from the credentials used to retrieve the form, and not from a user-entered field (and OpenClinica maintains sessions to add user info automatically).

I'm fine with option 2 as well.

FYI, to share how Reasons For Change (RFC) are done in OpenClinica. First of all there are special views that require RFCs and others that don't have them at all. I believe it's related to the role of the user and the stage of the review process. This is probably easier to do with webforms than with a mobile app. For the views that require this, we automatically add an input field at the bottom of the page for each field the user changes. These fields are separate from the form. The fields have to be filled in before page-flipping or submission is allowed. There is an option to fill in all of the fields together with one reason or add reasons for individual questions.
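
A rough approximation of the OpenClinica-style gate described above: one RFC field per changed field, page-flipping or submission blocked until each has a reason, with an option to apply one shared reason to all of them. This is a sketch based only on the description here, not OpenClinica's code.

```python
def rfc_fields(original: dict, current: dict) -> list:
    # One reason-for-change input per field whose value differs.
    return [k for k in current if original.get(k) != current[k]]


def can_leave_page(original, current, reasons, shared_reason=None):
    """Block page-flipping/submission until every changed field has a reason.
    shared_reason models the 'one reason for all fields' option."""
    for field in rfc_fields(original, current):
        if not (shared_reason or reasons.get(field)):
            return False
    return True
```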

Could you post an example of how these look in an XForm (XML?) definition? Or is this tagging accomplished entirely outside the form definition?

I'm thinking

<bind ... jr:rfc-prompt="what is your favorite color" jr:rfc-required="true()" ... />

to trigger a popup when filling in the form... Or perhaps it would be better in the control definition itself?
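
As a sketch of how a client might read such a bind, here is a parse of a concrete version of it. The jr namespace URI is the usual JavaRosa one. The nodeset value is a placeholder, and jr:rfc-prompt/jr:rfc-required are only the attributes proposed here, not part of the ODK XForms spec.

```python
import xml.etree.ElementTree as ET

JR = "http://openrosa.org/javarosa"  # JavaRosa namespace used by ODK forms

# nodeset is a placeholder; the rfc-* attributes are the proposal above.
BIND = f'''<bind xmlns:jr="{JR}" nodeset="/data/color" type="string"
  jr:rfc-prompt="what is your favorite color" jr:rfc-required="true()"/>'''


def rfc_settings(bind_xml: str) -> dict:
    bind = ET.fromstring(bind_xml)
    return {
        "nodeset": bind.get("nodeset"),
        "prompt": bind.get(f"{{{JR}}}rfc-prompt"),
        # Naive string check; a real client would evaluate this as XPath.
        "required": bind.get(f"{{{JR}}}rfc-required", "false()") == "true()",
    }
```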

The control for a 'comment' (called a discrepancy note in OpenClinica) is defined in the XLSForm/XForm as I posted above (so just the "for" attribute and an appearance). The RFC functionality is built on top of that discrepancy note question and has no associated XForm syntax (it's defined by the view that the backend UI launches for that user), but it shares the data structure (stringified JSON for OpenClinica) of the discrepancy note and is shown in its history (which the user can view within the form). RFC is always required for them.

Not sure how helpful that all is though, because that was designed long before any audit functionality was added to the spec, and I'm not advocating for it. The only advantage of their approach is that you can query that comment data (e.g. it has different statuses and they have a custom comment-status() XPath function that can inspect the status of that JSON data). This means they can use it inside constraint and required expressions in XPath. E.g. a question can be required only if it doesn't have a comment or a constraint can include the clause that a value can exceed a limit if it has an 'updated' or 'new' comment.
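
To make the constraint example concrete, here is a Python stand-in for comment-status(). The JSON shape (a "queries" list whose last entry holds the current status) is an assumption for illustration; OpenClinica's actual structure may differ.

```python
import json


def comment_status(comment_json: str) -> str:
    """Hypothetical stand-in for comment-status(): assumes the comment is a
    JSON object with a 'queries' list whose last entry holds the status."""
    if not comment_json:
        return ""
    queries = json.loads(comment_json).get("queries", [])
    return queries[-1].get("status", "") if queries else ""


def value_ok(value, limit, comment_json) -> bool:
    # Mirrors a constraint like:
    #   . <= limit or comment-status(comment) = 'updated' or ... = 'new'
    return value <= limit or comment_status(comment_json) in ("updated", "new")
```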

That said, OpenClinica's usage may be useful for seeing how far clinical trial requirements could go (I think it's way beyond what a generic client such as Enketo or ODK Collect should handle, for sure). Hence none of that specialized stuff has made it into core Enketo. We just made core Enketo very extensible to facilitate such domain-specific customizations.

As I understand it, the workflow that @dr_michaelmarks and @chrissyhroberts have described is entirely offline and so there is no entity to authenticate against. The idea would be that someone fills out data on their device and then either reviews it and makes edits later or hands it to someone else for review and editing. It's the same as the paper case where initials have to be relied on. This would be a bit better because you could at least know exactly whose device submitted the data.

In an ideal world, I think Enketo could be used for online edits to extend that workflow. I'm imagining it could have a way to use server auth for this feature. For example, when a user is logged in to Central and launches a submission for edit, Central would pass on some kind of client token/hash that would automatically be used as the user identifier for this feature. I'd see that as a future extension on this spec -- something like if the session (virtual secondary instance) has a user identifier, use that.
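
A minimal sketch of that future extension: prefer a server-supplied identifier from the session if present, otherwise fall back to asking the user offline. The "userID" key is a hypothetical name for whatever the session would carry.

```python
from typing import Callable


def editor_id(session: dict, prompt: Callable[[], str]) -> str:
    """Prefer a server-supplied identifier (e.g. passed by Central when it
    launches an edit); otherwise ask the user. 'userID' is hypothetical."""
    token = session.get("userID")
    if token:
        return token
    typed = prompt().strip()
    if not typed:
        raise ValueError("an editor identifier is required")
    return typed
```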

You mean for all questions, right? My evolving sense of option 2 is that it would be something like a single audit attribute (e.g. odk:track-change-reasons) that lets the client know to prompt for user identifier and change reason every single time a field is changed from a blank value to a non-blank value. The audit log would get two new columns (e.g. editor-id and change-reason) that would get populated. Editing or saving of the form would be blocked until editor identification and change reason were populated.
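
A sketch of the two proposed columns and the blocking behaviour, with a flag standing in for odk:track-change-reasons. The column list is abbreviated and uses the names proposed here (editor-id, change-reason), which may not match what ends up in the spec. Deciding *when* to prompt is left to the caller.

```python
import csv
import io

# Abbreviated columns; editor-id and change-reason are the proposed additions.
COLUMNS = ["event", "node", "start", "old-value", "new-value",
           "editor-id", "change-reason"]


class TrackedForm:
    def __init__(self, track_change_reasons: bool):
        self.track = track_change_reasons  # models odk:track-change-reasons
        self.rows = []

    def change(self, node, old, new, ts, editor_id="", reason=""):
        if self.track and not (editor_id and reason):
            # Block the edit until both values are provided.
            raise PermissionError("editor ID and change reason required")
        self.rows.append({"event": "question", "node": node, "start": ts,
                          "old-value": old, "new-value": new,
                          "editor-id": editor_id, "change-reason": reason})

    def to_csv(self) -> str:
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.rows)
        return out.getvalue()
```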

Basically this.

That would be very convenient and can be client-specific.

Will there be daycare services at the convening?... :wink:

Well, the first time we could use the login information (username); after that, anyone who edits has to enter their username before editing. (Of course there's no authentication; it's all offline, as with paper.)
A bit more restriction could be added by validating the editing user against a CSV media file listing the users "authorized" to edit (?).
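
A sketch of that CSV check, assuming a hypothetical media file with a username column. Since everything is offline, this only catches typos and unknown names; it is not authentication.

```python
import csv
import io


def load_authorized(csv_text: str) -> set:
    # Hypothetical media file with a 'username' column.
    return {row["username"].strip().lower()
            for row in csv.DictReader(io.StringIO(csv_text))}


def check_editor(user: str, authorized: set) -> bool:
    # Offline check only: limits typos/unknown names, not a login.
    return user.strip().lower() in authorized
```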

Yes. And I like your approach :slight_smile:

Great progress people!

It's all about enhancing electronic data collection by making it verifiable, with effective audit trails and thus credible, authentic data.

@yanokwa, please look at how to make a signature feature work. In my view, it could be the most tamper-proof option.

Paul

Just a minor point, but we don't per se need an actual "signature". An alphanumeric identifier would be fine (that's what REDCap records, for example), and the file size of saving a JPG, or in the case of an audit trail multiple JPGs each time a change is made, would be problematic in many places where we use ODK.

The second option seems like the best one to me as well, but I'm not sure I understand everything...
We have many comments here but no single approach (most of them build on option 2).

We should ask for the user ID every time a form is started/edited and write it to the audit.csv file, and that seems fine, but what about those comments... I saw the sample @LN attached https://docs.google.com/spreadsheets/d/1DqRHPKxrB1qZPp8rQtwky-Lm4WASTK33iVUQEKSJXtw/edit#gid=0

It assumes that a user should also add a comment right when the form is opened, and that it's a general comment. Does it make sense to ask for such a general comment before editing anything? I think a user might not know what they are going to change at that point.
If we want a general comment for the entire form, it should be requested when the form is being exited.

Asking for a comment for each edited question (if a question requires it) seems fine, but in that case it should ask only for a comment, not the user ID, because it doesn't make sense to duplicate it, right?

I think we can implement it in two separate PRs (two features):
First, we should add an option to ask for the user ID and a general comment.
Then we can add an option to ask for comments on each individual question.

Just to add (although we have incentive enough as is) that we are getting fairly serious interest from major players in other serious epidemic diseases to use ODK if we can get the GCP compliance issues sorted - of which this audit trail thing is the key step.

FYI I think this would be inadequate; I can certainly see any change to a response - eg from "Fail" to "Pass" - as requiring both a timestamp and an optional (or mandatory?) comment. And in the general case potentially a signature too (@dr_michaelmarks?)

I think a fully general-purpose design may well need to fulfill:

  • allow for comment for every change (vs a single comment at beginning/end, or comment only for null-to-nonnull). And we probably want something to flag whether a comment is mandatory or not.
  • timestamp every change
  • allow for 'signature' (or otherwise initial) for every change (vs just a single signature at beginning). And likewise flag if mandatory or not.

The question is, can we scope it back initially? E.g. only require a signature at the beginning (which might determine whether this is one PR or more...)
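
One way to picture those requirements is a small policy object: comment and signature/initials each flagged off, optional, or required, plus a flag for the scoped-back "signature only at the start" variant. Timestamping is unconditional so it needs no flag. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Req(Enum):
    OFF = "off"
    OPTIONAL = "optional"
    REQUIRED = "required"


@dataclass(frozen=True)
class ChangePolicy:
    """Hypothetical per-form (or per-question) policy. Every change is
    timestamped regardless, so there is no flag for that."""
    comment: Req = Req.OFF
    signature: Req = Req.OFF      # 'signature' or initials, per change
    signature_once: bool = False  # scoped-back variant: only at the start

    def missing(self, comment: str, initials: str, first_change: bool) -> list:
        problems = []
        if self.comment is Req.REQUIRED and not comment:
            problems.append("comment")
        needs_sig = self.signature is Req.REQUIRED and (
            first_change or not self.signature_once)
        if needs_sig and not initials:
            problems.append("signature")
        return problems
```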

@LN and I spent some time iterating on a proposal built around Option 2 and we've published it at https://docs.google.com/document/d/1NyBFhASCOAqOSmz70diQObKpymVE59_MuwjOUPCKlR8.

We've put it in a Google Doc so you can easily leave comments. We'd love your comments or objections over the next 5 days. @TSC-1 please also review!

Some thoughts/straw man (google doc comments getting rather long...)

  • when you first start or reopen a saved form, the user has to pseudo-'login' by providing their name/initials in a popup (if odk:track-user is set)

  • if odk:track-change-reason (per form or per question?) is set, then any user change to an existing non-null value (including initially non-null defaults in the instance XML, or going back and changing a response) brings up a mandatory popup to enter a reason (this would have to occur after constraint checking). By implication, this change is tagged against the current 'logged in' user, since otherwise Collect has no notion of internal sessions. Note, this (also) occurs when first filling in the form [why? 'cause I don't see a compelling reason to introduce and track a distinct state to distinguish between before and after saved form state... but perhaps that's open to debate]

  • best practice is that you must 'save' forms before handing the device to another person, who will then have to 'login' using their ID/initials should they reopen the form and edit it. Realistically I don't think there's a lot we can do to enforce a re-login when you hand the device to someone else, so it'll just have to be best practice...

  • any and all changes are timestamped, with the initial answering of a question (i.e. null to non-null) suitably tagged as such, since these won't have a change reason (as described above)
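
A rough model of this strawman, purely as a sketch with hypothetical names: pseudo-login on open, constraint checked before any reason prompt, a reason required whenever an existing non-null value (including a default) changes, and every change timestamped with initial answers tagged as such.

```python
from datetime import datetime, timezone


class StrawmanSession:
    """Hypothetical model of the strawman above, not Collect's design."""

    def __init__(self, instance, track_user=True, track_change_reason=True):
        self.values = dict(instance)  # includes non-null defaults
        self.track_user = track_user
        self.track_reason = track_change_reason
        self.user = None
        self.log = []

    def open(self, initials):
        if self.track_user and not initials:
            raise ValueError("initials required to open the form")
        self.user = initials

    def answer(self, node, new, reason="", constraint=lambda v: True):
        if not constraint(new):
            raise ValueError("constraint failed")  # checked before reason prompt
        old = self.values.get(node)
        if old == new:
            return
        initial = old in (None, "")
        if self.track_reason and not initial and not reason:
            raise ValueError("reason for change required")
        self.values[node] = new
        self.log.append({"time": datetime.now(timezone.utc).isoformat(),
                         "user": self.user, "node": node, "old": old,
                         "new": new, "initial": initial, "reason": reason})
```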

Thanks so much for all the thoughtful responses, everyone. @yanokwa caught me up on the TSC discussions.

The spec document has been edited to reflect the various feedback received. It's very similar to @Xiphware's strawman above.

You can see the diff from the last version via File > Version History.

Some things to note:

  • asking for user ID and comments have been fully disaggregated as @Grzesiek2010, @Xiphware and others recommended. They are also completely independent from change tracking. This enables various scenarios such as auditing the identity of users navigating the form without including potentially sensitive form data in the log or using reasons for change as "notes to self" in a one-person data collection context.
  • the ODK XForms spec is deliberately agnostic to what the user identifier is and how it is obtained.