Collect: keep history of changes to values in the form

What is the general goal of the feature?
To be able to track whether field values have been changed during data collection.
Realistically this only needs to apply once the user has moved on to the next question, e.g.:
I complete Age = 20
I move forward to Gender = Male
I go back to Age and change to 19
Some record is generated within my dataset of this change

What are some example use cases for this feature?
Data change audits are a requirement of some forms of medical research as part of Good Clinical Practice (GCP) compliance. If this were a paper form and I were amending the data, I would be expected to initial and date-stamp my change.

What can you contribute to making this feature a reality?
Guidance on end user requirement

For example one way to do this would be:

User completes the survey
Summary screen appears showing all entered data
Button: Confirm this data is accurate
Yes > Move on to the save & finalise form (or indeed do this automatically)
Data Needs amending >
Move on to a screen which asks the user to select which variables need amending
User amends these variables
Return to the Summary screen and repeat process until user confirms that all data is accurate

Output

  1. Main CSV just shows the final data whatever that is (i.e the original value or a final amended value)
  2. Separate CSV showing the previous values of any amended variables plus the date-time stamp they were changed
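
The two-file output described above could be produced along these lines. This is only a sketch of the idea, not anything ODK does today: the `changes` list, the `split_outputs` and `to_csv` functions, the column names, and the sample timestamps are all made up for illustration.

```python
import csv
import io

def split_outputs(changes):
    """Split an ordered list of (timestamp, variable, value) entries into
    the final values and a history of the values they replaced."""
    final = {}      # variable -> latest value
    history = []    # rows of (variable, previous_value, changed_at)
    for timestamp, variable, value in changes:
        if variable in final and final[variable] != value:
            history.append((variable, final[variable], timestamp))
        final[variable] = value
    return final, history

def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical data entry session: age is entered, then later amended.
changes = [
    ("2018-06-01T10:00:00", "age", "20"),
    ("2018-06-01T10:00:30", "gender", "male"),
    ("2018-06-01T10:01:10", "age", "19"),
]

final, history = split_outputs(changes)
main_csv = to_csv(list(final.keys()), [list(final.values())])
amendments_csv = to_csv(["variable", "previous_value", "changed_at"], history)
print(main_csv)
print(amendments_csv)
```

Here the main CSV holds only age = 19 and gender = male, while the second CSV holds a single row recording that age was previously 20 and when it was changed.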

Information on GCP compliance requirements is here:

Much of it is about broader issues around server reliability, encryption, etc., mostly things that ODK could be demonstrated to achieve.

Specifically, I think my proposal here aims to address the two main points with which ODK is not currently able to demonstrate compliance:
IT08.04 bp Available audit trail: The audit trail for any particular data item is visible and readable from the user interface for authorised users

IT08.05 bp Searchable audit trail: The audit trail is searchable and capable of
producing audit trail reports

Two other related points, for which I think there are different approaches, are:
IT13.01 Requests for Amendment: Any requests must be in writing and retained,
and must include the justification for the change

IT13.02 Recording Amendments: Any changes made must be logged and the details noted

These two points could also be addressed by the user submitting a specific error report form (we have done this for projects), because they are about requesting a change AFTER data has been submitted to the central server.

Could some (all?) of this somehow be accomplished via exploiting the existing log file? It should contain pretty much everything the user does (and more!), so with a bit of smart filtering you should be able to generate a pretty decent 'audit trail' for each of the controls, telling you everything that happened to it.

It would still probably require making the log more accessible for auditing purposes, and some (post?) processing to extract the necessary trail(s), but I would think the log should contain most/all the raw data needed.

Have you looked at what's available in the log? This describes how you can get at it: Getting ODK Collect logs

Interesting will look.
It would clearly (looking at the help section on logs) require a change to how you access that data and how it is outputted for this to be a viable strategy.

The answer here is that, unless there is something I am missing, this log file doesn't seem to have what is needed.

For example I created a form and used a complex text string for one of the values.
I then changed the entry to a different string.

In the logfile if I do CTRL+F and search for either string neither appears.
To me it seems like the log file records system stuff but not the actual data, which is what I need to be able to track changes to.

OK, so it looks like the Collect log file used to capture user input, but may not now or in the future. Instead, the recommendation is to use the audit log for such purposes [sorry I sent you on a wild goose chase and didn't point you here to begin with...]. The audit log doesn't currently capture user input, but that is probably a very legitimate feature to consider adding, specifically for exactly your sort of use case.

I might suggest taking a look at the audit log stuff, and perhaps opening a feature request(s) against it to add whatever is still missing for you.

Yes, as I understand it the audit log only captures timestamps etc., not the user input; but if it did that, it would be perfect.
Where would you like the feature request logged - here or GitHub?

Probably start here, if only for the wider audience so we get as much feedback as possible from other potential exploiters of such an addition. Then the TSC can make a decision/prioritize accordingly if it looks like a good idea with broad appeal, and they can open a GitHub feature request to do the necessary development work. End of day, the key folk that need to hear already hang out in both places, so I don't think you have to worry about this falling thru the cracks.

@dr_michaelmarks If I recall correctly, doing this kind of logging was discussed when the audit log was being designed but we ultimately decided against it being a default feature for two reasons:

  • Including potentially sensitive data in another place seems like it could be harmful
  • There would be certain limitations to this (in line with the existing log limits). Most notably, events are on a per-screen basis so changes to values within a field list would not be tracked.

I think the first concern is mitigated by making it a configurable option on the log. Is the second acceptable for you in the short term?

@LN

  1. Yes I agree this would be a configurable option (?set within the XLSForm as part of including the audit line so that the form designer has control over this?)

1a) Could the log be encrypted? We do this with our clinical datasets now because of these issues and also GDPR regulations within the EU

  2. If I understand this correctly, your point is that if, say, multiple fields are on the same screen:
    Age:
    Height:
    Weight:

then if I enter Age, move to the Height field on the same screen, and then go back and change Age, this would not be tracked.
But if I moved on to the next screen
Sex:
Marital Status:
And then went back to the Age/Height/Weight screen and amended the data this would be tracked?

I think that would be fine.

Below are actions I think should be tracked:
Minimum:
A)
User exits form either part way through (clicking save changes) or at the end of the form but does not mark form as finalised (save changes)
User reopens that saved form and goes back to any data entry field and amends it
This type of change should be tracked
(Note we use encryption so once they mark the form as finalised we block editing)

Ideal
B)
User is on a screen and enters the age
They move on to the next data entry screen
They then go back to the age screen (without exiting the form) and amend the age
Ideally this should also be logged

For comparison
I was just playing with REDCap, which is broadly considered to be GCP compliant, so I can see what kind of audit trail it maintains.
It maintains something equivalent to scenario A outlined above: if I complete a whole record in REDCap, mark it as saved, re-open it and amend a value, it can clearly show me that change.

@LN @yanokwa
@chrissyhroberts & I would be interested in getting a ballpark figure for doing either the A) Minimum or B) Ideal implementation as above.
We have some potential money but I am bad at gauging the cost of this kind of thing.

@LN @yanokwa
We have money in a grant which we would be interested in putting towards this.
Do you think we could get a cost estimate for the work?

@chrissyhroberts

So it looks as if we need a new column in the audit.csv file. Currently we have:
event, node, start, end
or
event, node, start, end, latitude, longitude, accuracy
if location tracking is enabled.
A new column could be named just answer.

We would need to fill that column only for questions. Questions are interval events, which means we set start and end dates for them, so my approach would be to fill the new answer column once the end date is set (this happens when a user leaves the question: navigates to another one, opens the HierarchyView, etc.).
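
To make the proposal concrete, here is a rough sketch of how such a log could be post-processed into a change history. The `answer` column does not exist in the audit log today; it is the hypothetical column proposed above. The sample rows, node paths, placeholder timestamps, and the `answer_history` function are likewise made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical audit.csv content with the proposed extra "answer" column.
# Timestamps are short placeholders rather than real epoch values.
SAMPLE_AUDIT = """event,node,start,end,answer
form start,,1000,,
question,/data/age,1010,1020,20
question,/data/gender,1020,1030,male
question,/data/age,1030,1040,19
form exit,,1050,,
"""

def answer_history(audit_text):
    """Return {node: [(end_timestamp, answer), ...]} for question events,
    keeping an entry only when the answer differs from the previous one."""
    history = defaultdict(list)
    for row in csv.DictReader(io.StringIO(audit_text)):
        if row["event"] != "question":
            continue  # only question events would carry an answer
        values = history[row["node"]]
        if not values or values[-1][1] != row["answer"]:
            values.append((row["end"], row["answer"]))
    return dict(history)

history = answer_history(SAMPLE_AUDIT)
for node, values in history.items():
    if len(values) > 1:
        # More than one entry means the answer was amended at least once.
        print(node, values)
```

Any node with more than one entry was amended, and the earlier entries are the previous values an auditor would want to see.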

Hi all! Seeing that this thread is coming to life again, I just wanted to point out that the TSC is discussing this feature at https://github.com/opendatakit/roadmap/issues/30.

There are two ongoing topics, in different degrees of consideration:

  • Remember the answer to a question from the last saved submission, in order to pre-load the answer when loading a form for the first time.

    There seems to be a consensus about this one, although we're waiting on more opinions about it. This one will probably be the first one to be implemented.

  • Remember all the answers to all the questions ever answered for a form in a device, and offer some sort of autocompletion feature with them.

    This one needs more discussion and will be probably delayed after the previous feature gets shipped to get more user feedback.

Feel free to comment!

I think it's not the same @ggalmazor this is not about any auto-filling/preloading. The user who asks about the feature just needs the history of changes.

Correct - this is an audit trail of changes not auto population.

I understand. Thanks for the clarification! :)

@Grzesiek2010, @LN and I have been iterating on this feature and we've made good progress! The key design decisions we've made so far are:

  • The feature can be enabled/disabled through changes to form design
    • Likely an attribute in the form called odk:audit-track-changes (or something similar) which can be set to true.
  • If enabled, a new event called value change (or something similar) will be added. That event will have a new column called value where we write the changed value. We will include event, node, timestamp, and location columns.
    • This approach makes the logs easier to analyze, and we don't have to solve the "how do you represent NULL" problem that comes with using the question event to store this data. More here.
  • Every value that is changed (on swipe/next) will be written to log (even if in a field-list)
    • We will not log things a user can’t see (e.g., calculates) because those aren’t necessarily triggered on swipe. Calculates are constantly being re-evaluated and it would be a lot in the log.
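
As a rough illustration of the design above, changes can be detected by comparing the values on a screen when the user arrives with the values when they swipe away. This is a sketch in Python, not Collect's actual implementation: the function name, the `visible` set, and the row layout are assumptions, `value change` is only the working event name mentioned above, and the sketch assumes the value column holds the new value.

```python
def value_change_events(before, after, visible):
    """Return one 'value change' row per visible node whose value differs
    between arriving at a screen (before) and swiping away (after)."""
    events = []
    for node in sorted(after):
        if node not in visible:
            continue  # e.g. calculates, which the user never sees
        if before.get(node) != after[node]:
            events.append({"event": "value change", "node": node, "value": after[node]})
    return events

# Hypothetical field list with two visible questions and one calculate.
before = {"/data/age": "20", "/data/height": "170", "/data/bmi": "24.2"}
after = {"/data/age": "19", "/data/height": "170", "/data/bmi": "23.9"}
visible = {"/data/age", "/data/height"}

events = value_change_events(before, after, visible)
print(events)
```

Because the comparison runs over every node on the screen, changes inside a field list are picked up, while the recalculated `/data/bmi` value is skipped since it is not visible to the user.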

We have an initial pull request to evaluate feasibility of the above and will update this topic as we make progress. One current unknown that I'm looking into is if/how we store extra information about the reason for the change.

Only the event, node, and new value columns should be filled for that new event, right? I don't think it makes sense to record the time, since it wouldn't be the time of answering a question but the time of navigating.
