1. What is the general goal of the feature?
Thanks to recent ODK Briefcase releases we have three targeted pull/export features:
- `start-from-date`, to resume a pull operation from a specific date
- `start-from-last`, to resume a pull operation, picking up from the position of the last pull
- `export-from-date`, which runs an export operation from a specific date
What's now logically missing is the export counterpart to `start-from-last`:
- `export-from-last`, which runs an export operation, picking up from the position of the last export
I can also see reason to include an option, `smart-export`, which looks at the target file, then either
- creates the target file if it is not there and exports from the first submission, or
- identifies the `meta-instanceID` of the last exported submission and then exports from the next submission
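The decision `smart-export` would make can be sketched in a few lines of Python. This is only an illustration of the idea, not Briefcase code: the function names `find_resume_point` and `submissions_after` are hypothetical, and it assumes the target CSV has a `meta-instanceID` column and that the last row in the file is the last exported submission.

```python
import csv
from pathlib import Path


def find_resume_point(csv_path, id_column="meta-instanceID"):
    """Return the instance ID in the last row of the target CSV.

    Returns None when the file is missing or has no data rows, meaning
    the export should start from the first submission.
    """
    path = Path(csv_path)
    if not path.exists():
        return None
    last_id = None
    with path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = row.get(id_column)
            if value:
                last_id = value
    return last_id


def submissions_after(ordered_ids, last_exported_id):
    """Return the IDs that come after last_exported_id, in export order."""
    ids = list(ordered_ids)
    if last_exported_id is None:
        return ids  # nothing exported yet: export everything
    if last_exported_id not in ids:
        # Assumption: an unknown resume point is treated as an error
        # rather than silently re-exporting everything.
        raise ValueError(f"resume point not found: {last_exported_id}")
    return ids[ids.index(last_exported_id) + 1:]
```

Reading the whole CSV to find its last row is slow for very large files; a real implementation would probably read backwards from the end of the file instead.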
2. What are some example use cases for this feature?
When working with long-lived and heavily used forms, export time becomes a significant issue when using the behaviour that deletes the old CSV and exports all data to a new one.
Daily export of ~750,000 forms to 17 CSVs currently takes about 5-7 hours on our largest project
An alternative solution we have tried is to use the `append` behaviour and to stitch new submissions from an arbitrarily recent date (e.g. fewer than 5 days before today) onto the end of the existing CSV, but this leaves duplicates that need to be removed in downstream analysis. A basic Unix-based process such as `sort | uniq` changes the line order in the resulting file (the header row is also affected), so it is not ideal.
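For comparison, duplicates can be removed without disturbing line order by remembering which lines have already been seen. The sketch below is a workaround on our side rather than anything Briefcase provides; `dedupe_lines` and `dedupe_file` are hypothetical helper names, and it assumes duplicated submissions appear as byte-identical lines, which is the same assumption `sort | uniq` makes.

```python
def dedupe_lines(lines):
    """Drop repeated lines, keeping each first occurrence in original order."""
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def dedupe_file(source_path, target_path):
    """Write a copy of source_path with repeated lines removed."""
    with open(source_path, encoding="utf-8") as source:
        lines = source.readlines()
    with open(target_path, "w", encoding="utf-8") as target:
        target.writelines(dedupe_lines(lines))
```

Because the header is the first line in the file, it stays in first position. The set of seen lines is held in memory, so this may not scale comfortably to the largest exports.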
`export-from-date` only works at the granularity of a whole day, so if we pulled twice a day we would potentially miss or duplicate some records.
`export-from-last` would have a fairly good use case for all export activities, but especially for long-lived forms managed on the CLI.
When system failure occurs, or when passing the system over to another operator, the ODK Briefcase database can be recreated quite quickly by copying the ODK XMLs from a backup drive to a new machine, but the `start-from-last` or `export-from-last` position flag is lost when this happens. By implementing a system that can look at the target file and identify the appropriate resume point, the full system could be rapidly recreated/replicated after a system failure by
- Copying the xmls folder to new machine
- Copying the target CSV folder to the new machine
Another use case for `smart-export` is that I set up a system and run it for six months. Then I need to go on a journey and @dr_michaelmarks wants to run the system while I am away. I copy the whole ODK Briefcase directory and CSV files onto a hard drive and give them to Michael. Michael's copy of Briefcase then just figures everything out (resume points for pull and export) and picks up where I left off without having to first run long pulls or exports.
I think the resume data are currently stored somewhere in the system's Java settings, but moving them into the ODK Briefcase folder would allow the resume info to be carried within the folder, meaning it could be passed between two systems.
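As a rough illustration of resume info travelling with the folder, the sketch below keeps the positions in a small JSON file inside the storage directory. Everything here is an assumption made for the example: the file name `resume-state.json`, the function names, and the layout of the stored data are hypothetical and do not describe how Briefcase actually stores its state.

```python
import json
from pathlib import Path

STATE_FILE = "resume-state.json"  # hypothetical name, not a real Briefcase file


def load_state(storage_dir):
    """Read all saved resume points, or an empty dict if none exist yet."""
    path = Path(storage_dir) / STATE_FILE
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_resume_point(storage_dir, form_id, operation, position):
    """Record the last position for one form and one operation ('pull' or 'export')."""
    state = load_state(storage_dir)
    state.setdefault(form_id, {})[operation] = position
    path = Path(storage_dir) / STATE_FILE
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def get_resume_point(storage_dir, form_id, operation):
    """Return the saved position, or None if this operation has never run."""
    return load_state(storage_dir).get(form_id, {}).get(operation)
```

Copying the storage directory to another machine would then carry the resume points along with the submissions, which is the behaviour described in the handover example above.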
3. What can you contribute to making this feature a reality?
Beta testing, discussion