Want to download only new records

odk-briefcase

(Frances James) #1

Hi everyone,

My team and I have developed an SQL-Lite database that combines the two .csv files exported by ODK Briefcase (1.9.0 Production) into a format which can be analysed easily by fisheries officers.

However, the database is becoming slower and slower as it erases and uploads the whole dataset every time.

I have just noticed that under the 'Export' tab on Briefcase you can select a specific date range. I am hoping this will solve my problem? Or is there a much easier way of just exporting new data from the server?

Many thanks,

Frances


(Yaw Anokwa) #2

Hi @Frances! I wanted to tease out a couple of things here...

If you do a pull, Briefcase will always scan the entire Aggregate database to see if a submission on the server exists on the local machine. If it doesn't then it'll download the submission. On Briefcase v1.7 and greater, both the existence check and the download (if parallel downloads is enabled) happens extremely quickly.

If you do an export, Briefcase doesn't talk to the server at all. It takes the data on your local machine and exports that. And yes, as you noted, you can export a particular date range.

Some optimizations:

  • If you are working in the GUI, upgrade to Briefcase v1.10 (found at https://github.com/opendatakit/briefcase/releases), and you can pull and export in one operation.
  • If you switch to the CLI, then you can write a script that does the pull and exports a date range that you can have the script configure (e.g., a Bash script that figures out yesterday's date and makes that the start_date).

I don't know how you are generating your SQLite DB, but if you are comfortable in Python, you can write a script that uses pandas that takes the exported CSVs and does whatever merging you need and output to SQLite. If you find that your SQLite DB is being slow, then run VACUUM! https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/ has a quick explanation.