ODK Central support for zipped media files?


(Jonathan Niles) #1

What is the general goal of the feature?
The XLSForm documentation suggests that large media files can be zipped to improve upload/download efficiency. This has been an incredibly useful feature for operating in a bandwidth-constrained environment. The current behavior of ODK Central is to ignore media file names that are not directly matched (e.g. 'sites.csv' will not match 'sites.zip' or 'sites.csv.zip'). However, Aggregate will accept these files and ODK Collect will automatically match the zip file with the original CSV filename in the XML form. It would be nice if ODK Central could implement this behavior.

What are some example use cases for this feature?
We routinely perform household surveys that gather data, then follow up several weeks or months later to understand how interventions have changed the environment. We use geocoordinates and IDs generated in the initial survey to make sure we are going back to the same locations, so we upload a portion of the original dataset as a CSV to pull data from during the follow-up. These CSVs can run to tens of thousands of lines and weigh 30+ MB unzipped. Zipping gets them down to under 10 MB.
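To give a rough sense of why compression helps with this kind of file, here is a minimal Python sketch that builds a synthetic CSV of site IDs and coordinates and gzips it in memory. The column names, place labels, and row count are made up for illustration, and the ratio it prints will differ from the 30 MB to under 10 MB figure above, since that depends on the real data.

```python
import csv
import gzip
import io
import random


def build_sample_csv(rows: int = 5000, seed: int = 0) -> bytes:
    """Build a synthetic follow-up CSV (hypothetical columns, fake data)."""
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["site_id", "village", "latitude", "longitude"])
    for i in range(rows):
        writer.writerow([
            f"SITE-{i:06d}",
            rng.choice(["village-a", "village-b", "village-c"]),
            f"{rng.uniform(-5, 5):.6f}",
            f"{rng.uniform(15, 30):.6f}",
        ])
    return buf.getvalue().encode("utf-8")


raw = build_sample_csv()
packed = gzip.compress(raw)

# Repeated prefixes and labels compress well; random digits much less so.
print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")

# Compression is lossless: the original bytes come back unchanged.
restored = gzip.decompress(packed)
```

Real survey exports usually have more repeated text (names, categories, fixed prefixes) than this fake data, so they tend to compress better than the sketch suggests.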

What can you contribute to making this feature a reality?
I'll be a tester! I can also try to contribute code with guidance. It is probably an easier place to start than the publishers issue. :wink:

(Clint Tseng) #2

hey jonathan!

this is a great suggestion, though i would like to propose a change: i think we should just transparently compress everything behind the scenes using HTTP gzip compression, so the user never has to worry about zipping files manually.

does that sound reasonable to you?

(Dr. Gareth S. Bestor) #3

do we want to be (unnecessarily?) compressing/uncompressing jpegs, unless of course explicitly user specified? Or can Central be smart and not auto-compress anything it knows won't benefit (e.g. jpegs, mpegs, etc.)?

(Clint Tseng) #4

it does seem like CSVs are the primary beneficiary here.

as for transmission downwards to Collect, it's all based on nginx config which has appropriate gates for minimum size and mime types. i do not believe the CSV types have been included so far, which i should rectify as a part of this.
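For readers unfamiliar with these gates: nginx exposes them as the `gzip_min_length` and `gzip_types` directives. The decision they encode can be sketched in Python roughly as below. This is not nginx's implementation or Central's actual config; `should_compress`, the type list, and the 1024-byte threshold are all hypothetical values chosen for illustration.

```python
# Hypothetical allow-list; already-compressed formats such as image/jpeg
# are deliberately left out because gzip gains almost nothing on them.
COMPRESSIBLE_TYPES = {
    "text/csv",
    "text/plain",
    "application/xml",
    "application/json",
}

# Hypothetical threshold: tiny responses are not worth the CPU cost.
MIN_LENGTH = 1024


def should_compress(mime_type: str, size_bytes: int) -> bool:
    """Return True if a response passes both the size and MIME type gates."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    return size_bytes >= MIN_LENGTH and base_type in COMPRESSIBLE_TYPES


print(should_compress("text/csv; charset=utf-8", 30_000_000))
print(should_compress("image/jpeg", 5_000_000))
print(should_compress("text/csv", 100))
```

With a gate like this, a large CSV attachment is compressed while a JPEG of any size and a very small CSV are sent as-is, which is the behaviour asked about in #3.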

(Jonathan Niles) #5

Hey Clint,

Sure! HTTP gzip compression should work fine, although we may want to put a note about it somewhere so that optimization-hungry surveyors don't come reporting the same issue. If we could put it in the text on the upload panel that would work well.


(Clint Tseng) #6

Progress: downloading CSV form attachments from Central will now gzip if the client accepts it.
https://github.com/opendatakit/central/pull/36 (currently slated for release with 0.4.0)
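"If the client accepts it" refers to the standard `Accept-Encoding` request header. As a rough illustration of that negotiation (not the code in the pull request above, which I am not reproducing here), a simplified Python sketch might look like this. `accepts_gzip` and `respond` are hypothetical helper names, and the header parsing handles only the common cases.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified check of an Accept-Encoding header value for gzip support."""
    qvalues = {}
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # a coding listed without a q-value is fully acceptable
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        qvalues[token] = q
    # An explicit gzip entry takes priority over the "*" wildcard.
    if "gzip" in qvalues:
        return qvalues["gzip"] > 0
    return qvalues.get("*", 0.0) > 0


def respond(body: bytes, accept_encoding: str):
    """Return (headers, body), gzipping the body only if the client allows it."""
    headers = {"Content-Type": "text/csv", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


csv_body = b"site_id,latitude,longitude\n" + b"SITE-000001,1.5,20.5\n" * 200
headers, payload = respond(csv_body, "gzip, deflate")
print(headers.get("Content-Encoding"), len(csv_body), len(payload))
```

A client that sends no `Accept-Encoding` header, or one that excludes gzip, gets the original bytes back with no `Content-Encoding` header, so older clients keep working unchanged.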