Collect large form performance

Really interesting how differently different kinds of forms perform! I think there's a pretty good range of forms there so I don't see an immediate need to time more.

The most immediate follow-up question I have is what the time difference is for the eIMCI form on a real, underpowered device like a Samsung Galaxy Young.

Do you have any sense of what determines whether a form will benefit from caching? Looks like secondary instances really don't work.

Yaw brings up a good question in Slack: What is the performance on an actual Android device? (I ran this on a fast MacBook Pro with a solid state drive.) I think I’d better build something to produce this data for all the forms on an Android device before doing much more analysis.

I added nonblank lines in the XML form, and as you can see, there’s no correlation between that and the performance improvement from caching.

improvement and lines

Results from my Nexus 7.

Here’s a branch I’ll keep around for awhile if anybody wants to get these measurements on your own device.
https://github.com/dcbriccetti/collect/commit/868ebec9ecfaec71a6a4617a096c8f203bd99ff8

1 Like

Any ideas (even rough) on what makes a form be helped by this caching strategy? Off the top of my head I would say it looks like if the primary instance is long it helps.

Well, it’s not lines and it’s not “deep child count”. I don’t know what it is, but can keep digging.

Nice tool. :+1:

For those interested in helping with this investigation, the tool runs timing tests on all forms present on the device. It logs times in ms -- the output is the first 6 columns in the screenshot above. I used adb logcat -s "TimeParseAndCache:I" to see just the log output from that class.

I tried this with a Galaxy Tab A and got very similar results. I would be interested in seeing how much longer eIMCI takes on an underpowered phone. @mmarciniak90 @kkrawczyk123 @Grzesiek2010 any chance one of you could try this?

Clearly it makes no difference for small forms. A 6 second difference for eIMCI worries me, though. I tried this with a large confidential form with 1877 lines and 147 children and got a 2 second difference:
1877 | 147 | 2.411 | 3109.070 | 1264.679 | 1142.609

1 Like

Here’s the spreadsheet. Let me know if you need it in Excel format.

Comparison of Reading and Parsing XML to Deserializing from Cache.numbers.txt (644.0 KB)

Headings

(I should have added these to the modified Collect app logging output.)
image

New hypothesis that should be verifiable just by looking carefully at what is cached (the DAG): forms with a lot of dependent fields (calculates, relevants, etc that use other nodes) see a big speed up when cached. @michal_dudzinski this is starting to overlap with what you’re working on so maybe you have ideas.

1 Like

Sure we will try it.

Samsung Galaxy S4 GT-I9506 - Android 5.0.1
galaxys4.csv (664 Bytes)
Samsung Galaxy Young GT-S6310 - Android 4.1.2
galaxyYoung.csv (665 Bytes)

2 Likes

Thanks! Here are your results, pasted into the spreadsheet.

PrefixTree.addString

Here you can see that Collect calls JavaRosa resulting in calls to PrefixTree.addString:

I added some timing logging to that function, and graphed the results from loading a much shortened version of the Nigeria Wards form. This looks to me like O(n) time. I think we can improve the algorithm, but perhaps we can stop using PrefixTree altogether. It saves RAM by condensing many strings into a tree structure, where common parts are shared.

eIMCI

Here are measurements for eIMCI:


(Units are milliseconds)

PrefixTree Performance and Memory Impact

I’ve done some measuring, on my Nexus 7. Using the PrefixTree “string packing” on large forms with many strings is costly, as you can see here:

The getLocalData method, which creates and loads the PrefixTree, takes 20 times as long when the packing is enabled for eIMCI, and 706 times as long for Nigeria Wards.

Not packing the strings requires more space. According to the Android Studio Profiler, the “shallow size” of a PrefixTreeNode is about 32 bytes. Add in a List of references to children, and the character array size, and you have the size of the node. A rough estimate for the not-packed Nigeria Wards PrefixTree, with its root node, and all descendants being immediate children, is 2 MB.

Can we afford the RAM, and turn off the string packing?

Parsing Time, Broken Down

Some raw log data. I’m gonna go after the “Creating FormDef” 11.5 seconds now.

FormLoaderTask: Attempting to load eIMCI by D Tree.xml from cached file: /storage/emulated/0/odk/.cache/93d49a83bdfe2b842c6b23d0cc9e3cc0.formdef.
FormLoaderTask: Loaded in 6.020 seconds.
FormLoaderTask: Initializing form.
System.out: Nodes: 457, total string length: 11065
System.out: getLocaleData finished in 743.805 ms
FormLoaderTask: Form initialized in 3.104 seconds.
FormEntryActivity: calling formController.setLanguage
FormEntryActivity: Ended up with a bad language. null
System.out: PrefixTree 129383576 clearing
System.out: Nodes: 457, total string length: 11065
System.out: getLocaleData finished in 662.262 ms

FormLoaderTask: Attempting to load from: /storage/emulated/0/odk/forms/eIMCI by D Tree.xml
System.out: Parsing form...
System.out: Reading XML and parsing with kXML2 finished in 533.783 ms
System.out: Consolidating text finished in 455.505 ms
System.out: Title: "eIMCI by D-Tree"
System.out: Nodes: 798, total string length: 20473
System.out: getLocaleData finished in 1108.704 ms
System.out: Creating FormDef from parsed XML finished in 11,539.581 ms
FormLoaderTask: Loaded in 12.579 seconds. Now saving to cache.
FormLoaderTask: Saved in 11.475 seconds.
FormLoaderTask: Initializing form.
System.out: Nodes: 457, total string length: 11065
System.out: getLocaleData finished in 674.957 ms
FormLoaderTask: Form initialized in 3.211 seconds.
FormEntryActivity: calling formController.setLanguage
FormEntryActivity: Ended up with a bad language. null
System.out: PrefixTree 130682409 clearing
System.out: Nodes: 457, total string length: 11065
System.out: getLocaleData finished in 671.600 ms
FormEntryActivity: Done in 0.673 seconds.

Here is a pull request removing PrefixTree, along with some preliminary measurements of the effects.

@dcbriccetti Am I understanding correctly that on a Nexus 7 removing string prefixing gives a 706x speedup for the Nigeria Wards form with only 8-10mb of RAM (~12%) more? It would then load in less than a second? If that's correct, that's really significant!

The times shown at Collect large form performance - #22 by dcbriccetti are the time spent in the getLocalData method (and its callees), so it’s not quite as good as that, but still is good.

PrefixTree Removed Jan 29, 2018

So, you can ignore everything about it above, unless you have historical interest.

Update on Form Processing Metrics

Here are the latest measurements:

@johnthebeloved has returned to some of these explorations recently. After extensive analysis, he concluded that caching is never beneficial for internal secondary instances on modern devices. In javarosa#445, he has turned off serialization of internal secondary instances and ensured that these are always read directly from XML, even when the form definition is loaded from the form cache. On a Galaxy Tab A, that means about a 6x reduction in load time for the Nigeria Wards form from ~45 seconds to ~7 seconds.

We've discussed various hypotheses about why serializing and deserializing the contents of internal secondary instances is significantly slower than reading them from XML. The major theme is that modern Android devices seem to be much more IO-bound than CPU- or memory-bound. Secondary instances tend to have very simple structures because they're generally used as itemsets in selects or to look up individual values. Serializing them requires writing out a lot of redundant information in each TreeElement and that IO is very expensive (as @dcbriccetti found when switching from computer to device). Reading from XML is a lot less IO intensive. This is a good reminder that code put in just to improve performance should be reviewed periodically as device characteristics evolve because something that used to help could suddenly start slowing down the common case.

All this to say that as of JavaRosa v2.15 / Collect v1.23, the form processing metrics table above should look quite different for forms with large non-main instances because cache write and cache read times will be much lower. Thanks, @dcbriccetti for all the groundwork and @johnthebeloved for the latest improvements!

2 Likes