This is sounding good to me!
I think it would be unfortunate to be limited by CSV, though. One of the big reasons the form spec is built around XML is that it can represent arbitrarily complex schemas, and XPath makes querying those straightforward. To give a concrete example, let's say you're collecting information about patients and you ask questions about what allergies each has, when each allergy was developed, how severe it is, etc. In an XML instance generated by an ODK XForms client, you might have a varying number of repeated allergy blocks. Provided an XML document with all patient information, you could do things like get all patients with severe allergies, all patients with a particular kind of allergy, all distinct allergens represented (I don't think the ODK XForms spec has support for this last one yet but it could/should), etc. With CSVs, you'd need to have multiple files, cap the number of allergies per patient, have a really wide table, or use some other workaround. This gets at the core of why it's important to support XML external secondary instances and make them performant.
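To make the allergy example a bit more tangible, here is a rough sketch in Python using the limited XPath subset in the standard library's ElementTree. The element names (patients, patient, name, allergy, allergen, severity) and the sample data are made up for illustration, and this is not how an ODK XForms client evaluates expressions; it only shows that repeated blocks of varying length are easy to query.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: each patient has zero or more repeated allergy blocks.
PATIENTS_XML = """
<patients>
  <patient>
    <name>Ana</name>
    <allergy><allergen>peanut</allergen><severity>severe</severity></allergy>
    <allergy><allergen>pollen</allergen><severity>mild</severity></allergy>
  </patient>
  <patient>
    <name>Ben</name>
  </patient>
  <patient>
    <name>Chidi</name>
    <allergy><allergen>latex</allergen><severity>severe</severity></allergy>
  </patient>
</patients>
"""

root = ET.fromstring(PATIENTS_XML)


def patients_with(root, child, value):
    """Names of patients having at least one allergy block where child == value."""
    return [
        patient.findtext("name")
        for patient in root.findall("patient")
        if patient.find(f"allergy[{child}='{value}']") is not None
    ]


# All patients with at least one severe allergy.
severe = patients_with(root, "severity", "severe")

# All patients with a particular kind of allergy.
pollen = patients_with(root, "allergen", "pollen")

# All distinct allergens represented across every patient.
distinct_allergens = sorted({a.text for a in root.iter("allergen")})
```

A CSV version of the same data would need either a second file for allergies or a fixed number of allergy columns per patient.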
I think the schema of the entity/asset representation could open up some creative options for performance. For example, introducing a standard element name like entityId that ties records about entities together means clients (Collect, Enketo) could do something like have a database table per entity with entityId as the key. XML blobs could be stored for each form filled about the entity. This would make listing entities, linking to a specific one, and querying a specific one extremely fast. Queries across entities could then use a virtual instance built from pulling all the relevant XML blobs from the database. This could also make synchronization between server and client more efficient: the clients could request updates after a certain date and the server could provide just those blobs that have changed.
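Here is one way the storage idea could look, as a minimal sketch with Python's sqlite3 and ElementTree. Everything in it is an assumption for discussion: the table layout (a single submission table indexed by entity_id rather than literally one table per entity), the column names, the helper functions, and the visit/weight sample data. It is not what Collect or Enketo do today.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# One row per filled form; entity_id is the value of the form's entityId element.
conn.execute(
    """CREATE TABLE submission (
           id INTEGER PRIMARY KEY,
           entity_id TEXT NOT NULL,
           updated_at TEXT NOT NULL,  -- ISO 8601 date, so string order is date order
           xml TEXT NOT NULL
       )"""
)
conn.execute("CREATE INDEX idx_submission_entity ON submission (entity_id)")


def save_submission(entity_id, updated_at, xml):
    conn.execute(
        "INSERT INTO submission (entity_id, updated_at, xml) VALUES (?, ?, ?)",
        (entity_id, updated_at, xml),
    )


def list_entities():
    """Listing entities never touches the XML blobs."""
    rows = conn.execute("SELECT DISTINCT entity_id FROM submission ORDER BY entity_id")
    return [row[0] for row in rows]


def blobs_for(entity_id):
    """All XML blobs recorded about one entity, found via the index."""
    rows = conn.execute(
        "SELECT xml FROM submission WHERE entity_id = ? ORDER BY id", (entity_id,)
    )
    return [row[0] for row in rows]


def changed_since(date):
    """What a server could send a client that asks for updates after a date."""
    rows = conn.execute(
        "SELECT xml FROM submission WHERE updated_at > ? ORDER BY id", (date,)
    )
    return [row[0] for row in rows]


def virtual_instance():
    """Build one XML document from all stored blobs for cross-entity queries."""
    root = ET.Element("root")
    for (xml,) in conn.execute("SELECT xml FROM submission ORDER BY id"):
        root.append(ET.fromstring(xml))
    return root


save_submission("p1", "2020-01-10", "<visit><entityId>p1</entityId><weight>61</weight></visit>")
save_submission("p2", "2020-02-03", "<visit><entityId>p2</entityId><weight>80</weight></visit>")
save_submission("p1", "2020-03-15", "<visit><entityId>p1</entityId><weight>63</weight></visit>")

# Cross-entity query against the virtual instance: every recorded weight for p1.
p1_weights = [v.findtext("weight") for v in virtual_instance().findall("visit[entityId='p1']")]
```

Listing and single-entity lookups stay in the database, and XML parsing only happens when a query really needs to span entities.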
All that to say that if one of the big goals for the performance work that is starting back up in JavaRosa/Collect is to support longitudinal data collection, it might make sense to start getting more concrete about what that means!