Suggestions for odk:length parameter

#1

@LN this seems like it could be(come) a common problem, and at some level it doesn’t seem to make sense to force users to have to override the default string length for every occurance of a specific data type in their form (eg geotrace/geoshape). Which might suggest this could be a good candidate for more of a global override (eg a Collect runtime config setting?) which would allow users to bump up the string length past the default, say on a per data type?

Increasing string length beyond 255 for long geotraces
#2

From Increasing Aggregate Field Length:

By default, Aggregate's datastore layer limits text values to 255 characters or less. If a submission includes a value longer than 255 characters, those additional characters are not saved in the database and no warning is shown. That means there is a risk of data loss when using question types that save long text values such as geotrace, geoshape or select multiple. This limitation exists for performance reasons, particularly for older versions of MySQL.

[emphasis added]

Hmm. I see two issues:

  • First, (rhetorical question follows...) why are we specifying a text (ie "string") constraint in a binding for a completely different data type (geoshape)?! Yes, I know the answer - because under the covers we store geoshapes as strings, but hopefully you get my point: we are, in effect, making explicit in the XForm definition itself a backend implementation detail. Not only that, but its really to just override a default setting in one of the legacy databases that Aggregate can or used to use, and then only for performance reasons.

  • Second, by introducing a constraint like this in the form definition we are arguably now burdening both frontends (Collect, Enketo, ...) and backends (Aggregate, Central, Kobo, ...) with potentially having actually enforce this as an additional constraint on the property value. Although its really no more than just a hint to the backend should it want to store the property in a database, as it stands defined in the XForm I dont see how an XForm client can not treat this as a real constraint. If we have a backend (eg Collect+Postgres) and a front-end (eg Enketo), neither of whom really care how long strings grow to, its unclear what the formal prescribed behavior should be when faced with a "odk:length" setting...

I guess I've always struggled a bit in how best to set, and appropriately convey, configuration settings [if you recall I had issues with max-pixels too... :wink: ]. In this case, it doesn't quite seem right that a configuration setting necessary to override a default for a particular backend using a particular database underneath must be exposed in the form definition, especially when it is highly desirable to be able to share forms across different deployments and implementations. I certainly understand the reasoning behind odk:length, but its presence in the XForm definition, in a binding that ostensibly has nothing to do with strings, makes me a bit 'uncomfortable' for some reason... :thinking:

I would appreciate your @LN and others thoughts around this, namely how best to handle configuration settings in ODK? should they be inside Xforms or some other config file? per control binding vs per data type?... [feel free to split this off as a separate topic if that's more appropriate]. Especially now that we have the likes of Central, which is only going to further decouple the client (and form) from the specific server used.

#3

I see odk:length as an unfortunate side effect of Aggregate supporting several different datastores that have very different performance characteristics and that can be deployed in wildly different environments.

I don't like it but I don't see any other good option. The setting is not a client-side setting, it's a server-side one. The other option would be for the user to have to set the length for each database column on table creation but that would be a lot less friendly. It does have to be on a field-by-field basis because increasing the size of all string columns could have significant performance impact depending on the setup. There are certainly some refinements that could be made and if someone has a concrete proposal and the cycles to implement it, that would be great!

I think it depends a lot on the particular setting. max-pixels , for example, is a client-side transformation that is useful to be able to specify both at the client level and the individual field level. There are situations where some images have to be full resolution and others can be shrunk. The person who is going to do the downstream analysis is best positioned to make those decisions and encode them in the form.

#5

Coming back here from Slack since this is more of a long-running conversation that is worth archiving than a quick set of development decisions.

The reason max-pixels was introduced and the only reason I've used it is to limit client uploads. That is, the form designer may know that certain images don't need to be full resolution and by encoding that in the form, he or she can reduce bandwidth needs across a deployment. Once the images have reached the server, the hard work we were trying to avoid has already happened so I don't see an advantage to rejecting the submission. I can't think of a situation where a larger-than-expected image on the server would cause problems (that at least can't easily be recovered from). Can you?

odk:length, on the other hand, is critical at Aggregate database creation time and generally ignored by clients and any other server. Certainly, there's some danger in that because there could be truncation. But the database administrator may tweak database parameters over time as more information is gained about the deployment so trying to enforce it client-side is also problematic. As much as I don't like it, it hasn't affected many deployments that I'm aware of. That said, if you, @Xiphware, or someone else would like to make a formal proposal to improve the way that Aggregate database size issues are handled, that would be great! It could certainly be considered by the TSC.

split this topic #6

4 posts were split to a new topic: Increasing aggregate field length