Specifying aggregate config options via XForm definition


(Dr. Gareth S. Bestor) #1

I'm not a particular fan of odk:length either. My preference would be that the client (Collect, Enketo, ...) simply send the entire result [it's not like XML has any restrictions on its length...] and that it's up to the backend (Aggregate, Central, ...) to make sure it saves the entire result, in whatever internal format (string, blob, what-have-you) is most appropriate for whatever database it happens to be configured to use (PostgreSQL, MySQL, ...).

<soapbox>As much as possible I think this internal default string size should be completely hidden from the user, including the form writer. If that means storing potentially arbitrarily long geotraces/geoshapes in a different database format than geopoint, then so be it.</soapbox> [or perhaps this is a more appropriate discussion to have over in Slack?]


Geotrace string truncated and area calculation
(Clint Tseng) #2

Hm. Does anybody know what the biggest number I can safely apply is, then? I am loath to expose this concept to the user, even as an optional/advanced setting.


(Dr. Gareth S. Bestor) #3

Is there an (implicit) assumption that both Central and Aggregate should behave (identically) with regard to the presence or absence of odk:length in a form? [and that's a legitimate question, not a rhetorical one...] E.g., having the same form loaded into both, but having Aggregate silently truncate at 255 and Central not. Or, in the case of the form specifying a (very) large fixed odk:length=X, having Aggregate not truncate (at Central's smaller internal hardcoded max < X) whereas Central will.

I think if odk:length is specified, it must be strictly obeyed, regardless of any internal/hardcoded constant. That is, accept everything up to that limit, and discard anything thereafter! The only place there is probably some "implementation-dependent" flexibility is when odk:length is not specified. [I don't like it either, but now that it's in the spec, I think `odk:length` behavior probably needs to be deterministic]
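A minimal Python sketch of the rule proposed above, assuming a server that receives each string value together with the odk:length parsed from the form (None when the form omits it). The 255 fallback mirrors the Aggregate default mentioned in this thread; `store_value` and `DEFAULT_MAX_LENGTH` are hypothetical names, not any ODK server's API.

```python
from typing import Optional

# Hypothetical fallback used only when the form gives no odk:length.
# 255 mirrors the Aggregate default discussed in this thread; None would
# mean "no limit at all".
DEFAULT_MAX_LENGTH: Optional[int] = 255


def store_value(value: str, odk_length: Optional[int]) -> str:
    """Return the portion of `value` a server would keep.

    If odk:length is specified it is obeyed exactly: the first
    `odk_length` characters are kept and the rest discarded, whatever
    the server's internal constant is. Only when it is absent does the
    implementation-dependent default apply.
    """
    if odk_length is not None:
        if odk_length < 0:
            raise ValueError("odk:length must be non-negative")
        return value[:odk_length]
    if DEFAULT_MAX_LENGTH is None:
        return value
    return value[:DEFAULT_MAX_LENGTH]
```

The point of the split is that behaviour is deterministic whenever the form says something, and only the "form said nothing" case varies between servers.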


(Clint Tseng) #4

Oh, well, Central preserves whatever XML Collect sends as-is, no matter what. So if odk:length is something enforced by Aggregate, then Central doesn't perform this restriction at all.


(Dr. Gareth S. Bestor) #5

Hmm. Thoughts @martijnr? When specified, if/how odk:length is enforced might need to be spelled out spec-wise (again, my preference would be that odk:length be something unnecessary to expose in a form definition, but the horse has left the barn...)


(Clint Tseng) #6

I am just generally surprised to learn that it is Aggregate enforcing this restriction. In my mind, especially given that ODK is highly componentized and that XForms are the lingua franca, it feels incorrect that Collect would not perform the truncation itself. That means Collect outputs “unfinished” XForm submission data that every server consuming Collect submissions has to polish up.


(Dr. Gareth S. Bestor) #7

I'm not sure I'd describe it quite like that, although the consequence is effectively the same. Rather, my impression/interpretation is that Aggregate needs a 'hint' (when setting up its internal DB table to store submissions for that form) that certain string fields need to be declared with a much larger CHAR than the default 255. If they're not, and it receives a control result in a submission (from Collect) that is much bigger, Aggregate will, out of necessity, simply truncate it. That hint is odk:length.
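To make the 'hint' interpretation concrete, here is a hypothetical sketch of how a server could turn an optional odk:length into a column declaration when it creates a per-form table. The function names and the exact DDL are assumptions for illustration, not Aggregate's actual schema code.

```python
from typing import Dict, Optional

DEFAULT_VARCHAR_WIDTH = 255  # the default width described above


def column_ddl(field: str, odk_length: Optional[int]) -> str:
    """Declare a string column, widened when an odk:length hint is present."""
    width = odk_length if odk_length is not None else DEFAULT_VARCHAR_WIDTH
    return f'"{field}" VARCHAR({width})'


def create_table_ddl(table: str, string_fields: Dict[str, Optional[int]]) -> str:
    """Build a CREATE TABLE statement for a form's string fields.

    Values longer than a declared width would later have to be
    truncated (or rejected) on insert, which is the behaviour at issue.
    """
    columns = ",\n  ".join(
        column_ddl(name, length) for name, length in string_fields.items()
    )
    return f'CREATE TABLE "{table}" (\n  {columns}\n);'


print(create_table_ddl("survey", {"name": None, "trace": 15666}))
```

Because the width is fixed when the table is created, the hint has to be in the form before the first submission arrives, which is why it ends up in the XForm at all.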

In that respect odk:length appears a bit different from, say, max-pixels, which is definitely more of an explicit directive to the client (Collect) to perform 'truncation' before submitting results [I don't even know if Aggregate will refuse to accept a larger image... it may well be unenforced by the backend! Does Central enforce it?]
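For contrast, a sketch of the client-side arithmetic behind max-pixels, assuming the behaviour as I understand it from the ODK XForms spec: scale the image down so its longest edge is at most max-pixels, and never scale up. The second function is a hypothetical server-side check of the kind asked about above; whether Aggregate or Central does anything like it is not established here.

```python
from typing import Tuple


def max_pixels_target(width: int, height: int, max_pixels: int) -> Tuple[int, int]:
    """Client side: new size so the longest edge is at most max_pixels.

    Images already within the limit are left alone (never upscaled);
    the aspect ratio is preserved using integer arithmetic.
    """
    longest = max(width, height)
    if longest <= max_pixels:
        return width, height
    return (max(1, width * max_pixels // longest),
            max(1, height * max_pixels // longest))


def exceeds_max_pixels(width: int, height: int, max_pixels: int) -> bool:
    """Server side (hypothetical): would this upload violate max-pixels?"""
    return max(width, height) > max_pixels
```

For example, a 4000×3000 photo with max-pixels 1024 would be resized to 1024×768 before submission.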


(Suzanne Dircks) #8

Hi All,

Thanks for your replies.

Indeed, now that I've run into this problem, I've also discovered the warnings.
I have also found that several colleagues using geotrace were unaware of this truncation problem,
so the warnings should definitely be clearer!!

I would support having the geotrace tool automatically change the string length constraint to unlimited (but still editable). By default it only saved 4 whole coordinate sets, which is far too few!
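Some rough arithmetic shows why 255 characters run out so quickly. Assuming a geotrace is stored as space-separated `lat lon altitude accuracy` points joined by a separator, n points take n*point_len + (n-1)*sep_len characters. The sample point below is hypothetical but uses the kind of full-precision digits a device can report; the separator length is a parameter because a client may write `;` or `; `.

```python
# Hypothetical full-precision point: "lat lon altitude accuracy".
SAMPLE_POINT = "45.123456789012345 -73.123456789012345 35.123456789012345 4.8"


def whole_points_that_fit(limit: int, point_len: int, sep_len: int = 1) -> int:
    """Largest n with n*point_len + (n - 1)*sep_len <= limit."""
    if limit < point_len:
        return 0
    return (limit + sep_len) // (point_len + sep_len)


for limit in (255, 15666):
    print(limit, whole_points_that_fit(limit, len(SAMPLE_POINT)))
```

With this 61-character sample and a 1-character separator, 255 characters hold only 4 whole points, consistent with what was reported above, while 15666 would hold 252.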

In the new survey version I tried to set the string length to 15666 (max = 16000) in an added bind::odk:length column.
The output is still truncated if I obtain it via the Aggregate server!
I tried several approaches based on instructions on this forum, to no avail.
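When a bind::odk:length column seems to have no effect, one thing worth checking is whether the attribute actually made it onto the bind in the converted XForm. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming the usual ODK namespace URI `http://www.opendatakit.org/xforms`; the XForm fragment is a trimmed, hypothetical example, not the form from this thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

XFORMS_NS = "http://www.w3.org/2002/xforms"
ODK_NS = "http://www.opendatakit.org/xforms"  # assumed ODK namespace URI

# Trimmed, hypothetical XForm fragment; real forms contain much more.
SAMPLE_XFORM = """\
<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:odk="http://www.opendatakit.org/xforms">
  <h:head>
    <model>
      <bind nodeset="/data/trace" type="geotrace" odk:length="15666"/>
      <bind nodeset="/data/name" type="string"/>
    </model>
  </h:head>
</h:html>
"""


def odk_lengths(xform_xml: str) -> Dict[str, Optional[int]]:
    """Map each bind's nodeset to its odk:length (None if absent)."""
    root = ET.fromstring(xform_xml)
    lengths: Dict[str, Optional[int]] = {}
    for bind in root.iter(f"{{{XFORMS_NS}}}bind"):
        raw = bind.get(f"{{{ODK_NS}}}length")
        lengths[bind.get("nodeset")] = int(raw) if raw is not None else None
    return lengths


print(odk_lengths(SAMPLE_XFORM))
```

If the attribute is present on the bind but Aggregate still truncates, the problem lies on the server side (for example, a table created from an earlier form version), not in the form conversion.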

I am now retrieving the data directly from the tablet via Briefcase, to access the untruncated strings.

Bottom line:
I can still access the full strings if I avoid the server, but this is a major inconvenience!


(Suzanne Dircks) #9

Indeed, the area() function is nice, but I would like to see the actual calculation if possible.


(Dr. Gareth S. Bestor) #10

This is a thread from back when area() was being implemented, which alludes to the algorithm employed. @martijnr may know the actual specifics, or you could poke around the JavaRosa codebase and decipher the actual calculation.
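For reference, here is a sketch of one common way to compute the area of a lat/lon polygon on a sphere. It is not necessarily the calculation JavaRosa's area() performs (that would need to be checked in the codebase). It assumes a spherical Earth with the WGS84 equatorial radius, points given as (lat, lon) in degrees, and a polygon that does not cross the ±180° meridian.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius; spherical Earth assumed


def polygon_area_m2(points: List[Tuple[float, float]]) -> float:
    """Approximate area in square metres of a (lat, lon) polygon.

    Sums (lon[i+1] - lon[i-1]) * sin(lat[i]) around the ring and scales
    by R^2 / 2. The ring may be open or closed (first point repeated).
    """
    if points and points[0] == points[-1]:
        points = points[:-1]  # drop the duplicated closing point
    n = len(points)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        lat_i = math.radians(points[i][0])
        lon_next = math.radians(points[(i + 1) % n][1])
        lon_prev = math.radians(points[(i - 1) % n][1])
        total += (lon_next - lon_prev) * math.sin(lat_i)
    # abs() makes the result independent of clockwise/anticlockwise order.
    return abs(total) * EARTH_RADIUS_M ** 2 / 2.0
```

For any latitude/longitude-aligned rectangle this reproduces the exact spherical area R² · Δλ · (sin φ₂ − sin φ₁); for other shapes it is an approximation.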


(Clint Tseng) #11

Given this description, I would interpret odk:length as an Aggregate-specific customization rather than part of the ODK XForms specification, and in this case I believe Central is correct to ignore the setting entirely. The question remains of what to do with Build.


(Martijn van de Rijdt) #12

Yes, I agree. Enketo ignores it too, and has no (known) string-length restriction.


(Dr. Gareth S. Bestor) #13

<rhetorical>
So.... what do we tell form writers, who may not know (or care?) where the data is going?
</rhetorical>

Taking a step back, a form writer uses control appearances to, in effect, convey 'hints' or (ostensibly optional) customizations to the XForms client on how they want the client (Collect, Enketo, ...) to process the form. Do we perhaps also need to define a more formal mechanism by which they can convey hints/customizations to the XForms server? We essentially seem to be doing just that with odk:length; as @clint suggests, it's Aggregate (server) specific and can (or should?) be ignored by the client (!). But if this is in fact the effective use case being solved here, namely server customizations, then I'm not convinced doing these via an XML attribute (quite probably on every binding of that datatype! e.g., geotrace) is necessarily the best approach. No offense to odk:length, but it feels like a bit of an interim hack to solve a very specific issue that cropped up, rather than a generalizable approach to conveying server settings via an XForm definition.

Hmm, somewhat related... so would orx:max-pixels be considered a server setting/customization, or a client one? Clearly it's telling the client to do something, but is it also conveying to the server what to accept (b/c one of the primary purposes of max-pixels is to prevent/limit storing excessively large images on the server)?

Thoughts?


(Clint Tseng) #14

I think I've made my stance on control appearances relatively apparent (:slight_smile:) over the years, though perhaps not in this venue: I am not a huge fan of the way it's been done. I am of the opinion that, given any specification value, either it matters or it does not. If I specify that a question is likert, I would not consider this a suggestion, a preference, or an optional hint. If we are to provide a likert option, it ought to be a fully specified, fully defined feature that all fully compliant tools support completely, with no guesswork involved, and with a properly defined fallback should the UI application not support such a feature for any reason.

Likewise, in this case I would suggest that Aggregate has made an internal problem of its own (what size of database column to initialize?) the worry of the form author and the ecosystem at large. As you suggest, a cleaner solution would be to make this an Aggregate-specific setting rather than part of the XForm definition (I recognize that this has detrimental effects on the independent portability of the XForm definition). I would also gently suggest that a text column type is probably superior to any fixed-length varchar type in this scenario, as the values are (1) likely to be long, (2) unlikely to repeat, and (3) unlikely to be directly queried by any reasonable index.
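A small illustration of the text-column suggestion, using Python's built-in `sqlite3` only because it is self-contained. Note that SQLite does not enforce VARCHAR(n) widths at all, so this shows the TEXT approach rather than reproducing the truncation problem; in PostgreSQL the `text` type likewise has no declared length limit. The table layout is hypothetical.

```python
import sqlite3


def roundtrip_text(value: str) -> str:
    """Store `value` in a TEXT column and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical per-form submission table with an unbounded column.
        conn.execute("CREATE TABLE submission (id INTEGER PRIMARY KEY, trace TEXT)")
        conn.execute("INSERT INTO submission (trace) VALUES (?)", (value,))
        (stored,) = conn.execute("SELECT trace FROM submission").fetchone()
        return stored
    finally:
        conn.close()
```

With an unbounded column there is no width for the form author to predict, so no odk:length-style hint is needed in the first place.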


(Dr. Gareth S. Bestor) #15

[Good discussion, just moving to a dev topic since it's getting a bit obscure for Support... :slight_smile: ]


(Clint Tseng) #16

Yes, good move, thank you.


(Dr. Gareth S. Bestor) #17

To play devil's advocate somewhat... you could make the argument that a select1 with appearance="likert" is doing just this; that is, the stated (although strictly optional) likert appearance is 'fully specifying' the specific widget the form writer wants, and the control type select1 is effectively defining the 'fallback' (!)

That is, a control's fundamental base type (e.g., string, integer, geopoint), which arguably should all be mutually exclusive but aren't, has an associated default widget that must minimally be supported by any client, and which will be used as the fallback if any specified sub-type (aka appearance) in the XForm isn't supported, or in the absence of any specified sub-type.
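A sketch of the fallback rule described above: use the appearance-specific widget when the client supports it, otherwise the base type's default widget. The widget names and lookup tables are hypothetical; only the `likert` and `minimal` appearance names come from ODK.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical widget names; only the appearance names are from ODK.
DEFAULT_WIDGETS: Dict[str, str] = {
    "select1": "radio_buttons",
    "string": "text_box",
    "geopoint": "map_point",
}
APPEARANCE_WIDGETS: Dict[Tuple[str, str], str] = {
    ("select1", "likert"): "likert_scale",
    ("select1", "minimal"): "dropdown",
}


def resolve_widget(base_type: str, appearance: Optional[str],
                   supported: Set[str]) -> str:
    """Pick the appearance's widget if this client supports it, else fall back."""
    if appearance is not None:
        widget = APPEARANCE_WIDGETS.get((base_type, appearance))
        if widget is not None and widget in supported:
            return widget
    # Every client must support the base type's default widget.
    return DEFAULT_WIDGETS[base_type]
```

Under this framing, a client without a likert widget still renders the question, just with the select1 default.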


(Clint Tseng) #18

Theoretically, the control's base type is specified via its binding, not its input. The input itself should define the display type. Whatever appearance is, it is heavily overloaded. Sorry, in a rush, response short.