OpenRosa spec proposal: add matchExactly attribute to form list response

Yes, I'd say there is a good chance of that.

I understand what @Xiphware suggests with something like a User-Agent key but that pushes the burden of maintaining several disparate client configs to Central.

i think my issue here is not so much burden as it is unpredictability and quirk fragmentation. the whole client config area of things feels like the sort of thing that will just end up being really messy and difficult for users to reason about unless somehow they can be really nailed down and standardized, which itself seems like a very difficult proposition.

otherwise the only reasonable approach is to just allow a bunch of arbitrary k/vs and force the user to read the documentation and leverage correctly for their particular client. i also vote yuck on that.

on the other hand, on the protocol side rather than the content side, the original thought by @xiphware:

If in reality this is the goal here - to perform the equivalent of configuring Collect with a QR code when they connect to a Central server - then perhaps that should be exactly what we do: a (new) simple API on Central to retrieve the QR code config data?

i think is a sensible option to offer; configure over web rather than by qr code if you'd like.

I think this is what is making me most uncomfortable, and why I made the passing comment that "The need for a matchExactly implies that the original REST operation used to obtain the list of forms is inadequate in some way."... I don't actually believe there is any semantic ambiguity to OpenRosa's formList (or xFormList) API. It's well-defined as returning the definitive list of available forms, and (ignoring things like user roles, form status, etc) the definitive list of forms for which the server will accept submissions. Upon a client calling this API they know a priori exactly what they are getting; that is, this list is already both semantically authoritative and complete.

I do feel the real (and useful!) usecase of introducing something like matchExactly isn't so much for semantic rigor, but rather to convey a client-side directive to purge everything from the client UI not in this list. This feels like a client-specific, and quite possibly session-specific, configuration setting and is not a semantic clarification necessary to correctly interpret the response data.

Controlling how a client should present form data is a very good usecase, but I feel this should be kept distinct from the more fundamental, open APIs around what data the client should present or consume. Inter-mixing the two indiscriminately could lead to an increasingly less 'open' API that becomes highly customized to a specific client-server implementation. I would be far more receptive to remotely configuring client behavior - eg only display submittable forms, delete after submitting, generate PDF, etc - via a separate API or similar mechanism, and keep the actual data APIs minimalist, generic, and distinctly open.

i think i would argue the following things:

  1. my goal is not semantic rigor, but rather semantic expression.
  2. arguing that the present API is already authoritative or complete is to argue that the only semantically reasonable way to present the formList is as a menu of every possible thing the server would ever accept, and that this is well-suited to all implementations. from my conversations with users, it is not.
  3. i think the approach of using a separate configuration specification is dicey: i have no guarantee as a project implementer that a client is actually using the correct configuration relative to the particular formList that it is accessing. having the two resources be completely orthogonal to each other invites air gaps for deployment failure and we are back to where we started.
  4. i also think everything to do with a client configuration specification is more risky to the openness of the API because having attempted to do the homework on this i have found no remotely satisfactory way to describe such an API without mandating proprietary client behaviour across the ecosystem. i think if this is the approach you would like to push, i would like to see what you have in mind in concrete terms.
  5. wrapping back around to #1, i explicitly specified "MAY" in the proposal because again, i think even in the case of Collect there may be uncommon cases where whatever this flag is called ought to be ignored. it is up to the client to determine the client's behaviour in response to the information. relatedly i don't think in my head i would describe the goal here as "controlling how a client should present form data"; i think it is "describing the meaning of the formList given."

edit: i am just beginning to understand that perhaps your intent here is to say that a client configuration API should explicitly not be part of any standard openrosa specification, but rather in our case for example a proprietary Collect/Central API that we work out internally. i guess i would have no problem with this because it would allow us to move forward, but i feel like it's counter to the spirit of the open API?

For general reference (and not arguing one way or the other), here's the relevant formlist spec:

Form List API

This standard specifies how clients discover a list of available blank forms on a server.

Discovery Request

The discovery request should be sent in compliance with the HTTP 1.1 protocol.

If a server will filter the set of forms based upon the user's identity, then the server should require that the user be authenticated through either the Authentication API or through an alternative authentication mechanism. The server can then make use of the user's authenticated identity through those mechanisms to filter the set of forms to be returned.

The device will make a discovery request to a configured URI with a single query parameter, the deviceID. The deviceID should be the same id as provided by the default population mechanism defined in the Metadata Scheme. The server may filter the set of forms returned using this information.

Together, the authentication and deviceID enable a server to tailor the set of forms to both the user and the device (and therefore the device's capabilities).

Query Parameters

Optional query parameters MAY also be supplied:

  • formID If specified, the server MUST return information for only this formID.
  • verbose If specified with the value true, the server MAY include a <descriptionText> or <descriptionUrl> element providing a longer description of an XForm.
  • listAllVersions If specified, provides a listing of all hosted versions of each form (including the <version> element) in the response document (see below).
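
To make the quoted request format concrete, here is a minimal Python sketch of how a client might assemble the discovery URL. Only the parameter names (deviceID, formID, verbose, listAllVersions) come from the spec text above; the server URL, the device ID value, the helper name build_form_list_url, and the choice to send listAllVersions as "true" are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_form_list_url(base_url, device_id, form_id=None,
                        verbose=False, list_all_versions=False):
    """Build a form list discovery URL (hypothetical helper, not from any client)."""
    # deviceID is the single query parameter the spec describes as always sent.
    params = {"deviceID": device_id}
    # The remaining parameters are optional and only added when requested.
    if form_id is not None:
        params["formID"] = form_id
    if verbose:
        params["verbose"] = "true"
    if list_all_versions:
        # The spec only says "if specified"; sending "true" is an assumption.
        params["listAllVersions"] = "true"
    return f"{base_url}?{urlencode(params)}"


# Placeholder server URL and device ID, chosen only for the example.
url = build_form_list_url("https://example.org/formList", "collect:abc123", verbose=True)
print(url)
```

Nothing in this request lets the server say whether the returned list is exhaustive for the device, which is the gap the proposal is about.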

...

I've highlighted what I think may be of interest to this discussion.

Is the concern that the formList returned is still excessive (even after filtering on user and/or device) and perhaps needs to be filtered on additional dimensions?

no; the concern is described here.

central already provides the exact list of forms that the device is intended to carry. the issue presently is that because the only mechanism for describing "a list of forms" only allows the expression of "available blank forms to choose from" as you highlight, there is no way for it to declare anything other than "here are some forms maybe."

the goal here is to provide a way for a server to say, "here are the exact forms this server's administrator expects to exist on the device." whether the device honors that information or not is up to the client software.
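
A rough Python sketch of how a client could act on such a declaration, under stated assumptions: the flag is shown as a matchExactly attribute on the root element (the name and placement are still being discussed in this thread), the form IDs are made up, and forms_to_remove is a hypothetical helper. The honor_flag argument stands in for the "MAY" in the proposal: the client stays free to ignore the flag.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenRosa form list documents.
NS = "{http://openrosa.org/xforms/xformsList}"

# Example response; the matchExactly attribute is the proposed (not yet adopted) addition.
FORM_LIST = """<xforms xmlns="http://openrosa.org/xforms/xformsList" matchExactly="true">
  <xform><formID>household</formID><name>Household</name></xform>
  <xform><formID>clinic</formID><name>Clinic</name></xform>
</xforms>"""


def forms_to_remove(form_list_xml, local_form_ids, honor_flag=True):
    """Return the local form IDs a client could drop (hypothetical helper)."""
    root = ET.fromstring(form_list_xml)
    server_ids = {el.findtext(f"{NS}formID") for el in root.iter(f"{NS}xform")}
    # An absent attribute is treated as "false", i.e. today's behaviour.
    exact = root.get("matchExactly", "false") == "true"
    if not (exact and honor_flag):
        return set()
    # Anything held locally that the server did not list is a removal candidate.
    return set(local_form_ids) - server_ids


print(forms_to_remove(FORM_LIST, {"household", "old_census"}))
```

With the attribute absent, or with honor_flag set to False, the function returns an empty set and the client behaves exactly as it does today.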

@Xiphware, I'd love your thoughts on @issa's latest response.

I see what you're getting at. I think that with the current ecosystem members, a small, optional addition to an existing API is much more likely to get adopted than a whole separate API. I think we do want to design that API for broader configuration in the near term but it's likely that it will remain fairly niche. In that sense, the former feels more 'open'. So basically I agree with @issa's point 4 above and would be interested in a concrete example of how a settings API could be more open.

I think adding an optional flag on the form list as a quasi-directive to tell the client - specifically ODK Collect - to flush anything not in the list is a relatively lightweight and unobtrusive means to solve an immediate (and significant!) pain-point for ODK tool-stack users.

The only readily obvious alternatives would be trying to come up with a whole new spec around remote client configuration (a lovely concept but... OMG, where to start!?!), or perhaps adding a brand new config option to the base ODK Collect build that, by default (ie unless explicitly disabled by the user), performs this flush automatically whenever you refresh a form list. But anything that auto-deletes stuff is extremely dangerous and has to be VERY carefully thought thru (whilst survey deployments continue to suffer having enumerators inadvertently filling in out-of-date/deprecated forms...)

So yup, this is OK. Pragmatism Rulez! :slight_smile:

There's probably a subset of common (but optional) client-side workflows that we could come up with that could be candidates for remote config. eg delete submissions upon successful upload, disable validation, treat all XPath references precisely as specified (:wink: )... yadda, yadda. But maybe that's something the respective client-owners could flesh out (over a beer) and we go from there? :slight_smile:

:beers: So maybe the next step is to draw TSC attention here and see if anyone else has other comments? Then perhaps it can briefly be discussed at the next meeting.

That's sounding pretty great! Let's see where we go with this and then schedule the international beer session.

Is there an existing github issue I can refer to? (Presumably ODK spec?). I can open a roadmap issue referencing it, and these associated forum discussions, and put on next TSC agenda to review. [I figure we might as well follow the process we’re supposed to be following... :wink: ]

Btw is matchExactly (or what-have-you) a Boolean attribute, ie can also be false?

No GitHub issue yet. It looks like at some point @yanokwa added https://github.com/opendatakit/roadmap/projects/1#card-28845254 which seems sufficient to me.

It might be good to have at least a note about pushing client configuration from servers in the proposed column.

I think the allowed values for the attribute would be true and false. I still have a slight preference for matchExactly but I do see an argument for authoritative or some other declarative name.

K. I can open a roadmap with above github issue, with suitable transmogrifications... But y'all got till next week to pick a name! :slight_smile:

This is splitting hairs, but right now there are zero attributes in the whole formList, so it feels a lot more idiomatic to add a subelement to <xforms-group>. Then it would look something like:

<xforms-group>
  <shouldMatchExactly>true</shouldMatchExactly>
</xforms-group>

I added "should", because it's a specification of expected behaviour, but I'm not totally convinced by it. Is there maybe a term from set theory that denotes that a set is complete and cannot be added to or subtracted from? Or maybe some kind of "pod" or "shrink-wrap" metaphor?

Hmm. Interesting... might there be some potential to exploit the (unused by Collect?) <xforms-group> for this purpose, with an appropriate tag as @adam.butler suggests? There does appear to be some semantic overlap, specifically (quoted from the spec):

The <xforms-group/> tag provides information about a group of forms; a further enumeration of the forms within that group can be obtained through the <listUrl> of that group (which returns an <xforms> document). Groups can be used to define sets of forms that a user may wish to download together (such as for clinical studies, for example).

[emphasis added]. This is not exactly the same as specifying the definitive group of forms, but perhaps it is close enough to leverage? Thoughts @ln? @issa?

My apologies for slacking off this past month... So it looks like there was already a Roadmap issue ostensibly covering the client-server form sync requirement (added by @yanokwa), so I've added a link to this thread to it. But neither is on the agenda for the upcoming TSC call, and it looks like we have a full agenda covering post-convening governance, funding, etc topics... Can this be pushed to the following TSC call for review/approval, or has it become more immediately pressing from a development standpoint?

FYI plan is to review this feature proposal (and hopefully approve) in next TSC meeting, which will be Jan 8. Do you @ln or @issa want to call-in?

We want to get it right or at least to an acceptable compromise state so it can take the time it needs to.

Fantastic, I'll be there!

Thanks for digging further, @adam.butler and sorry I've taken so long to react. I share your instinct of wanting to match what's there already. @issa and I discussed various possibilities including something similar to this and I may actually be relaying arguments she made.

It's intended that there could be multiple xforms-group blocks and a mix of groups and loose forms. The groups give clients information to visually organize forms from the form list. Collect doesn't have support for xforms-group at the moment but with Central having more capacity for grouping forms, I think it's something we'd want to add. What we want to express with this new feature is that the whole package of groups and forms needs to be exactly matched by the client so unfortunately I do think that we need to use the xforms root node to add this modifier.

When it comes to element vs. attribute, there are no hard rules in XML so we could do either. In this case, all of the existing elements feel like data whereas this is a modifier. That's what leads me to think it "feels" more like an attribute. Additionally, parsers will generally happily ignore attributes so if we're adding this optionally, an attribute is safer. If we add an element, we run the risk that different clients of the form list might choke on an unrecognized element name.
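
The compatibility argument can be illustrated with a small Python sketch. It assumes the flag is called matchExactly, and it invents a deliberately strict parser that rejects child elements it does not recognize; whether any real form list client behaves that way is not established in this thread.

```python
import xml.etree.ElementTree as ET

URI = "http://openrosa.org/xforms/xformsList"
NS = {"fl": URI}
KNOWN_CHILDREN = {f"{{{URI}}}xform", f"{{{URI}}}xforms-group"}

BODY = "<xform><formID>a</formID></xform>"
PLAIN = f'<xforms xmlns="{URI}">{BODY}</xforms>'
# Variant 1: the flag as an attribute on the root (name is an assumption).
WITH_ATTRIBUTE = f'<xforms xmlns="{URI}" matchExactly="true">{BODY}</xforms>'
# Variant 2: the flag as a new child element of the root.
WITH_ELEMENT = f'<xforms xmlns="{URI}"><matchExactly>true</matchExactly>{BODY}</xforms>'


def strict_form_ids(xml_text):
    """Hypothetical strict client: fails on any child element it does not know."""
    root = ET.fromstring(xml_text)
    ids = []
    for child in root:
        if child.tag not in KNOWN_CHILDREN:
            raise ValueError(f"unexpected element: {child.tag}")
        if child.tag.endswith("}xform"):
            ids.append(child.findtext("fl:formID", namespaces=NS))
    return ids


# The attribute is invisible to code that never asks for it.
print(strict_form_ids(PLAIN), strict_form_ids(WITH_ATTRIBUTE))
try:
    strict_form_ids(WITH_ELEMENT)
except ValueError as err:
    print("element variant rejected:", err)
```

A lenient client that only looks up the elements it knows would accept both variants, so the attribute is the safer choice only to the extent that strict clients like this one exist.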

Good points! I agree this feels a bit more like a 'modifier' than actual meaningful data ( :wink: ), which as you say are easier for a client/parser to basically scan over and ignore if unsupported.

Thank you @TAB for staying a little longer to discuss this one today!

There continues to be a fair amount of discomfort around this. Some of the themes that came up echo earlier parts of the thread and some were new. Here is a summary:

  • It feels like adding a feature that's not quite right and is going to become obsolete as a client setting API and a more managed longitudinal data collection experience are introduced (perhaps with client multi-tenancy). Attempts at addressing this here and here.
  • Requiring an explicit user action to trigger the form update feels incomplete to some TSC members. It feels like if the project designer's intent is for client devices to have an exact form list, that should happen automatically.
  • Relatedly, it doesn't play well with the existing auto update features that exist. Enketo periodically polls to update existing forms. Collect can also do this if a client setting is enabled (Form management > Form update). It feels like there should be a companion setting to hide forms that are not part of the form list when that check happens.

Hopefully I have accurately captured the major concerns. As we agreed on, I will chew on those and consult with @issa (or of course feel free to answer directly). If anyone has alternate approaches to suggest, please do.

i hope i have made the underlying user problem clear here. i'll restate that from the research i have done, solving that problem will make a significant impact on users: actual hundreds or thousands of hours of frustration and error may be returned to people's lives.

i don't feel i understand the ecosystem or Collect well enough to recommend a specific path forward relative to the sensible concerns expressed here, and i leave it with faith in y'all's hands to find it.