Integration of Enketo into ODK Central

Thanks @martijnr.
Such a lot of perspectives ! :smile:

Hi everyone (and @issa in particular :smile:),

I work on KoBoToolbox (one of the partner organizations that @martijnr mentioned) and would like to get some of this done before the end of the month(!) Having read through this thread, it seems like Editing already submitted data probably needs further discussion and won't come together so quickly—but some of what's under the Viewing/pre-viewing/pdf-ing forms and Collecting data in a browser headings could probably be done before September.

Here's how I thought I'd sequence the work:

  1. Offline-capable, multi-submission surveys done by enumerators
  2. A form preview to test form appearance and logic, as used on https://opendatakit.org/xlsform/

…and, if there's still more time,

  1. Viewing a record in readonly form
  2. A PDF of an empty form
  3. A PDF of a record loaded into a form
  4. Online website-embedded surveys
  5. Email single-use surveys that have some protection against multiple submissions by the same person

:star: What would be the best way to get started? There are a few design (UI and code) decisions to make; I could start another topic here or bring them up on the Slack.

Thank you!

hey @jnm, et al:

really happy to see some movement on this. :slight_smile:

i think before we proceed too far we need to work out a couple of things:

  1. what is the proposed method for shipping and operating enketo in our environment? is it optional or always present? how do we manage upgrades?
  2. what is the permissions and authentication methodology between the systems, both at the service level (central can trust this enketo because xyz is true) and at the user level (this instance can be submitted because xyz is true).

i think for these reasons and more, it probably makes the most sense to focus on read-only features first. previewing a form, both interactively and as pdf, and viewing existing single submissions. i think for enketo to actually submit anything we have to really sit down and puzzle through how these features are likely to be used and therefore what sort of permissions/user tracking models make the most sense, and i think those conversations will take some time.

i will warn you that i will be a little bit of a pain in your butt on user experience and on security. i do not hold absolute power over this project but that doesn't mean i can't be loud and annoying. :slight_smile: i would really love to see proposed design and criteria (see our own release criteria for v0.6 for an example of our process on this) before work proceeds. some sort of similar proposal process for the technical architecture will help this process a lot.

finally, i will suggest one direction for solving some of these problems: we can enable backchannel privileged communication between services within the docker-compose cluster, and central/enketo can mutually trust any requests made over that communication without authentication. this can be done by exposing a secondary API over a private port that isn't exposed outside the docker network. the asterisk here is that for the purposes of the server audit log (coming in v0.6) it will still be necessary to identify the actor invoking each action.
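to make the backchannel idea a bit more concrete, here is a rough sketch of the decision a request handler might make. everything named here is an assumption for illustration (the port number, the `authorize` function, the shape of the actor id); it only shows the two points above: requests on the private listener skip authentication, but every request still has to name an actor for the audit log.

```python
from typing import Optional

PRIVATE_PORT = 8384  # hypothetical port, reachable only inside the Docker network


def authorize(listener_port: int, actor_id: Optional[str], has_valid_session: bool) -> str:
    """Return the actor to record in the audit log, or raise PermissionError."""
    if actor_id is None:
        # Even trusted backchannel calls must name an actor for the audit log.
        raise PermissionError("no actor identified for audit log")
    if listener_port == PRIVATE_PORT:
        # Private listener: the peer service is trusted without authentication.
        return actor_id
    if has_valid_session:
        # Public listener: normal authentication still applies.
        return actor_id
    raise PermissionError("authentication required on the public port")
```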

it may also make sense (or not) to avoid exposing enketo entirely and proxy everything through central.

Not that we need to wait till then, but this should make for a great discussion at the convening!

Right! But my concern would be that this would be too late to be able to use the funding we have now (knowing how busy @jnm gets ;)). Hopefully we can get this fleshed out beforehand.

Do you have a preference for a technical solution @issa? I do not (but also am not familiar with docker). Whatever is best for users to use, install, and update Enketo. It might be useful for scalability if the solution facilitates hosting Central and Enketo on different servers, but maybe that is too complex (for the target user group). I know an important consideration for Central is the cost of running it. It is not really recommended to run Enketo on a droplet with less than 2 GB of memory ($10), so maybe that means it is best to make Enketo optional.

A. For Enketo to allow only API calls from the trusted Central/KoBo/Ona/Aggregate server, there are 2 levels:

  1. required: an API key provided in the Authorization header with type Basic as the username part (password part is empty)
  2. optional: the Enketo server can be set up to only allow a specific form server domain or domain + path (e.g. only accept the server_url API request parameter value 'https://opendatakit.appspot.com').

These API calls return a webform URL.
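For illustration, here is a minimal Python sketch of the two checks in A. The helper names (`enketo_auth_header`, `server_url_allowed`) are made up for this example, and the matching logic in the second one is an assumption rather than Enketo's actual implementation. Only the header shape (API key as the Basic username, empty password) and the idea of restricting `server_url` to a domain or domain + path come from the description above.

```python
import base64
from urllib.parse import urlparse


def enketo_auth_header(api_key: str) -> dict:
    """Build the Authorization header: API key as username, empty password."""
    credentials = f"{api_key}:".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


def server_url_allowed(server_url: str, allowed: str) -> bool:
    """Hypothetical check: is server_url under the allowed domain (+ path)?

    `allowed` is written without a scheme, e.g. "opendatakit.appspot.com".
    """
    parsed = urlparse(server_url)
    candidate = (parsed.netloc + parsed.path).rstrip("/")
    allowed = allowed.rstrip("/")
    # Require an exact match or a path below the allowed prefix, so that
    # "opendatakit.appspot.com.evil.com" is not accepted.
    return candidate == allowed or candidate.startswith(allowed + "/")
```

The form server would send the header from `enketo_auth_header` with each API request, and the Enketo side would refuse any request whose `server_url` fails the second check.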

B. Submissions to, and obtaining form resources from Aggregate/KoBo/Ona/Central:

Enketo supports 3 methods: https://enketo.org/develop/auth/. I saw a while back that Central supports OpenRosa authentication with a small change (no handshake, if I remember correctly). That would be a minor change we could support in Enketo if (really) necessary. It would be great to use one of these 3 supported methods. KoBo uses the first method. Ona uses the second method.

is there additional documentation on how the external authentication processes work? how does enketo know where the login page ought to be, what does it expect back, etc?

if i'm reading this flowchart correctly, can some of these redirection steps be bypassed if a token is already provided with the initial request?
central has a relatively robust temporary token system that can grant very fine-grained/limited permissions for time-limited tokens, so that could be used here; albeit it sucks to have to do database writes to perform cross-service read operations.

in general, i would be pretty disappointed if we can't work out some kind of relatively seamless/single-sign-on experience between central/enketo, where you never have to understand that you are authenticating to two separate services.
it seems to me that if we can't achieve a fairly tight amount of integration for the eventual end user, we may as well just make sure the services can play well near each other and provide documentation on how to install the two completely separately and then use them together, rather than put a lot of time into interoperation and co-installation only to have the end experience not reflect that investment.

(yes, i entirely ignored the installation questions for that response. i haven't had any good ideas yet.)

It's a configuration item. See here. Sorry, we are still working on creating more comprehensive documentation for Enketo (same project that funds this work).

I think you're misunderstanding the authentication requirements for the data collector that gets told to use a particular Enketo webform URL to start collecting data. Enketo has no authentication requirements itself for that scenario (and it stores no data). Form access and submission access are only protected by KoBo/Central/Ona/Aggregate if the forms are not public. Whenever Enketo receives a 401 response from Central/KoBo/Ona/Aggregate, when requesting the XForm or form media or external data resources, it redirects or collects credentials (depending on which auth mechanism it is configured to use). (Note that when it uses OpenRosa authentication those credentials are only stored in the user's browser). There is nothing new to be invented here, I think.
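As a rough sketch of that behaviour (not Enketo's actual code): the function below decides what to do with a form server response. The mechanism labels `"collect"` and `"redirect"`, the function name, and the `return` query parameter are all placeholders I made up for this example.

```python
from urllib.parse import quote


def handle_form_server_status(status: int, mechanism: str, login_url: str, return_url: str) -> dict:
    """Decide the next step after requesting an XForm, media or external data."""
    if status != 401:
        # Public form, or the user is already authenticated: nothing to do.
        return {"action": "proceed"}
    if mechanism == "collect":
        # Credentials are gathered in the browser and kept there.
        return {"action": "prompt_for_credentials"}
    if mechanism == "redirect":
        # Send the user to the form server's login page, then back again.
        location = f"{login_url}?return={quote(return_url, safe='')}"
        return {"action": "redirect", "location": location}
    raise ValueError(f"unknown auth mechanism: {mechanism!r}")
```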

The single sign-on experience is supported for the second and third mechanism, but note that this may only be useful for the form administrator, since that is probably the only person that accesses the Central/KoBo/Ona/Aggregate interface.

I suspect the date is postponed, isn't it?

Yes, I suspect so as well. We did get an extension from the donor, so that's good.

FYI I think this proposed new feature - ie integrating Enketo with Central - probably passes the bar as to what should be on the ODK roadmap. Perhaps someone on the TSC should add something to the roadmap for it? Perhaps someone on the TSC who also happens to be intimately involved in Enketo... (hint, hint :grin: )

so for what it's worth, we do have these items on the odk central roadmap already. i don't know enough to speak to TSC things.

Thanks, good to know. Might still be useful to expose such a significant/useful new feature on the (more visible?) ODK roadmap, but I'll let @martijnr make that call.

Wow... this went from August 2019 to June 2020.
Is there a specific reason why this was postponed so much?
I see the integration of Enketo into Central as a key factor for many people in switching to Central.
Are there technical problems or time constraints?
Is it not possible to bring this integration forward?

The Central roadmap is the projected plan assuming @issa and @Matthew_White are the only implementers. If @jnm, @martijnr and/or others are involved, that certainly shifts the timeline. I think we're all motivated to help make this happen as soon as possible.

I agree that it makes sense for these kinds of integration features to be reflected on the ODK roadmap in some way, if just as a table of contents that links to conversations like this one and Central/Enketo roadmap items.

I'm coming back to this after being out for a bit, has any discussion happened anywhere else? @issa, @jnm, any reactions to @martijnr's clarifications ? Is it time to schedule a call for interested parties to make sure all challenges have been identified and that it's clear who is thinking about what?

FYI as promised on TSC call, I've added an item to ODK roadmap to track.

any reactions to @martijnr's clarifications ? Is it time to schedule a call for interested parties to make sure all challenges have been identified and that it's clear who is thinking about what?

not from me. i don't feel like i have any further insight to add until i see some kind of a preliminary plan.

Hello again everyone, and thanks for all of your detailed and enthusiastic responses. My thought with prioritizing "Offline-capable, multi-submission surveys done by enumerators" as the first thing to tackle is that it would actually require minimal work (and hopefully be broadly useful). @issa, I'll outline my understanding below—please let me know if I make any missteps.

ODK Central is already an OpenRosa server supporting one OpenRosa client (ODK Collect), and my approach would be to treat Enketo as simply another OpenRosa client. Where Central seems to differ from other OpenRosa servers is that usernames and passwords are not used at all; instead, each "App User" authenticates with a unique URL made hard-to-guess by containing a 64-byte token.

Just as ODK Collect does, Enketo could interact with unique formList and submission URLs provided by ODK Central for each App User, forgoing usernames and passwords. Enketo would, in turn, generate its own unique, token-containing URLs: one per form per App User. For example, assuming all the forms are in the same project (ID 1):

| Form | App User | Central OpenRosa URL Prefix | Enketo URL |
| --- | --- | --- | --- |
| Well Pumps | Meredith | https://central/v1/key/{meredith-random-key}/projects/1/ | https://enketo/::{meredith-well-pumps-random-key} |
| Well Pumps | Jorge | https://central/v1/key/{jorge-random-key}/projects/1/ | https://enketo/::{jorge-well-pumps-random-key} |
| Well Pumps | Ricki | https://central/v1/key/{ricki-random-key}/projects/1/ | https://enketo/::{ricki-well-pumps-random-key} |
| Cisterns | Ricki | https://central/v1/key/{ricki-random-key}/projects/1/ | https://enketo/::{ricki-cisterns-random-key} |
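To show how the rows above relate, here is a small Python sketch that builds them. The function names and the Enketo key length are assumptions for the example; the URL shapes are the ones in the rows above. The point is that the Central prefix depends only on the App User, while each (form, App User) pair gets its own Enketo key.

```python
import secrets

CENTRAL = "https://central"  # placeholder hosts, as in the rows above
ENKETO = "https://enketo"


def central_prefix(app_user_token: str, project_id: int) -> str:
    """Central OpenRosa URL prefix for one App User in one project."""
    return f"{CENTRAL}/v1/key/{app_user_token}/projects/{project_id}/"


def build_links(assignments, app_user_tokens, project_id=1):
    """Return one row per (form, App User) pair, each with its own Enketo key."""
    rows = []
    for form, user in assignments:
        enketo_key = secrets.token_urlsafe(16)  # key length is an assumption
        rows.append({
            "form": form,
            "app_user": user,
            "central_prefix": central_prefix(app_user_tokens[user], project_id),
            "enketo_url": f"{ENKETO}/::{enketo_key}",
        })
    return rows
```

For Ricki, who has two forms, this yields the same Central prefix twice but two different Enketo URLs.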

The workflow would be something like this:

  1. The system administrator of the ODK Central instance configures an Enketo URL and API key for use by the entire instance;
  2. [UX help, please!] ODK Central provides a UI element to retrieve an Enketo data-collection URL. This would be similar to "See code" in the "Configure Client" column of the App Users list, but there must be a way to select a particular form in addition to an App User;
  3. Once an App User and form have been selected, ODK Central POSTs that App User's token-containing OpenRosa URL and the <formID> of the form to the appropriate Enketo endpoint;
  4. Enketo stores the form and the OpenRosa URL in its Redis database, associating them with a unique key;
  5. Enketo returns a URL containing that unique key;
  6. ODK Central displays this URL;
  7. The project manager communicates that URL to the enumerator;
  8. The enumerator uses this URL to enter data and submits an instance;
  9. Enketo receives the submission and forwards it to Central's OpenRosa submission endpoint for this particular enumerator, which was stored in Enketo's Redis DB;
  10. Central recognizes the OpenRosa URL as containing a valid App User token and accepts the submission.
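Steps 3 to 10 can be sketched as a toy in-memory simulation. This is only a model of the proposal, not real Central or Enketo code: `FakeCentral` and `FakeEnketo` are invented names, a plain dict stands in for Redis, and the `submission` path suffix and status codes are assumptions based on the usual OpenRosa conventions.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class FakeCentral:
    """Stand-in for Central: accepts submissions only at valid App User URLs."""
    valid_tokens: set
    received: list = field(default_factory=list)

    def submit(self, openrosa_url: str, instance_xml: str) -> int:
        # Step 10: the token embedded in the URL is the only credential.
        token = openrosa_url.split("/key/")[1].split("/")[0]
        if token not in self.valid_tokens:
            return 401
        self.received.append(instance_xml)
        return 201


@dataclass
class FakeEnketo:
    """Stand-in for Enketo; a dict plays the role of its Redis database."""
    central: FakeCentral
    store: dict = field(default_factory=dict)

    def register(self, openrosa_url: str, form_id: str) -> str:
        # Steps 4-5: remember the form and OpenRosa URL under a fresh key.
        key = secrets.token_urlsafe(16)
        self.store[key] = {"openrosa_url": openrosa_url, "form_id": form_id}
        return f"https://enketo/::{key}"

    def submit(self, enketo_url: str, instance_xml: str) -> int:
        # Step 9: look up the stored OpenRosa URL and forward the submission.
        key = enketo_url.split("::", 1)[1]
        entry = self.store.get(key)
        if entry is None:
            return 404
        return self.central.submit(entry["openrosa_url"] + "submission", instance_xml)
```

One thing the model makes visible: if an App User's token is revoked in Central, the Enketo URL still resolves but the forwarded submission is rejected, so access control stays entirely on the Central side.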

To respond to a few specific questions:

I propose that (at first) ODK Central does not ship Enketo at all. Instead, use of Enketo would be entirely optional, and the administrator of a Central instance could use any Enketo server reachable via HTTPS. Enketo already runs well in a Docker container, so savvy folks could set up Enketo immediately alongside their Central instance. If later there's a desire to include Enketo along with Central in a more streamlined way, I could help with the Docker aspect of that.

Central trusts that it can send blank forms and App User tokens to a particular Enketo that's been explicitly configured by the Central administrator. Central requires the Enketo server to use HTTPS with a certificate from a trusted authority to frustrate MITM and eavesdropping attempts. Central allows an instance to be submitted when it's POSTed to a valid, token-containing OpenRosa submission URL, no matter whether the client is Enketo or ODK Collect.
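A minimal sketch of the HTTPS requirement, assuming it were enforced when the administrator saves the Enketo URL. The function names are hypothetical; `ssl.create_default_context()` is the standard-library call that gives a context with certificate and hostname verification turned on.

```python
import ssl
from urllib.parse import urlparse


def validate_enketo_url(url: str) -> str:
    """Reject any configured Enketo URL that is not HTTPS."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"Enketo URL must be an https:// URL, got {url!r}")
    return url.rstrip("/")


def strict_tls_context() -> ssl.SSLContext:
    """Default context: verifies the certificate chain and the hostname."""
    return ssl.create_default_context()
```

Any outgoing request to Enketo would then use that context (for example `urllib.request.urlopen(url, context=strict_tls_context())`), so a self-signed or mismatched certificate fails the connection instead of being silently accepted.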

This sounds like it could be fun, but I'd like to avoid relying on anything but HTTPS over public interfaces. I may want to run my own Central but connect it to someone else's Enketo, or I might need a distributed cluster of Centrals and Enketos using something other than Docker's swarm mode.

Thanks for reading :slight_smile:

From a Central standpoint, the goal is to make Enketo functionality (preview, filling, editing) available with as little additional technical know-how or additional configuration as possible.

As far as code-paths, we can ensure there is not too much difference between bundled vs external if we're careful. After some discussion with @martijnr, it looks like bundled would use an Enketo subpath proxied to the local server in a Docker container. External servers would connect directly over HTTPS.

It’s riskier to bundle Enketo as a starting point and we’ve historically done the riskier stories first. We also aim to provide the best user experience first and fill in advanced features later. That said, I understand that starting with external access might make more sense for @jnm's roadmap/timeline.

I think we should get on a call to talk about the tradeoffs and dig into the proposed workflow. @issa, @LN, and I are available at the following times. @jnm, @martijnr, @Matthew_White which of these times would you be available?

  • Tue, Sep 10, 11 AM - 12 PM PDT
  • Tue, Sep 10, 1 PM - 2 PM PDT
  • Wed, Sep 11, 10 AM - 11 AM PDT
  • Wed, Sep 11, 1 PM - 2 PM PDT

I've sent out a calendar invite to the relevant parties for a call on Tuesday the 10th at 11 am PDT. The call will be held at https://www.uberconference.com/opendatakit.

Everyone is welcome and if you are interested and can't make it, no worries, we'll take notes at https://docs.google.com/document/d/16btMIg7fz7rKURz5sZQPrDA83jHkEJXGOlyMvlPOrh8 and the audio will be recorded.
