Enable Case Management/Preloading


Hello all,
We are doing case management in a complicated way because pulldata is always called again after the data has been changed.

Some ideas to solve it:

  • allow variables in the "Default" field, so you can pull data into a variable and not into the field itself
  • allow an if without an else, or an else that is linked to the actual field
  • switch off the second call of the calculation field with a command in the calculation

What do you think about this?


Hello ODK Team,
just wondering if you think it's possible to implement this...

Especially the if-condition without an else.
Case management could then be done by:

  • preloading the data in the label field
  • if the text-input field is empty (length == 0), pulldata into the text field; otherwise no command is run and no calculation is done

This way data would only be pulled as long as the field has not been changed. This would be a great feature that would enable the case management stuff.
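
The behaviour requested above can be modelled outside of XLSForm. The following Python sketch only illustrates the desired semantics and is not how ODK Collect evaluates calculations; `preload_if_empty` and the sample `preloaded` dictionary are hypothetical names made up for this example.

```python
def preload_if_empty(current_value, case_id, preloaded, column):
    """Return the preloaded value only when the field is still empty.

    Models an "if without else": a non-empty field is left untouched,
    so a value edited by the enumerator is never overwritten.
    """
    if current_value is None or len(current_value) == 0:
        # Field is empty: pull the stored value (empty string if unknown).
        return preloaded.get(case_id, {}).get(column, "")
    # Field already has content: keep it as it is.
    return current_value


# Hypothetical preload data, standing in for a CSV file on the device.
preloaded = {"case-001": {"shelter": "tent"}}

first_open = preload_if_empty("", "case-001", preloaded, "shelter")
after_edit = preload_if_empty("rented room", "case-001", preloaded, "shelter")
```

Calling the function a second time after an edit returns the edited value, which is the difference from a calculation that is simply re-run.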

(Hélène Martin) #3

Hi @Mic, I'm afraid I don't fully understand your specific technical suggestions. Case management can mean different things to different people and it would be great to get some specific examples from you and others of what you are trying to do. Perhaps we can start by putting the "how" aside and focus on what you could do in an ideal world.

In particular, it would be helpful to know:

  • the domain of the data you are collecting (health, tree conservation, ...)
  • do you need to fill the same form about the same entity (person, tree, etc) or different forms?
  • what is the workflow you use to assign specific forms or entities to enumerators?
  • at what frequency is new data collected?
  • does more than one enumerator collect information about the same entity?

If you can attach a form demonstrating the complicated strategy you are currently using, that would be very helpful. For anyone else interested in some kind of more refined preloading, it would be really helpful to get your use cases and answers to the questions above as well.

cc @Vanubhav @Snvssh4a2017 because I think Automatic default selection is similar


Hello Hélène,
Sorry for the late reply, I just returned from my summer holiday.
I implemented ODK for an NGO that supports refugees in Lebanon with various activities, such as a school bus for children, teaching in embedded community centers, aid distribution, and case management for families that need special support for medical, educational, financial or other reasons.

We have 4 production devices and 1 for testing/development, so it's small scale so far. Aggregate runs on Google Appspot (GSuite, the professional account that ensures privacy) and we use device management. We have 2 different GSuite accounts: one with very limited access for administration and for storing private data as a Fusion Table in Google Drive. The devices are registered on another account, where only the data required for preloading (.csv files) is stored on the drive, as anonymized as possible.

There are 3 forms in production so far:

  • Event Reporting: if there is a distribution, an education event or similar, we record a GPS pin, a description, the sponsor if somebody donated for it, and the type of event. A distribution to a case is also an event. We preload some event types and other select fields to make the form configurable, and we preload a list of cases and settlements. That works well because the preloaded values don't need to be modified.

  • Settlement Assessment: we go through the settlement and collect the names of the families, their needs, and infrastructure such as whether they have a toilet. We can select which aid they need and which projects are possible in this settlement. We preload settlement "names" (P-Codes defined by UNHCR) for data consistency, plus some select fields. We do not preload personal data because it changes too often, so we always make an assessment from a more or less empty form. This is working well.

  • Case Assessment: this consists of questions in the fields Health, WASH (hygiene), Shelter, ... to calculate a vulnerability score and to coordinate which aid this case needs. First this is done on an empty form. Then, over the time we support this case (family), we need to do at least 2 follow-ups where we go through the questions and modify them if needed, at least on a monthly basis. For this we need a preload that can be modified.
    We want to use the same form for "new cases" and "follow-up" because it is the same data and should be stored in the same place. Steps to solve the case, like distributions or medical support, will be tracked with the "Event Report" form; we can link it afterwards with Fusion Tables.

Now the problem with pulldata in "calculate" is that it's called twice: first when opening the form, which is absolutely fine, and second when the form is finished. The second call overwrites the edited values.
In an ideal world we could call if-functions and pulldata in the "Default" value field in the XLSForm. Another option would be a flag that disables the second call of the "calculate" field when the form is finished.

The data can still be stored locally and updated manually, so it does not need a direct connection to the database.

Phew, I hope this makes clearer what is needed. Thanks for reading and thinking!

(Yaw Anokwa) #5

This has been a long time coming, but the ODK 1 TSC has started work on specifying case management!

@adam.butler presented his vision for this feature in this slide deck and over the next few weeks, he'll be writing up a spec that will guide the implementation.

(Adam Butler) #6

@aurdipas and I have just been emailing about some of the details of the process, especially de-duping entities, and I figured it would be good to continue the discussion here and get some more ideas and contributions.

A quick summary of the slides:

  • "case management" is taken to mean (a) defining entities and then (b) making multiple temporally distinct reports on those entities
  • this is implemented using two forms, Form A which stores responses as entities, and Form B which requires that an entity is selected from a list before it can be filled; the response to Form B includes the UUID of the relevant entity
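
As a rough illustration of the two-form model summarised above, here is a minimal Python sketch. The class and field names (`Entity`, `Report`, `new_report`, `label`, `answers`) are assumptions made for this example and are not taken from the slides or from any ODK API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A response to Form A: the registered person, village, tree, etc."""
    label: str
    entity_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Report:
    """A response to Form B: one dated report about an existing entity."""
    entity_id: str  # UUID of the entity selected before filling the form
    date: str
    answers: dict


def new_report(entity_list, entity_id, date, answers):
    """Create a Form B response, refusing entities that are not in the list."""
    known = {e.entity_id for e in entity_list}
    if entity_id not in known:
        raise ValueError("an entity must be selected from the list first")
    return Report(entity_id=entity_id, date=date, answers=answers)


# Two temporally distinct reports on the same entity.
patient = Entity(label="Patient A")
reports = [
    new_report([patient], patient.entity_id, "2018-09-01", {"score": 7}),
    new_report([patient], patient.entity_id, "2018-10-01", {"score": 5}),
]
```

Because every report carries the entity UUID, all reports for one entity can later be grouped without anyone typing a linking key by hand.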

@aurdipas had two good questions:

  1. How do you transfer the already existing entity to the device?
  2. How can you avoid the same entity being captured again on a second device (duplication)?

These are the answers that I gave, but I'd love to hear peoples' thoughts:

  1. I think we would use the kind of CSV preloading that is already available for options. It would probably also make sense to extend this so that it uses the same mechanism as the recent form update notifications, so that there is a reasonable guarantee that devices have the complete entity list.

  2. The auto-updates would go some way to resolving the duplication issue, but are obviously not a satisfactory solution on their own. Probably it would make sense to build some duplication detection and resolution into ODK Central. Ideally, it would only be possible to do data collection on entities that have come from Central, so that they will always have to go through this de-duping, but this is obviously not acceptable if I want to register a patient and then make a case report on them in a totally offline setting. I could see a possible solution using a kind of tombstone for de-duped entities, so that a process might look like this:

  • while offline, I register patient dd6c32a4 using Form A
  • dd6c32a4 is now marked as "pending" on my device, which means I can submit case reports against it, but it's not on Central
  • I then do a case report on dd6c32a4 using Form B
  • when eventually online, I submit both to Central
  • it turns out that patient dd6c32a4 is an exact duplicate of an existing patient, 19f44a40, who already has case reports
  • (more details about how exactly de-duping works here)
  • my case report is switched to refer to the existing patient, 19f44a40
  • patient dd6c32a4 is replaced in Central with a tombstone that refers to 19f44a40
  • all incoming case reports for dd6c32a4 will be switched to refer to 19f44a40
  • once my device has updated its entity list, I will no longer be able to make a case report against dd6c32a4
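
The tombstone process in the list above could be sketched roughly like this. It is a toy in-memory model under stated assumptions, not a design for ODK Central: `EntityStore` and its methods are hypothetical, and the de-duping decision itself is simply passed in as a call to `merge`.

```python
class EntityStore:
    """Toy stand-in for the server-side entity list."""

    def __init__(self):
        self.entities = {}    # entity_id -> entity data
        self.tombstones = {}  # removed duplicate id -> surviving id
        self.reports = []     # list of {"entity_id": ..., "data": ...}

    def resolve(self, entity_id):
        """Follow tombstones until a live entity id is reached."""
        seen = set()
        while entity_id in self.tombstones:
            if entity_id in seen:
                raise ValueError("tombstone cycle")
            seen.add(entity_id)
            entity_id = self.tombstones[entity_id]
        return entity_id

    def submit_report(self, entity_id, data):
        """Store a report, redirecting it if its entity was de-duplicated."""
        self.reports.append(
            {"entity_id": self.resolve(entity_id), "data": data}
        )

    def merge(self, duplicate_id, existing_id):
        """Replace duplicate_id with a tombstone pointing at existing_id."""
        existing_id = self.resolve(existing_id)
        del self.entities[duplicate_id]
        self.tombstones[duplicate_id] = existing_id
        # Switch reports that were already submitted to the surviving entity.
        for report in self.reports:
            if report["entity_id"] == duplicate_id:
                report["entity_id"] = existing_id


store = EntityStore()
store.entities["19f44a40"] = {"name": "existing patient"}
store.entities["dd6c32a4"] = {"name": "registered offline"}
store.submit_report("dd6c32a4", "first case report")
store.merge("dd6c32a4", "19f44a40")
# A report that arrives later from a device with a stale entity list.
store.submit_report("dd6c32a4", "late report from another device")
```

Keeping the tombstone instead of deleting the duplicate outright is what lets late submissions from devices with an outdated entity list still end up on the right entity.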

For the specifics of the de-duping process, I would probably use a combination of approaches. First you need to find possible matches, probably using an n-gram algorithm (or possibly Levenshtein distances) on identifying fields such as name, village, etc. This is then combined with matches on other fields (e.g. date of birth or geopoint) to calculate a similarity score. You can then figure out threshold values and say something like "if it's over 95%, just merge them automatically" and "if it's over 80%, flag them as probable dupes", and provide a simple interface that displays the data with yes/no buttons. I've done something like this for de-duping patient lists in DRC and it worked pretty well.
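
A minimal sketch of that scoring idea, assuming character trigrams with Jaccard similarity for the text fields and an exact match on date of birth. The field names and the weights (0.5, 0.2, 0.3) are invented for illustration; only the 95% and 80% thresholds come from the description above, and this is not the code that was used in DRC.

```python
def trigrams(text):
    """Return the set of character 3-grams of a normalised, padded string."""
    padded = "  " + " ".join(text.lower().split()) + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def ngram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def similarity_score(rec_a, rec_b):
    """Combine fuzzy matches on text fields with an exact match on a date."""
    name = ngram_similarity(rec_a["name"], rec_b["name"])
    village = ngram_similarity(rec_a["village"], rec_b["village"])
    dob = 1.0 if rec_a["date_of_birth"] == rec_b["date_of_birth"] else 0.0
    # Hypothetical weights; in practice these would need tuning on real data.
    return 0.5 * name + 0.2 * village + 0.3 * dob


def classify(score, auto_merge=0.95, probable=0.80):
    """Map a similarity score to one of the actions described above."""
    if score > auto_merge:
        return "merge"     # merge automatically
    if score > probable:
        return "flag"      # show to a person with yes/no buttons
    return "distinct"


a = {"name": "Amina Khalil", "village": "Bar Elias", "date_of_birth": "1990-04-12"}
b = {"name": "Amina Khalil", "village": "Bar Elias", "date_of_birth": "1990-04-12"}
c = {"name": "John Smith", "village": "Zahle", "date_of_birth": "1975-01-30"}

same = classify(similarity_score(a, b))
different = classify(similarity_score(a, c))
```

Levenshtein distance could be swapped in for `ngram_similarity` without changing the rest of the structure.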

Another thing that @aurdipas suggested is that you could check through a list of entities before registering a new one with Form A, to make sure that the person/village/tree you're about to register doesn't already exist in the database, which is a good idea.

Any thoughts?

(Cesar Camacho) #7

Our idea for contributing to the development and implementation of this module is to add a POST request from ODK Collect to the Aggregate server that defines the lookup methods (QR code, barcode, or ID). Selecting one of them gives access to the database and returns the information you want to look up. You then select the new, related form, which is auto-completed with the information required in Form B.

(Adam Butler) #8

Thanks for this suggestion @Controller_Cercafe!

So just to make sure I've understood you correctly, you're proposing that instead of preloading lists of entities (patients, villages, trees, etc.), we add an endpoint to Aggregate/Central that takes an ID (possibly encoded as QR/bar code) and returns an entity, which can then be directly used in Form B?

This is a good idea, and would solve the problem of having a potentially very large number of entities stored on the device. OTOH it would require that the device is online, which is not always the case. I would propose that we add this in a second phase, once the basic functionality is working - how does that sound?

(Cesar Camacho) #9

That sounds very promising; in essence it is what is required. The only "problem" we see is that an internet connection would be necessary, and offline operation is required for the task where it will be implemented ...

On the other hand, I do not know what you mean by "second phase".

We remain attentive, greetings

(Adam Butler) #10

I mean that it would be in the "1.1" version of this feature, rather than the "1.0".

(Adam Butler) #11

I've written up a more detailed (although still incomplete) spec here: https://github.com/opendatakit/roadmap/issues/23

@TSC sorry this is a bit last minute, but it would be helpful to read it through before today's call if you have time

(Cesar Camacho) #12

Hi guys, how is it going? I would like to know what has happened with what was raised in this discussion, whether there is any progress, and whether we can help with something.

We want to move forward, we want to help, we want to proceed; we need directions to do it.

Stay tuned


(Florian May) #13

Hi all,

this is related to a use case we have:

  • Turtle nesting beaches are surveyed, turtle tracks and nests are recorded by multiple teams using multiple devices. The nesting beaches are in several geographically separated locations.
  • Turtle nests will eventually hatch. We want to follow up and re-visit some of the nests.
  • Nests are marked with a stake carrying a unique ID. The ID is recorded every time we encounter the nest. The data warehouse ingesting the data from ODK Aggregate can filter nests by ID, so we get a full life history for each nest.
  • A user visiting a beach will want to know all recorded turtle nests for only that beach (there are thousands of records on other beaches, possibly too many for one device to download/store).
    The users only want to navigate back to geolocations of existing nests, or see which existing nests are near their own location.
  • As each encounter with a nest, even if already recorded earlier, is a new record (ODK form), the users want to view, but don't need to edit, the existing records. Viewing existing records doesn't even have to occur in ODK Collect.
  • We would like to have an offline-capable, following mapping app showing some background rasters (aerial image), some user-supplied vectors (e.g. place markers, administrative boundaries, place names etc), and all previously recorded nests on a given location.

A possible data flow could look like this:

  • All data collection devices upload to one ODK Aggregate server. As we want to access data from multiple devices, ODK-A is the first data container which holds all the required data.
  • There is one form for turtle track/nest encounters.
  • There should be one export of that form per location (filtered to that location) into a format like KML/GeoJSON.
  • Users should have an offline-capable mapping app on their data collection devices (7" tablets or larger).
  • There should be a user-friendly way for users to sync data from ODK Aggregate and the other background data to their devices.
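
The per-location export step in the data flow above might look roughly like this. It is a sketch under assumptions: the record fields (`nest_id`, `location`, `lat`, `lon`, `date`) and the sample values are invented for the example and do not reflect the actual form or the data warehouse API.

```python
import json


def nests_to_geojson(records, location):
    """Build a GeoJSON FeatureCollection of the nests at one location."""
    features = []
    for rec in records:
        if rec["location"] != location:
            continue  # keep only records for the requested beach
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude, latitude.
            "geometry": {
                "type": "Point",
                "coordinates": [rec["lon"], rec["lat"]],
            },
            "properties": {"nest_id": rec["nest_id"], "date": rec["date"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical records, as they might come out of the nest encounter form.
records = [
    {"nest_id": "N001", "location": "Beach A",
     "lat": -20.30, "lon": 118.60, "date": "2018-11-02"},
    {"nest_id": "N002", "location": "Beach B",
     "lat": -21.45, "lon": 115.00, "date": "2018-11-03"},
]

beach_a = nests_to_geojson(records, "Beach A")
geojson_text = json.dumps(beach_a, indent=2)  # ready to save as a .geojson file
```

Filtering before export keeps each file small enough for a single device, which addresses the concern about thousands of records from other beaches.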

I have built a proof of concept using the offline-capable mapping app MapIt and a data warehouse (which ingests ODK Aggregate and offers an API), docs are here. However, this process involves opening a bookmarked API URL behind basicauth (returning GeoJSON), saving the resulting JSON to a local directory, and deleting/recreating a layer in MapIt.

A similar use case is asset management: a ranger visits existing inventory (benches, bbqs, toilets, shelters, displays and signs) in a national park to record its presence, note any maintenance needed (which in turn causes a follow-up visit by a ranger with a paint bucket searching for the park bench that needs painting), and scan the bar code on the asset label.

(Daniel Maina Nderitu) #14

This feature is badly needed, especially by research organizations.
Case in point:

  1. You go out in the field as a group to recruit patients for some health conditions and upload this data
  2. Some turn out to be eligible based on inclusion criteria and thus require follow-up at a later date
  3. Each member of the team needs to access this collected data later to fill out a follow-up, and linking should be automated so that no one needs to key in the linking key manually.

Thanks in advance.