And thanks to you for picking it up, @LN!
A table of contents would definitely be a good idea. So far I've made two presentations to the TSC:
the first was based on this set of slides (some of the ideas here have been superseded by other discussions, but hopefully it's a good, albeit basic, introduction)
the second was based on the user stories doc that you linked to
There have also been a couple of forum threads:
I'm sure there are plenty more threads that have touched on the topic, so perhaps we could build this up together? I'm not sure what the best format would be... In a separate thread?
Meanwhile, let me see if I can answer some of the important questions that you've raised here. Please bear in mind that these answers only reflect the way that I've been arranging my thoughts on the topic, and I'm very happy to be convinced that I'm wrong! I've been working on a spec that I was hoping to distribute prior to the Convening, so that we could use it as a starting point for the discussions there; maybe it would be good if I put that WIP into a Google doc and share it here so that you and others can chime in as I work on it (or would you prefer some other way of collaborating on it? I'm open to any suggestions).
In particular, the user stories document that was generated some time ago provides helpful context for this thread. I would also appreciate a sense of who participated in which conversations and what the conversation/document's status is. For example, I was on maternity leave when the user stories reached their current state and so I don’t have a good sense of how much discussion happened around them and what conclusions were drawn (e.g. how the grayed out stories became grayed out).
As I said, I presented these user stories to the TSC, and during the discussion we added, edited and removed some of the stories. At the end of the session, there was general agreement that the stories reflected everyone's sense of what longitudinal functionality should look like in ODK. In terms of the greying out, we were trying to be pragmatic about what a reasonable MVP could consist of. There are some greyed stories that might be controversial, e.g. the ability to view previous filled forms for a given entity - this is clearly desirable, but we concluded that it would represent a not inconsiderable effort that could be postponed to a v2.0 (or a v1.1, or whatever....)
I would find it very helpful to see more detail from an idealized user experience perspective. How project managers will set up projects, how analysts will view data, how enumerators will pick entities, how conflicts will be resolved will all impact the spec design. Has that been done somewhere?
In my head, mostly
- “As an administrator I want to be able to designate a particular form (“patient form”) as a source of entities for a record form (“visit form”)”
- Does the user need to enter a server-side “mode” distinct from the current disconnected forms mode? Do they need to designate which field in each of these forms should be the key or is that done in the form design (as this conversation assumes)?
- Is “entity” a concept explicitly surfaced to the administrator? Is it a separate concept from “project”?
I envisage that this process would be strongly tied to the concept of a "project", as is currently used in Central. The workflow I have in mind looks something like this:
The administrator creates a new project on Central (or another server that implements longitudinal data collection - from now on I'm just going to talk about Central, but obviously I'm not excluding other implementations)
The administrator is prompted to specify whether this is a longitudinal project
If the administrator specifies that this is a longitudinal project, the first thing they will need to do is specify the entity type. This can be done in one of three ways:
a) upload a new form, e.g. a form for patient details
b) select an existing form that is already on the server, currently in a different project
c) provide a CSV/XLS file, where each row defines an entity (maybe this isn't a 1.0 feature, but I think it would be important that entities can come from sources outside of ODK)
In case (c), the administrator will be asked to designate which field to use as the key. In the other cases, I would suggest that we use the instanceId, or even add another metadata field to the xform spec, so that we can make this transparent to the user. But in any case, we also need to ask the administrator which field(s) should be shown in the preliminary entity choosing widget.
The administrator will then be able to upload one or more longitudinal data collection forms that reference this entity type. These forms can reference the entity about which they are collecting data using some as yet undefined syntax. They will not need to specify a select_one in their form to choose the entity; this "entity choosing widget" should be handled automatically by Collect or any other implementing client application. The form should just assume that a selection has already been made before data entry begins.
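To make the setup steps above a bit more concrete, here is a minimal Python sketch of the configuration I have in mind. Everything in it is hypothetical (the `EntityType`, `LongitudinalForm` and `LongitudinalProject` names, the `source` values, the example field names); none of it exists in Central or the xform spec today, and I'm assuming that instanceId is the default key for form-based entity sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical source kinds, mirroring options (a), (b) and (c) above.
SOURCES = {"new_form", "existing_form", "csv"}


@dataclass
class EntityType:
    name: str
    source: str                      # one of SOURCES
    display_fields: List[str]        # shown in the entity choosing widget
    key_field: Optional[str] = None  # must be designated for CSV sources

    def __post_init__(self):
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")
        if not self.display_fields:
            raise ValueError("at least one display field is required")
        if self.key_field is None:
            if self.source == "csv":
                raise ValueError("a key field must be designated for CSV sources")
            # Assumption: form-based sources fall back to instanceId.
            self.key_field = "instanceId"


@dataclass
class LongitudinalForm:
    form_id: str
    entity_type: EntityType  # the form assumes an entity was already chosen


@dataclass
class LongitudinalProject:
    name: str
    entity_type: EntityType
    forms: List[LongitudinalForm] = field(default_factory=list)

    def add_form(self, form_id: str) -> LongitudinalForm:
        # Every form in the project references the project's entity type.
        form = LongitudinalForm(form_id, self.entity_type)
        self.forms.append(form)
        return form


# Example: entities come from a CSV file, so the key is designated explicitly.
trees = EntityType("tree", source="csv",
                   display_fields=["species", "location"], key_field="tree_id")
project = LongitudinalProject("Forest survey", trees)
project.add_form("seasonal_growth")
```

Note that the form itself carries no select_one for the entity; it only points at the entity type, and the client is expected to handle the choice.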
- “As an administrator I want to be able to view records by entity”
- Are viewing by entity and by form both possible?
Yes, I think that this would be important. As a data analyst I would like to see both (a) all data that has been collected about village X (and note that this might be data from more than one longitudinal form...), and (b) all data that was collected using a particular longitudinal form (in which case each record should link back to the entity that "owns" it)
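As a rough illustration of those two views, here is a small Python sketch. The record layout (a `form` name plus an `entity` key on every record) and the example values are assumptions of mine, not an agreed format.

```python
from collections import defaultdict

# Hypothetical submitted records; "entity" holds the key of the owning entity.
records = [
    {"form": "census", "entity": "village-x", "population": 410},
    {"form": "water_quality", "entity": "village-x", "ph": 7.1},
    {"form": "census", "entity": "village-y", "population": 95},
]


def group_by(records, dimension):
    """Group records by the given dimension: "entity" or "form"."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record)
    return dict(groups)


# (a) everything collected about village X, across several longitudinal forms
by_entity = group_by(records, "entity")

# (b) everything collected with one form; each record still names its entity
by_form = group_by(records, "form")
```

Both views are the same records sliced along a different dimension, which is why each record needs to carry the key of the entity that "owns" it.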
- How are different values collected during different encounters for the same property represented on view/export? Is only the latest provided? Are they all made available?
When viewing as a data analyst, I might say "show me all the seasonal growth data that has been collected for tree X"; this would then show a number of records that all come from enumerators filling in the same longitudinal form at different points in time. (So here I would be "viewing by entity"). I can then analyse the tree's progress over time.
For export, I would say that this should be configurable: maybe I want the latest reported height of all the trees I have records on; maybe I want all the reported heights of all the trees; maybe I want to see the growth of all the trees in separate graphs.
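A configurable export along those lines might be sketched like this in Python. The `mode` parameter, the `export_heights` function and the record fields are all hypothetical, and I'm assuming dates are stored as ISO strings so that they sort correctly as text.

```python
# Hypothetical growth records for two trees.
growth = [
    {"entity": "tree-1", "date": "2019-09-01", "height_m": 2.6},
    {"entity": "tree-1", "date": "2019-03-01", "height_m": 2.1},
    {"entity": "tree-2", "date": "2019-03-02", "height_m": 4.0},
]


def export_heights(records, mode="all"):
    """Return every record ("all") or only the newest one per entity ("latest")."""
    ordered = sorted(records, key=lambda r: (r["entity"], r["date"]))
    if mode == "all":
        return ordered
    if mode == "latest":
        latest = {}
        for record in ordered:
            # Records are in date order, so later ones overwrite earlier ones.
            latest[record["entity"]] = record
        return list(latest.values())
    raise ValueError(f"unknown export mode: {mode}")


all_heights = export_heights(growth, "all")
latest_heights = export_heights(growth, "latest")
```

The "all" output is what you would feed into a per-tree growth graph; the "latest" output gives one row per tree.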
- What happens if two enumerators create the same entity while offline? (the de-duping story is grayed out but something has to happen)
It's greyed out because we agreed that this was a piece of functionality that could be omitted in the MVP; we certainly didn't intend that it would never be handled. It's a difficult question, but in the past I've done semi-automatic server-side deduping using matching algorithms that present possible duplicates to an administrator, who can then decide whether the entities should be merged; if so, any key references in longitudinal records are updated accordingly.
My thinking is that this would then cause a new entity list to be generated, which - since the entity list will be something like an External Secondary Instance - will then cause Collect to show that a form update is available, so that enumerators can then download the deduped version (I'm aware that I'm eliding a lot of details in that sentence...)
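Here is a minimal Python sketch of that kind of semi-automatic deduping. To be clear about the assumptions: I'm using `difflib.SequenceMatcher` from the standard library purely as a stand-in for whatever matching algorithm we would actually choose, the 0.9 threshold is arbitrary, and the entity fields and function names are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical entities created offline by different enumerators.
entities = {
    "e1": {"name": "Amina Yusuf", "village": "Kisumu"},
    "e2": {"name": "Amina Yusuff", "village": "Kisumu"},
    "e3": {"name": "John Otieno", "village": "Siaya"},
}
# Hypothetical longitudinal records that reference entities by key.
records = [
    {"form": "visit", "entity": "e1"},
    {"form": "visit", "entity": "e2"},
    {"form": "visit", "entity": "e3"},
]


def similarity(a, b):
    """Crude text similarity between two entities, between 0.0 and 1.0."""
    text_a = f"{a['name']} {a['village']}".lower()
    text_b = f"{b['name']} {b['village']}".lower()
    return SequenceMatcher(None, text_a, text_b).ratio()


def candidate_duplicates(entities, threshold=0.9):
    """Pairs of entity keys that look similar enough to show an administrator."""
    return [
        (k1, k2)
        for k1, k2 in combinations(sorted(entities), 2)
        if similarity(entities[k1], entities[k2]) >= threshold
    ]


def merge(entities, records, keep, drop):
    """Apply an administrator's decision to merge `drop` into `keep`."""
    del entities[drop]
    for record in records:
        if record["entity"] == drop:
            record["entity"] = keep  # update the key reference


candidates = candidate_duplicates(entities)
# The administrator confirms the first suggested pair.
merge(entities, records, keep=candidates[0][0], drop=candidates[0][1])
```

The matching step only proposes pairs; nothing is merged until a person confirms it, and the merge is what would trigger regenerating the entity list.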
- Is there a difference made between a correction to a mistake made about a fixed property and a new value for a changing property?
If I've understood you correctly, then by "fixed property" you mean a field on an entity, while by "changing property" you mean a field in a longitudinal form. In which case, yes, there is definitely a difference. This question also uncovers several more:
Is it possible for an enumerator to update an entity? In the simplest possible case (MVP! - although this probably doesn't satisfy "viable") entities would be read only. OTOH it's easy to imagine a scenario where an enumerator selects an entity before filling a longitudinal form, and then realises that there is an error in the entity data. Maybe it should be possible to link to the entity form from within the longitudinal form so that she can correct the entity error ("Wait a minute, this isn't a beech, it's an oak!") before starting on the longitudinal report? (And this is where we intersect with the "Linked/Sub-forms" topic that has been discussed elsewhere)
What happens when the entity form is the same as the longitudinal form? This is a particular definition of longitudinal data collection that often comes up, where people just want to keep filling the same form about the same thing over time, without having a separate, originating entity. I would maintain that it should always be possible to separate out something immutable that could be used as an entity definition (e.g. the geopoint of the tree; a birth date), but I think that this is something that needs more discussion.
- “As an enumerator, I want to be able to select an entity from a list before I begin a record form”
- What will the enumerator see in order to make that choice (e.g. just the entity ID? I believe an “identifying fields” concept has previously been mentioned)
I kinda covered this above, but there's a lot more that needs defining w/r/t the "preliminary entity choosing widget"
- Is only one kind of entity available at a time or does the enumerator first have to pick an entity type?
I don't think the enumerator should be able to pick an entity type - we don't want people inadvertently using an arboreal seasonal growth report for village populations, for example.
- Is picking an entity a different client “mode” than picking a blank form to fill?
If you want to fill a longitudinal form, you must begin by choosing the entity on which you are reporting, as mentioned above ("the preliminary entity choosing widget"). So - again, as I envisage it - first the enumerator chooses the form; if it's a longitudinal form, they then have to select an entity; data collection can then commence, as usual.
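In Python-flavoured terms, the client-side flow I'm describing could look roughly like the sketch below. The `start_form` function, the `longitudinal` flag on the form and the `choose` callback (standing in for the entity choosing widget) are all hypothetical; this is not how Collect is structured today.

```python
from typing import Callable, Dict, List


def start_form(form: Dict, entities: List[Dict],
               choose: Callable[[List[Dict]], Dict]) -> Dict:
    """Return a new, empty submission for `form`.

    For a longitudinal form the enumerator has to pick an entity first;
    for any other form, data collection starts straight away.
    """
    submission = {"form": form["id"], "entity": None, "answers": {}}
    if form.get("longitudinal"):
        if not entities:
            raise ValueError("no entities available for this form")
        # The form never sees the list; it only receives the chosen key.
        submission["entity"] = choose(entities)["key"]
    return submission


# Hypothetical entity list and forms.
trees = [{"key": "tree-1", "species": "oak"}, {"key": "tree-2", "species": "beech"}]
growth_form = {"id": "seasonal_growth", "longitudinal": True}
plain_form = {"id": "site_feedback"}

# Here the "widget" simply picks the first entity in the list.
growth_submission = start_form(growth_form, trees, choose=lambda es: es[0])
plain_submission = start_form(plain_form, trees, choose=lambda es: es[0])
```

The point of the sketch is the ordering: form first, then entity (only if the form is longitudinal), then data entry as usual.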
To more explicitly tie these back to the question that started this thread -- where is the “entity” concept visible to the form designer, the project manager, the data manager and the enumerator?
The form designer has some new syntax that allows them to reference fields from the entity in the text, labels, skip logic, calculations, etc. of their form.
The project manager has to identify the source of the entities (I'm calling this source the "entity type"; I think that an xform or a CSV structure is an adequate "schema" that defines the "type" of the entity) before they can create a longitudinal project.
The data manager can display and "slice" the data via the entity dimension, should they wish to.
The enumerator has to choose an entity on which they are reporting before they can start filling a longitudinal form.
Phew! I hope all of that makes sense and, as I said at the beginning, this is just my vision for how all of this could work; I'm not saying this is how it has to be (although I do think that it's coherent, covers most use cases and doesn't reinvent too many wheels). I'm looking forward to hearing your thoughts, and figuring out how we can move things forward in a meaningful way.