Add a GeoJSON export to Briefcase and Aggregate

Xiphware · September 18, 2018, 9:02pm

My interpretation of the spec is that Features cannot contain properties which are FeatureCollections, otherwise the whole example in the spec around "GeoJSON semantics do not apply to foreign members and their descendants..." seems a bit pointless. But it would be good to get confirmation of this from someone in the GIS community; nested FeatureCollections could solve all our problems, if they're legit.

ggalmazor · September 18, 2018, 9:44pm

I agree with @Xiphware. I did test this and it is not valid GeoJSON according to the linters I've tried (linked in my previous comment). Right now I don't see any better option than having a root FeatureCollection with flat GeometryCollection features in it.

This is in line with what the RFC says:

3.2. Feature Object

A Feature object represents a spatially bounded thing. Every Feature
object is a GeoJSON object no matter where it occurs in a GeoJSON
text.

o A Feature object has a "type" member with the value "Feature".

o A Feature object has a member with the name "geometry". The value
of the geometry member SHALL be either a Geometry object as
defined above or, in the case that the Feature is unlocated, a
JSON null value.

o A Feature object has a member with the name "properties". The
value of the properties member is an object (any JSON object or a
JSON null value).

o If a Feature has a commonly used identifier, that identifier
SHOULD be included as a member of the Feature object with the name
"id", and the value of this member is either a JSON string or
number.

3.3. FeatureCollection Object

A GeoJSON object with the type "FeatureCollection" is a
FeatureCollection object. A FeatureCollection object has a member
with the name "features". The value of "features" is a JSON array.
Each element of the array is a Feature object as defined above. It
is possible for this array to be empty.

This implies that a FeatureCollection is not a Feature, which would prevent us from having nested FeatureCollections.
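To make the argument concrete, here is a minimal sketch of a structural check that follows only the two RFC sections quoted above. It is deliberately shallow (it does not validate coordinates or geometry internals) and is not a substitute for a real GeoJSON linter; the function names are hypothetical.

```python
def is_feature(obj):
    """Check the RFC 7946 section 3.2 requirements for a Feature object."""
    if not isinstance(obj, dict) or obj.get("type") != "Feature":
        return False
    # "geometry" must be present: a Geometry object, or null for an unlocated Feature.
    if "geometry" not in obj:
        return False
    geometry = obj["geometry"]
    if geometry is not None and not (isinstance(geometry, dict) and "type" in geometry):
        return False
    # "properties" must be present: any JSON object, or null.
    if "properties" not in obj:
        return False
    if obj["properties"] is not None and not isinstance(obj["properties"], dict):
        return False
    # An optional "id" must be a JSON string or number (bool is excluded explicitly).
    if "id" in obj:
        ident = obj["id"]
        if isinstance(ident, bool) or not isinstance(ident, (str, int, float)):
            return False
    return True


def is_feature_collection(obj):
    """Check section 3.3: every element of "features" must be a Feature."""
    if not isinstance(obj, dict) or obj.get("type") != "FeatureCollection":
        return False
    features = obj.get("features")
    return isinstance(features, list) and all(is_feature(f) for f in features)


point = {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2.17, 41.38]}, "properties": {}}
inner = {"type": "FeatureCollection", "features": [point]}

print(is_feature_collection({"type": "FeatureCollection", "features": [point]}))
print(is_feature_collection({"type": "FeatureCollection", "features": [inner]}))
```

The second call fails because the nested FeatureCollection's "type" is not "Feature", which is exactly why section 3.3 rules out nesting.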

ggalmazor · September 18, 2018, 9:53pm

o A Feature object has a member with the name "geometry". The value
of the geometry member SHALL be either a Geometry object as
defined above or, in the case that the Feature is unlocated, a
JSON null value.

@Xiphware, would you say we could take advantage of that part about "... or, in the case the Feature is unlocated, a JSON null value" to represent submissions that have no spatial values on them?

Xiphware · September 18, 2018, 9:56pm

Correct. Good catch - I never spotted that! Awesome.

Now we don't have to exclude non-georeferenced submissions, which should make @yanokwa happy.

It might be an interesting exercise to see what some of the common GIS tools do when trying to import a GeoJSON feature set containing NULL geometries(!). It might seem to be a very unusual use case from a GIS standpoint, so I'd be curious to see if they handle it gracefully...

Ivangayton · September 19, 2018, 7:16am

Hi all,

Sorry I'm late to the party. The nested GeoJSON question has already been answered better than I can by @Xiphware and others, so I'll keep that part of my response short:

In my experience, though nested FeatureCollections are not really kosher in the GeoJSON spec (they are treated as foreign members, meaning it's up to whatever is reading them to decide whether or not to interpret them as first-class features), they usually work in common tools; I have certainly seen nested FeatureCollections work in QGIS. I still would not go this way, simply because we can't be sure that even the tools that can handle FeatureCollections within FeatureCollections will behave consistently.

I seem to recall reading that Leaflet might ignore the inner FeatureCollections, but I can't find the citation at the moment. I've never tested that.

Why not create a GIS file with all of the data in it?

I think I share the view that @danbjoseph had: geo-features are most naturally seen as children of the larger dataset, rather than the survey data being subordinate to the geo-features.

Of course it's convenient to have a geographical file that you can slap straight into a GIS program or Web map viz tool and see everything right away, but this is only actually straightforward when there's only one, and exactly one, feature per survey (or repeat). The classic case of "We mapped every house and now we want to quickly see a map with the data as a pop-up when we click on each house" is perfectly compatible with a GeoJSON containing everything.

But what about a survey that includes the house and any outbuildings belonging to that family (outdoor toilet, animal stable, etc). What data is included in the toilet feature?

A more subtle case: you're recording patient origins in a hospital, and some patients are able to tell you their District, Chiefdom, Section, and Village name. Others are only able to tell you their District and Chiefdom. So you have geo-features that may be a large polygon (Chiefdom), a small polygon (Section), or a single point (Village). As attributes of a patient, this is fine: you know everybody's Chiefdom, but some people's Village is blank, and a GIS analyst can deal with this. But if the patient data is an attribute of the Village, you lose any patient who isn't part of a Village feature.

I'm sure there are cases where the data-as-attribute-of-feature(s) is more sensible, but I feel like these are the minority of situations, other than the simplest one:

There's certainly a good justification for allowing a non-GIS user to just get a simple KML with the data to look at it in a Web viewer! It would be nice if that were still possible for people who don't know how to do a basic join of a CSV to a GeoJSON by key. This capacity could be retained by keeping a KML export that is basically the CSV turned into a points-only KML with the first GeoPoint column as the coordinates. That may answer @danbjoseph's concern for some external visualisation platforms, some of which will probably always want to do the simplest thing which is simply show points with associated attributes.
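A rough sketch of that "CSV turned into a points-only KML" export, using only the Python standard library. It assumes the chosen column holds an ODK geopoint string ("latitude longitude altitude accuracy"); the function name and the explicit column parameter are hypothetical, standing in for "the first GeoPoint column".

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def csv_to_points_kml(csv_text, geopoint_column):
    """Turn each CSV row into a KML Placemark located at `geopoint_column`.

    Assumes the column holds an ODK geopoint string "lat lon alt acc";
    rows without a value are skipped. All other columns become ExtendedData.
    """
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(geopoint_column) or "").strip()
        if not value:
            continue
        lat, lon = value.split()[:2]
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        extended = ET.SubElement(placemark, f"{{{KML_NS}}}ExtendedData")
        for name, cell in row.items():
            if name == geopoint_column:
                continue
            data = ET.SubElement(extended, f"{{{KML_NS}}}Data", name=name)
            ET.SubElement(data, f"{{{KML_NS}}}value").text = cell
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as "longitude,latitude", the reverse of ODK's order.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

Every non-spatial column ends up as a pop-up attribute, which covers the "click on each house" case without the user doing any join.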

But for anything more complicated than that, and certainly anything for which we'd be looking for the power of GeoJSON instead of KML, I think it makes way more sense to target a CSV export and GeoJSON sidecar (with joining keys for surveys, groups, and repeats).
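For the sidecar approach, the "basic join of a CSV to a GeoJSON by key" could look roughly like the sketch below. The key names are assumptions: a `KEY` column on the CSV side and a `key` property on each feature. For repeats, the same join would run against the repeat's own CSV and keys.

```python
import csv
import io
import json


def join_csv_to_geojson(csv_text, geojson_text, csv_key="KEY", geo_key="key"):
    """Copy each CSV row's columns into the properties of the features sharing its key.

    `csv_key` and `geo_key` are assumed column/property names. Features whose
    key has no matching row keep their original properties.
    """
    rows = {row[csv_key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    collection = json.loads(geojson_text)
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        row = rows.get(props.get(geo_key))
        if row is not None:
            # Geo-side properties win on a name clash, so "field" and similar survive.
            feature["properties"] = {**row, **props}
    return collection
```

This is the step a GIS tool performs when joining attribute tables; keeping the CSV authoritative means features never need to carry the full submission.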

Xiphware · September 19, 2018, 8:16am

@Ivangayton Thank you for your well-thought-out input. It sounds like we have reached a consensus that, at least for any initial offering of GeoJSON export support in ODK, it will be the ‘sidecar’ approach.

My only remaining concern, and no offense intended to @ggalmazor, but perhaps we can come up with a better term than “sidecar”?

All those not in favor say ‘nay’?

ggalmazor · September 23, 2018, 11:33am

So almost a week has passed and it seems like no one is violently opposed to the sidecar approach. By the way, I think @yanokwa was the first to suggest that name! I don't have a strong opinion on any name.

I think we should also settle on the GeoJSON data structure we want to implement:

GeometryCollection approach - HARD TO GET WORKING


Root FeatureCollection

- We will export a root FeatureCollection that will include all spatial data from all submissions of a form
- There will be one GeometryCollection per exported submission, with a flat array of Geometry Objects, one per spatial form field, including all repeats
- Each GeometryCollection will have a key property with the corresponding submission's UID

The Geometry Objects

- All will include a meta Foreign Member with a field key containing the name of the field they belong to
- These are the types of Geometry Objects we will use:
  - GEOPOINT fields will be represented as Point objects
  - GEOTRACE fields will be represented as LineString objects
  - GEOSHAPE fields will be represented as Polygon objects
- When a submission has no value for a spatial form field, we will assign an empty coordinates array (RFC 7946, section 3.1) to denote a null object
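A sketch of what one submission might look like under this approach. One assumption beyond the description: since a FeatureCollection's "features" may only contain Features, each GeometryCollection is wrapped in a Feature so it can sit in the root collection and carry the `key` property. The helper name and the type mapping dict are hypothetical.

```python
# Hypothetical mapping from XForms spatial types to GeoJSON geometry types.
GEOJSON_TYPE = {"geopoint": "Point", "geotrace": "LineString", "geoshape": "Polygon"}


def submission_to_geometry_collection(instance_id, answers):
    """Build the per-submission object for the GeometryCollection approach.

    `answers` is a list of (field_name, xforms_type, coordinates) tuples, with
    coordinates already in GeoJSON [lon, lat] order, or None when unanswered.
    """
    geometries = []
    for field, xforms_type, coordinates in answers:
        geometries.append({
            "type": GEOJSON_TYPE[xforms_type],
            # RFC 7946 3.1: processors MAY treat empty coordinates as a null object.
            "coordinates": coordinates if coordinates is not None else [],
            # Foreign member naming the form field this geometry came from.
            "meta": {"field": field},
        })
    return {
        "type": "Feature",
        "geometry": {"type": "GeometryCollection", "geometries": geometries},
        "properties": {"key": instance_id},
    }
```

Note that the per-field information lives in a foreign member, which GIS tools are free to ignore.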

Some notes:

- I'm still convinced that our users will have to filter data, e.g. to show just one spatial field on a map.
  - With the data structure I've described, the data query would be something like "take all GeometryCollection objects in the root FeatureCollection. Then, from those, take only the Geometry Object that has a meta.field with value the_geopoint_field_I_want."
  - Instead of this, we could skip the GeometryCollection objects and add all spatial fields from all exported submissions directly into the root FeatureCollection. We could add info to their properties map to link them to their corresponding UID and fields.
  - This would enable simpler data queries: "take all Features from the root FeatureCollection that have the value the_geopoint_field_I_want in their field property."
  - This is not only easier to understand, but it also uses standard Feature properties.
- We could reduce the output file size by omitting fields that don't have values, but then the structure would be heterogeneous, i.e. each GeometryCollection could have a different number of elements in it. I don't know whether this would be an issue.
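The two queries described in the notes above, side by side as a sketch (function names are hypothetical). The nested version has to know about the `meta` foreign member; the flat version is a single filter over standard Feature properties.

```python
def geometries_for_field_nested(collection, field):
    """GeometryCollection approach: dig into each submission's geometries."""
    found = []
    for feature in collection["features"]:
        for geometry in feature["geometry"]["geometries"]:
            if geometry.get("meta", {}).get("field") == field:
                found.append(geometry)
    return found


def features_for_field_flat(collection, field):
    """Flat approach: one filter over standard Feature properties."""
    return [
        f for f in collection["features"]
        if (f.get("properties") or {}).get("field") == field
    ]
```

The nested query also loses the submission key unless it is carried along explicitly, while the flat query returns whole Features with their properties intact.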

Flat FeatureCollection approach - PROMISING


Root FeatureCollection

- We will export a root FeatureCollection that will include all spatial data from all submissions of a form
- Spatial data will be encoded as Point, LineString, and Polygon GeoJSON objects
- Each GeoJSON object will have the following properties:
  - key, containing the instance ID
  - field, containing the field name
  - empty, containing yes if the submission doesn't have an answer for that field, no otherwise
- GEOSHAPE fields with more than 2 points and coincident first and last points will be encoded as Polygon objects
  - A GEOSHAPE with 2 points, or with different first and last points, will become a LineString GeoJSON object
  - A GEOSHAPE with 1 point will become a Point GeoJSON object
- GEOTRACE fields with more than 1 point will be encoded as LineString objects
  - A GEOTRACE with 1 point will become a Point GeoJSON object
- GEOPOINT fields will be encoded as Point GeoJSON objects
- When a submission has no value for a spatial field, it will have the corresponding GeoJSON object type (Point, LineString, or Polygon) and will have a null geometry property
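A minimal sketch of one entry of the flat FeatureCollection, with the `key`, `field`, and `empty` properties listed above. Unanswered fields get a null "geometry", which RFC 7946 permits for unlocated Features; since a null geometry has no type, this sketch does not record which geometry type the missing answer would have had. The helper name is hypothetical.

```python
import json


def field_feature(instance_id, field, geometry):
    """Build one Feature of the flat FeatureCollection for a single spatial answer.

    `geometry` is a GeoJSON geometry dict, or None when the submission has no
    answer for `field`.
    """
    return {
        "type": "Feature",
        "geometry": geometry,
        "properties": {
            "key": instance_id,
            "field": field,
            "empty": "yes" if geometry is None else "no",
        },
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        field_feature("uuid:1", "home", {"type": "Point", "coordinates": [2.17, 41.38]}),
        field_feature("uuid:2", "home", None),
    ],
}
print(json.dumps(collection["features"][1]))
```

Python's `None` serializes as JSON `null`, so the second Feature is written as an unlocated Feature rather than being dropped.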

danbjoseph · September 24, 2018, 2:12pm

It doesn't help with file size, but I think from a GIS perspective (not necessarily a programming perspective) this is easier to work with.

How does a GeometryCollection work in GIS software? When you load the features, do the Geometry Objects inherit the attributes/properties of the containing collection?

ggalmazor · September 24, 2018, 2:23pm

That's what I was suspecting.

I've been doing some tech spikes trying things out, and so far I've been unable to produce anything that would pass a GeoJSON linter or work in QGIS.

Preliminary results suggest that child members of a GeometryCollection don't inherit the parent's properties. I think this could be a showstopper for this approach...

The most promising experiment so far has been to produce a flat FeatureCollection with Features (Points, LineStrings, and Polygons).

ggalmazor · September 24, 2018, 2:43pm

@danbjoseph, @Xiphware, maybe you could test this file: demo.geojson.zip (14.5 KB)

It's the result of one of my tech spikes. All features have an empty property set to yes or no for easy filtering.

ggalmazor · September 24, 2018, 2:49pm

Some extra considerations regarding the XForms geo types and GeoJSON types:

We can't encode an XForms GEOSHAPE answer as a Polygon GeoJSON object unless it has at least 3 points and its first and last points are the same.
We can't encode an XForms GEOTRACE answer as a LineString GeoJSON object unless it has at least 2 points.

This means that we will have to downgrade answers to a compatible representation:

A GEOSHAPE with 2 points, or with different first and last points, will become a LineString GeoJSON object
A GEOSHAPE with 1 point will become a Point GeoJSON object
A GEOTRACE with 1 point will become a Point GeoJSON object

danbjoseph · September 24, 2018, 3:34pm

Opens nicely in geojson.io and QGIS.

This makes sense, and seems fine, to me.

Xiphware · September 24, 2018, 11:08pm

Ditto, it appears to pass GeoJSON-to-KML conversion OK too.

Xiphware · September 24, 2018, 11:14pm

Technically speaking, according to the ODK spec, these are invalid values to begin with, so arguably they should never have made it into the original submission in the first place.

But I guess there's no harm in having the export fail gracefully, should the impossible happen...

ggalmazor · September 25, 2018, 6:44am

Thanks for trying the file, @danbjoseph, @Xiphware! I guess this is good news, then.

I think we're ready to create an issue and start implementing this.

danbjoseph · September 25, 2018, 4:17pm

Impossible you say? Nothing is impossible when you work for the circus.

Xiphware · September 25, 2018, 11:40pm

Actually, this brings up a point probably worth some discussion... If we are going to export invalid form values by, in effect, changing their datatype to something else so that they now become valid, then - as mentioned by @ggalmazor - we should effectively re-cast a GEOSHAPE where the last point value does not equal the first to a GEOTRACE (!)

On reflection, I'm a bit uneasy about (silently) re-casting invalid values to different datatypes simply to try and make them valid, since that can really change their fundamental semantic and could well provoke unintended (and undesirable?) consequences downstream...

Would it be better to barf the entire export? Or omit the offending instances? Or...?

[Aside: I've never liked this explicit condition on GEOSHAPEs; it's entirely redundant and unnecessarily increases the amount of data needed to define a shape. I mean, a triangle has three corners (sic), so why do I have to pass four!?]

ggalmazor · September 26, 2018, 6:24am

One way to avoid downgrading objects would be to add the first element to the tail of the list until we get at least 4 points. This wouldn't effectively change the shape and would make it a valid Polygon. What do you think, @Xiphware, @danbjoseph?

The same would go for LineStrings (until we get 2 points).
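A sketch of the padding idea, combined with the "append the first point if the ring isn't closed" fix. Appending a copy of the first position adds no new vertex, so the shape is unchanged; for one- or two-point answers, though, the result is a degenerate (zero-length or zero-area) geometry that passes the size rules but that some tools may not draw. Function names are hypothetical.

```python
def pad_positions(positions, minimum):
    """Repeat the first position at the end until there are `minimum` positions."""
    padded = list(positions)
    while padded and len(padded) < minimum:
        padded.append(padded[0])
    return padded


def close_ring(positions):
    """Polygon ring: close it if needed, then pad to RFC 7946's four positions."""
    ring = list(positions)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    return pad_positions(ring, 4)
```

A triangle given as three distinct points becomes the four-position ring GeoJSON expects; a single-point trace padded with `pad_positions(points, 2)` becomes a zero-length LineString.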

Xiphware · September 26, 2018, 6:51am

Fixing a geoshape is pretty easy and obvious: if the last coordinate ain't equal to the first, just append the first to the end [completely ignoring @LN's odk:length for the moment...]

But it doesn't really address the more fundamental problem: when exporting a dataset, what should we do when encountering - as a consequence of actually parsing and translating its contents - an invalid value? For example, what should we do if we get a geotrace with only one coordinate? Or a geopoint with a latitude > 90? Or, if we're going a step further in our export sanity checking, a select_one whose value isn't actually one of the options (assuming we don't support open selections)?

Hopefully you get my drift: (silently) massaging invalid values into, hopefully, valid ones can be a bit of a slippery slope... Hence I might be more inclined to just abort the export, or skip submissions containing bad data (and show an error message alerting the user to the fact that they have bad data).
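A sketch of the "skip submissions containing bad data and tell the user" option. The minimum point counts (1 for geopoint, 2 for geotrace, 4 with a closed ring for geoshape) are taken from our reading of the ODK XForms geo type definitions; the input shape and function names are hypothetical. Unanswered fields are allowed, since they can be exported with null geometries.

```python
# Minimum point counts, per our reading of the ODK XForms geo types.
MIN_POINTS = {"geopoint": 1, "geotrace": 2, "geoshape": 4}


def validate_position(lat, lon):
    """Return an error message for an out-of-range point, or None if it's fine."""
    if not -90 <= lat <= 90:
        return f"latitude {lat} is outside [-90, 90]"
    if not -180 <= lon <= 180:
        return f"longitude {lon} is outside [-180, 180]"
    return None


def split_valid(submissions):
    """Partition submissions into (valid IDs, error messages) without repairing anything.

    `submissions` maps an instance ID to a list of (field, xforms_type, points),
    where points are (lat, lon) tuples. Any bad answer rejects the whole submission.
    """
    valid, errors = [], []
    for instance_id, answers in submissions.items():
        problems = []
        for field, xforms_type, points in answers:
            if not points:
                continue  # unanswered fields are allowed
            if len(points) < MIN_POINTS[xforms_type]:
                problems.append(f"{field}: {xforms_type} needs at least {MIN_POINTS[xforms_type]} points")
            if xforms_type == "geoshape" and points[0] != points[-1]:
                problems.append(f"{field}: geoshape is not closed")
            for lat, lon in points:
                message = validate_position(lat, lon)
                if message:
                    problems.append(f"{field}: {message}")
        if problems:
            errors.extend(f"{instance_id}: {p}" for p in problems)
        else:
            valid.append(instance_id)
    return valid, errors
```

Rejecting the whole submission keeps the exported data internally consistent, at the cost of losing any good answers it also contained.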

ggalmazor · September 26, 2018, 7:17am

I think we're getting into a very interesting and important topic.

My engineer mind agrees with the strategy of failing with some explanation, but I see a big problem with that approach: we are not providing any means for users to amend their data. In fact, we strongly discourage tampering with the contents of the Briefcase storage location or Aggregate's database.

I think there's value in making a "best-effort" attempt to get data out of ODK, and we don't have to do it silently. I think it would be a great idea to attach an export report listing all the amendments/omissions we've been forced to make.
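A minimal sketch of such an export report: the exporter records each amendment (for example, a ring that had its first point appended) or omission as it goes, and the report is rendered as plain text next to the export. The class, method names, and output format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExportReport:
    """Hypothetical report listing every amendment or omission made during an export."""
    entries: list = field(default_factory=list)

    def amended(self, instance_id, field_name, what):
        self.entries.append(("amended", instance_id, field_name, what))

    def omitted(self, instance_id, field_name, why):
        self.entries.append(("omitted", instance_id, field_name, why))

    def render(self):
        lines = [f"{len(self.entries)} change(s) made during export:"]
        for action, instance_id, field_name, detail in self.entries:
            lines.append(f"- {action} {instance_id} / {field_name}: {detail}")
        return "\n".join(lines)


report = ExportReport()
report.amended("uuid:7", "plot", "appended first point to close the GEOSHAPE ring")
report.omitted("uuid:9", "home", "latitude 95.0 is outside [-90, 90]")
print(report.render())
```

Keying each entry by instance ID and field lets users trace every change back to the original submission, which addresses the concern about silent re-casting.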