Add a GeoJSON export to Briefcase and Aggregate

odk-briefcase
odk-aggregate

(Dr. Gareth S. Bestor) #21

@Ivangayton Thank you for your well thought out input. It sounds like we have reached a consensus that, at least for any initial offering of GeoJSON export support in ODK, it will be the ‘sidecar’ approach.

My only remaining concern, and no offense intended to @ggalmazor, but perhaps we can come up with a better term than “sidecar”? :smile:

All those not in favor say ‘nay’? :slightly_smiling_face:


(Guillermo) #22

So almost a week has passed and it seems like no one is violently opposed to the sidecar approach. By the way, I think @yanokwa was the first to suggest that name! :slight_smile: I don't have a strong opinion on any name :slight_smile:

I think we should touch base with the GeoJSON data structure we want to implement too:

GeometryCollection approach - HARD TO GET IT WORKING :tired_face:

Details

Root FeatureCollection

  • We will export a root FeatureCollection that will include all spatial data from all submissions of a form
  • There will be one GeometryCollection per exported submission, with a flat array of Geometry Objects, one per spatial form field, including all repeats
  • Each GeometryCollection will have a key property with the corresponding submission's UID

The Geometry Objects

  • All will include a meta Foreign Member with a field key containing the name of the field they belong to
  • These are the types of Geometry Objects we will use:
    • GEOPOINT fields will be represented as Point objects
    • GEOTRACE fields will be represented as LineString objects
    • GEOSHAPE fields will be represented as Polygon objects
  • When a submission has no value for a spatial form field, we will assign an empty coordinates array (RFC 7946, section 3.1) to denote a null object

Some notes:

  • I'm still convinced that our users will have to filter data e.g. to show on a map just one spatial field.
    • With the data structure I've described, the data query would be something like "take all GeometryCollection objects in the root FeatureCollection. Then, from those, take only the Geometry Objects that have a meta.field with value the_geopoint_field_I_want".
    • Instead of this, we could skip the GeometryCollection objects and add all spatial fields from all exported submissions directly into the root FeatureCollection. We could add info to their properties map to link them to their corresponding UID and fields.
    • This would enable simpler data queries: "take all Features from the root FeatureCollection that have the value the_geopoint_field_I_want in their field property".
    • This not only is easier to understand but it also uses standard Feature properties.
  • We could reduce the output file size by omitting fields that don't have values, but then the structure would be heterogeneous, i.e. each GeometryCollection could have a different number of elements. I don't know whether this would be an issue.
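
To make the two query styles in these notes concrete, here is a minimal Python sketch. It is only an illustration: the function names and sample data are hypothetical, and the nested variant assumes each submission is a Feature whose geometry is a GeometryCollection, with the meta foreign member proposed above.

```python
def select_nested(root, field_name):
    """GeometryCollection layout: dig into each collection and match on meta.field."""
    return [
        geometry
        for feature in root["features"]
        for geometry in feature["geometry"]["geometries"]
        if geometry.get("meta", {}).get("field") == field_name
    ]


def select_flat(root, field_name):
    """Flat layout: match directly on the standard Feature properties."""
    return [
        feature
        for feature in root["features"]
        if feature["properties"].get("field") == field_name
    ]


# Hypothetical sample data: one submission with two spatial fields.
nested = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"key": "uuid:1"},
        "geometry": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Point", "coordinates": [1.0, 2.0], "meta": {"field": "home"}},
                {"type": "Point", "coordinates": [3.0, 4.0], "meta": {"field": "work"}},
            ],
        },
    }],
}

flat = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"key": "uuid:1", "field": "home"},
         "geometry": {"type": "Point", "coordinates": [1.0, 2.0]}},
        {"type": "Feature", "properties": {"key": "uuid:1", "field": "work"},
         "geometry": {"type": "Point", "coordinates": [3.0, 4.0]}},
    ],
}
```

In the flat layout the result keeps the submission key next to each geometry, whereas the nested query returns bare geometries that have lost the link to their submission.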

Flat FeatureCollection approach - PROMISING :smiley:

Details

Root FeatureCollection

  • We will export a root FeatureCollection that will include all spatial data from all submissions of a form
  • Spatial data will be encoded as Point, LineString, and Polygon GeoJSON objects
  • Each GeoJSON object will have the following properties:
    • key, containing the instance ID
    • field, containing the field name
    • empty, containing yes if the submission doesn't have an answer for that field, no otherwise
  • GEOSHAPE fields with more than 2 points, and coincident first and last points will be encoded as Polygon objects
    • A GEOSHAPE with 2 points, or with different first and last points will become a LineString GeoJSON object
    • A GEOSHAPE with 1 point will become a Point GeoJSON object
  • GEOTRACE fields with more than 1 point will be encoded as LineString objects
    • A GEOTRACE with 1 point will become a Point GeoJSON object
  • GEOPOINT fields will be encoded as Point GeoJSON objects
  • When a submission has no value for a spatial field, it will have the corresponding GeoJSON object type (Point, LineString, or Polygon) and will have a null geometry property
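
As a rough illustration of this structure (not Briefcase code), the following Python sketch builds Features carrying the key, field, and empty properties listed above. The helper names and the sample instance IDs are hypothetical, and an unanswered field is represented here simply as a Feature with a null geometry.

```python
import json


def make_feature(instance_id, field_name, geometry=None):
    """Build one Feature; a missing answer becomes a null geometry with empty=yes."""
    return {
        "type": "Feature",
        "geometry": geometry,
        "properties": {
            "key": instance_id,
            "field": field_name,
            "empty": "yes" if geometry is None else "no",
        },
    }


def make_collection(features):
    """Wrap all Features of all submissions in a single root FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Hypothetical submissions: the second one has no answer for the field.
collection = make_collection([
    make_feature("uuid:1", "the_geopoint", {"type": "Point", "coordinates": [-3.7, 40.4]}),
    make_feature("uuid:2", "the_geopoint"),
])

print(json.dumps(collection, indent=2))
```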

(danbjoseph) #23

it doesn't help with file size but i think from a GIS perspective (not necessarily a programming perspective) this is easier to work with.

how does a GeometryCollection work in a GIS software? when you load the features, do the Geometry Objects inherit the attributes/properties of the containing collection?


(Guillermo) #24

That's what I was suspecting.

I've been doing some tech spikes trying things out and so far I've been unable to produce anything that would pass a GeoJSON linter or work in QGIS.

Preliminary results suggest that child members of a GeometryCollection don't inherit the parent's properties. I think this could be a blocker for this approach...

The most promising experiment so far has been to produce a flat FeatureCollection with Features (Points, LineStrings, and Polygons).


(Guillermo) #25

@danbjoseph, @Xiphware, maybe you could test this file: demo.geojson.zip (14.5 KB)

It's the result of one of my tech spikes. All features have an empty property set to yes or no for easy filtering.


(Guillermo) #26

Some extra considerations regarding the XForms geo types and GeoJSON types:

  • We can't encode an XForms GEOSHAPE answer as a Polygon GeoJSON object unless it has at least 3 points and its first and last points are the same
  • We can't encode an XForms GEOTRACE answer as a LineString GeoJSON object unless it has at least 2 points

This means that we will have to downgrade answers to a compatible representation:

  • A GEOSHAPE with 2 points, or with different first and last points will become a LineString GeoJSON object
  • A GEOSHAPE with 1 point will become a Point GeoJSON object
  • A GEOTRACE with 1 point will become a Point GeoJSON object
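
The downgrade rules above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual export code: the function name is hypothetical, and points are assumed to be already parsed into [longitude, latitude] pairs.

```python
def to_geometry(field_type, points):
    """Pick the most specific GeoJSON geometry the points can legally form.

    field_type is "GEOPOINT", "GEOTRACE" or "GEOSHAPE"; points is a list of
    [longitude, latitude] pairs. Returns None when there is no answer.
    """
    if not points:
        return None
    if field_type == "GEOPOINT" or len(points) == 1:
        # Any single position can only be a Point.
        return {"type": "Point", "coordinates": points[0]}
    if field_type == "GEOSHAPE" and len(points) >= 3 and points[0] == points[-1]:
        # Threshold taken from the rule as stated in this thread. RFC 7946
        # linear rings need 4 or more positions, so a stricter check may be wanted.
        return {"type": "Polygon", "coordinates": [points]}
    # GEOTRACE with 2+ points, or a GEOSHAPE that is too short or not closed.
    return {"type": "LineString", "coordinates": points}
```

For example, `to_geometry("GEOSHAPE", [[0, 0], [1, 0], [1, 1]])` yields a LineString because the first and last points differ.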

(danbjoseph) #27

Opens nicely in geojson.io


and QGIS



This makes sense, and seems fine, to me.


(Dr. Gareth S. Bestor) #28

Ditto, appears to pass GeoJSON-to-KML conversion OK too:


(Dr. Gareth S. Bestor) #29

Technically speaking, according to the ODK spec, these are invalid values to begin with, so arguably they should never have made it into the original submission in the first place.

But I guess there's no harm in having the export fail gracefully, should the impossible happen...


(Guillermo) #30

Thanks for trying the file, @danbjoseph, @Xiphware! I guess this is good news, then.

I think we're ready to create an issue and start implementing this.


(danbjoseph) #31

Impossible you say? Nothing is impossible when you work for the circus.


(Dr. Gareth S. Bestor) #32

Actually, this brings up a point probably worth some discussion... If we are going to export invalid form values by, in effect, changing their datatype to something else so that they now become valid, then - as mentioned by @ggalmazor - we should effectively re-cast a GEOSHAPE where the last point value does not equal the first to a GEOTRACE (!)

On reflection, I'm a bit uneasy about (silently) re-casting invalid values to different datatypes simply to try and make them valid, since that can really change their fundamental semantics and could well provoke unintended (and undesirable?) consequences downstream... :worried:

Would it be better to barf the entire export? Or omit the offending instances? Or...?

[Aside: I've never liked this explicit condition on GEOSHAPES; it's entirely redundant and unnecessarily increases the amount of data needed to define a shape. I mean, a triangle has three corners (sic), so why do I have to pass four!?].


(Guillermo) #33

One way to avoid downgrading objects would be to add the first element to the tail of the list until we get at least 4 points. This wouldn't effectively change the shape and would make it a valid Polygon. What do you think, @Xiphware, @danbjoseph?

Same would go with LineStrings (until we get 2 points)
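
A minimal Python sketch of this padding idea, with a hypothetical function name and points assumed to be [longitude, latitude] pairs. It only repeats the first point until the minimum length is reached; closing a longer ring whose last point differs from the first is a separate step that it does not cover.

```python
def pad_with_first(points, minimum):
    """Return a copy of points, repeating the first one until len >= minimum."""
    padded = list(points)  # copy, so the caller's list is left untouched
    while padded and len(padded) < minimum:
        padded.append(padded[0])
    return padded


# A three-point GEOSHAPE becomes a closed four-point ring.
triangle = [[0, 0], [1, 0], [1, 1]]
ring = pad_with_first(triangle, 4)

# A one-point GEOTRACE becomes a two-point (zero-length) line.
line = pad_with_first([[5, 5]], 2)
```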


(Dr. Gareth S. Bestor) #34

Fixing a geoshape is pretty easy, and obvious: if the last coordinate ain't equal to the first, just append the first to the end [completely ignoring @LN's odk:length for the moment... :stuck_out_tongue_winking_eye: ]

But it doesn't really address the more fundamental problem: when exporting a dataset, what should we do when encountering - as a consequence of actually parsing and translating its contents - an invalid value? eg what should we do if we get a geotrace datatype with only one coordinate? Or a geopoint with a latitude >90? Or, if we're going a step further in our export sanity checking, a select_one whose value isn't actually one of the options (assuming we don't support open selections)?

Hopefully you get my drift: (silently) massaging invalid values into, hopefully, valid ones can be a bit of a slippery slope... :slight_smile: Hence I might be more inclined to just abort the export, or skip submissions containing bad data (and throw an error message alerting the user to the fact they have bad data).


(Guillermo) #35

I think we're getting into a very interesting and important topic.

My engineer mind agrees with the strategy of failing with some explanation, but I see a big problem with that approach: we are not providing any means for users to amend their data. In fact, we strongly discourage tampering with contents in the Briefcase Storage location, or Aggregate's database.

I think that there's value in taking a "best-effort" approach to get data out of ODK, and we don't have to do it silently. I think it would be a great idea to attach an export report with all the amendments/omissions we've been forced to make.


(Dr. Gareth S. Bestor) #36

That's a very good point!

How 'bout we kick it back to the feature originator @yanokwa, and let 'im (ie "as a data manager") perhaps provide wisdom on a decision-that-can-never-please-everybody: attempt to fix bad export data, vs skip it, vs fail export? Or some (unholy) combination thereof!

:grin:


(danbjoseph) #37

I think it would be good for ODK developers to somehow find out if invalid geometries are being created by geo questions, so that it can be fixed at the source. I'm not sure the export process should "fix" any invalid data.

We could kick any invalid features to null island and set the coordinates to a point at 0,0 or just set the geometry to null and move the geometry to within properties (maybe prepending invalid- or something)? This way the user sees what didn't work but we don't really modify the data. Something like:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "field": "myGeoQuestion",
        "key":  "uuid:09fu-je89f249-8hf4jf-0834",
        "invalid-geometry": {
          "type": "Polygon",
          "coordinates": [
            [
              [-88.52783203125,35.460669951495305],
              [-87.275390625,35.460669951495305]
            ]
          ]
        }
      },
      "geometry": {
        "type": "Point",
        "coordinates": [0,0]
      }
    }
  ]
}
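
A small Python sketch of how such a Feature could be produced, purely as an illustration. The function name and its flag are hypothetical; the invalid-geometry property name follows the example above.

```python
import copy


def quarantine_invalid(feature, use_null_island=True):
    """Move a Feature's geometry into properties and replace it with a placeholder.

    The original geometry is preserved untouched under "invalid-geometry";
    the placeholder is either a Point at 0,0 or a null geometry.
    """
    result = copy.deepcopy(feature)  # leave the input Feature unmodified
    result["properties"]["invalid-geometry"] = result["geometry"]
    if use_null_island:
        result["geometry"] = {"type": "Point", "coordinates": [0, 0]}
    else:
        result["geometry"] = None
    return result


# Hypothetical invalid answer: a Polygon with only two positions.
bad = {
    "type": "Feature",
    "properties": {"field": "myGeoQuestion", "key": "uuid:09fu-je89f249-8hf4jf-0834"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-88.52783203125, 35.460669951495305],
                         [-87.275390625, 35.460669951495305]]],
    },
}

at_null_island = quarantine_invalid(bad)
with_null_geometry = quarantine_invalid(bad, use_null_island=False)
```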

(Mathieubossaert) #38

Hi to all and thanks Yaw for the invitation.

I was far from office and I can see that you had a great discussion about this great feature.
This will be a great sidecar, first version, of the feature.


(Yaw Anokwa) #39

Collect won't let you have these bad data types. In the same way you can't have a latitude > 90, you can't have a geotrace that's just one point.

My preference would be to skip the malformed submission and alert the user. If we get user feedback that they want something else, then we can add it.


(Dr. Gareth S. Bestor) #40

Agreed. My preference would be to skip exporting any data we've determined - by whatever means - to be 'bad'. This is a lot safer than trying to massage it to be 'less bad', and it will be less of a show-stopper than aborting the entire export.

How best to alert the user can probably be dealt with separately; at a minimum, the number of submissions after an export will be less than the number before (ie the count in the DB), so at least there's a reliable indication that something is probably 'bad' somewhere...
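
A minimal Python sketch of the skip-and-alert idea, under stated assumptions: the submission layout, function names, and validity checks are all hypothetical, with the checks taken only from the examples raised in this thread (latitude range, minimum point counts, closed shapes). Points are assumed to be [longitude, latitude] pairs.

```python
def is_valid(field_type, points):
    """Illustrative checks only; unanswered fields are not treated as malformed."""
    if not points:
        return True
    if any(not -90 <= lat <= 90 for _lon, lat in points):
        return False
    if field_type == "GEOTRACE":
        return len(points) >= 2
    if field_type == "GEOSHAPE":
        return len(points) >= 3 and points[0] == points[-1]
    return len(points) == 1  # GEOPOINT


def split_submissions(submissions):
    """Return (exportable submissions, keys of skipped submissions)."""
    exported, skipped = [], []
    for submission in submissions:
        fields = submission["fields"].values()
        if all(is_valid(field_type, points) for field_type, points in fields):
            exported.append(submission)
        else:
            skipped.append(submission["key"])
    return exported, skipped


# Hypothetical data: the second and third submissions are malformed.
submissions = [
    {"key": "uuid:1", "fields": {"path": ("GEOTRACE", [[0, 0], [1, 1]])}},
    {"key": "uuid:2", "fields": {"path": ("GEOTRACE", [[0, 0]])}},
    {"key": "uuid:3", "fields": {"spot": ("GEOPOINT", [[0, 95]])}},
]

exported, skipped = split_submissions(submissions)
if skipped:
    print(f"Skipped {len(skipped)} of {len(submissions)} submissions: {', '.join(skipped)}")
```

Returning the skipped keys alongside the exported data gives the user something more specific than a changed submission count.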