Make GeoShapeData / GeoPointData more flexible to include additional geo data

initial post on javarosa issue

This is a request to discuss adding some flexibility to the GeoShape/GeoPointData format. The current format is -6.668710000000001 106.91277 821.2 1.7; <another point>; <and another point>.

We would like JavaRosa to allow additional data to be attached to each point, so that points could carry more contextual data on how they were recorded while still supporting the area() and distance() functions. That additional information would enable advanced surveying capabilities within ODK Collect.

The contextual data should be rather flexible, with minimal constraints, to let each application come up with a best-fit format. To give an example of such data in the context of our work at Meridia, here is a non-exhaustive list:

  • a uuid v4 generated when the point was recorded
  • the method used to record the point: tablet gps, bluetooth gps, gps with rtk capabilities, drawn or snapped to an existing point...
  • (depending on the method) the uuid of the point it was snapped to
  • (depending on the method) an attached raw gps file that was downloaded from the gps device
    ...

Currently, our internal format is to add a JSON dictionary at the end of each point, such as:

-6.668710000000001 106.91277 821.2 1.7 {"uuid": "12344....", "method": "rtk", ...};...

Focusing on the JavaRosa modification, our suggestion would be to have GeoPointData accept the format <double> <double> <double*> <double*> <string*>, where only the first two doubles are mandatory. Altitude, accuracy and the additional string would be optional.
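
To make the proposed format concrete, here is a minimal Python sketch of how such a point string could be parsed. It is only an illustration under assumptions, not JavaRosa code: the names ExtendedPoint and parse_point are hypothetical, and the sketch assumes that up to four leading tokens that parse as numbers are latitude, longitude, altitude and accuracy, with everything after them kept as the raw additional string.

```python
from typing import NamedTuple, Optional


class ExtendedPoint(NamedTuple):
    """Hypothetical container for one point in the proposed format."""
    lat: float
    lon: float
    altitude: Optional[float]
    accuracy: Optional[float]
    extra: Optional[str]  # raw trailing string, e.g. a JSON dictionary


def parse_point(text: str) -> ExtendedPoint:
    """Parse '<double> <double> <double*> <double*> <string*>'."""
    rest = text.strip()
    numbers = []
    # Consume up to four leading numeric tokens; stop at the first
    # token that is not a number and treat the remainder as the string.
    while rest and len(numbers) < 4:
        pieces = rest.split(None, 1)
        try:
            numbers.append(float(pieces[0]))
        except ValueError:
            break
        rest = pieces[1] if len(pieces) > 1 else ""
    if len(numbers) < 2:
        raise ValueError("latitude and longitude are mandatory")
    altitude = numbers[2] if len(numbers) > 2 else None
    accuracy = numbers[3] if len(numbers) > 3 else None
    return ExtendedPoint(numbers[0], numbers[1], altitude, accuracy, rest or None)
```

One design consequence worth noting: if the additional string may itself contain a semicolon, a shape can no longer be split into points by a plain split on ";", so the shape-level parser would need its own rule for that.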

In the coming weeks, we will propose an implementation of this modification via a PR. In the meantime, feel free to give feedback.

@yanokwa suggested an alternative format such as GeoJson could solve the problem.

To my knowledge, there is no simple, open, text-based standard out there that allows additional data per point in the context of a polygon.

As for GeoJSON, it's quite limited here, as one can only put floats/doubles in the coordinates. Here is an extract of the GeoJSON spec:

A position is an array of numbers. There MUST be two or more
elements. The first two elements are longitude and latitude, or
easting and northing, precisely in that order and using decimal
numbers. Altitude or elevation MAY be included as an optional third
element.

Implementations SHOULD NOT extend positions beyond three elements
because the semantics of extra elements are unspecified and
ambiguous. Historically, some implementations have used a fourth
element to carry a linear referencing measure (sometimes denoted as
"M") or a numerical timestamp, but in most situations a parser will
not be able to properly interpret these values. The interpretation
and meaning of additional elements is beyond the scope of this
specification, and additional elements MAY be ignored by parsers.

Our current GeoShape/GeoPoint format is pretty terrible (or rather non-standard) and putting more data in it would make it worse. And that's not even considering the downstream effects. For example, pipelines that are expecting a relatively small number of characters (e.g., for GeoPoint) would now have to deal with strings of arbitrary length.

My initial reaction is that GeoJSON or the binary data widget would be better.

Use GeoJSON
I suggested GeoJSON because it solves the problem of the non-standard formats, but we've extended it a bit for Briefcase v1.13's export.

{
    "geometry": {
        "coordinates": [ 174.76902833333332, -41.267264999999995, 192.6 ],
        "type": "Point"
    },
    "properties": {
        "empty": "no",
        "field": "geopoint_widget_maps",
        "key": "uuid:0e910762-7ffd-4d9c-bb2b-9bab1455c44c"
    },
    "type": "Feature"
},

More at Add a GeoJSON export to Briefcase and Aggregate - #25 by ggalmazor.

Use the binary data widget
The XForms specification already supports a binary data type (we use this with the file upload). You could have a binary question associated with your location questions that would hold whatever data you want to include.

How this manifests in Collect might be a bit tricky as far as figuring out what the UI will look like, but the fundamentals are there.

I'm inclined to agree. There's little about the current ODK format(s) for capturing geospatial data that is particularly compelling, other than perhaps their genealogy; to use the captured data usefully elsewhere requires translation. But rather than making this legacy format arguably even 'worse' [relative to contemporary and more widely utilized GIS formats] with further ad hoc additions, I'd suggest that adding better native support to ODK for alternate representations, eg GeoJSON or WKT, might be a better path forward, while still retaining the current geo* format(s) for legacy support.

In the case of a single point, GeoJSON could work as you describe, @yanokwa. However, GeoJSON becomes cumbersome with geometries of more than one point, like Polygon or Line, not to mention Multi-polygon or Multi-line. The issue is that properties are expressed for an entire geometry instead of per point. So one could basically end up with an additional point_details array that uses the same indexing as the coordinates.

{
     "geometry": {
         "coordinates": [ [ 174.76902833333332, -41.267264999999995, 192.6 ],[ 174.76902833333332, -41.267264999999995, 192.6 ], ... ],
         "type": "Polygon"
     },
     "properties": {
         "point_details": [{ "key": "uuid:0e910762-7ffd-4d9c-bb2b-9bab1455c44c", "field": "..."}, { "key": "uuid:01b26a7b-1df5-4816-943a-f2f7e3e607bb", "field": "..."}, ....],
     },
     "type": "Feature"
 },
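
To illustrate why the parallel-array approach is cumbersome, here is a small Python sketch that builds such a Feature and looks up the details for one position. The function names build_polygon_feature and details_for are hypothetical, and point_details is the ad hoc property from the example above, not part of GeoJSON. Note that a valid GeoJSON Polygon nests its positions inside a closed linear ring (first and last position identical), so the sketch wraps the positions in one ring and the closing position needs its own duplicated details entry.

```python
def build_polygon_feature(ring, point_details):
    """Build a Feature whose point_details[i] describes ring[i].

    Nothing in GeoJSON enforces this alignment, so it is checked here.
    """
    if len(ring) != len(point_details):
        raise ValueError("need exactly one point_details entry per position")
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("a linear ring needs 4+ positions and must be closed")
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"point_details": point_details},
    }


def details_for(feature, index):
    """Return (position, details) for one vertex of the outer ring."""
    position = feature["geometry"]["coordinates"][0][index]
    return position, feature["properties"]["point_details"][index]
```

Any tool that reorders, simplifies or clips the coordinates without knowing about point_details would silently break the index alignment, which is the core weakness of this layout.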

This sounds like the core of the problem. If there is no established geospatial format for representing the additional per-point metadata desired, then perhaps this is something that should be discussed and addressed in a suitable GIS standards body (eg OGC?). Then, whatever comes out of that can be considered for adoption by ODK.

Adding these extensions to the existing geopoint/geoshape/geotrace formats used by ODK would require rewriting all existing tooling around these data types; eg rewriting area(), distance(), rewriting Aggregate's GeoJSON export function, etc, not to mention breaking any 3rd party downstream ODK tooling that processes current geo* formatted geospatial data. This doesn't really make sense; rather, to support this natively within ODK, it would probably make more sense to simply introduce new datatypes [which would require implementing associated new Collect widgets to generate geospatial data in the appropriate format. But then you'd be having to rewrite the existing Collect geo* widgets in any case]. And if you are going to the trouble of adding new native datatypes, I'd probably suggest going with one based on something with better interoperability (KML, GeoJSON, etc).

Just my $0.02.

I liked the discussion, it's very interesting with very good argumentation :+1:

That might be a good workaround (until a new format emerges), as the widget could output two pieces of data:

  1. a string with the classic geotrace so that area(), distance() could run
  2. a binary data with a full extended geometry
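
As a rough illustration of that two-output idea, here is a Python sketch. Everything in it is an assumption made for the example: the function name widget_outputs, the dictionary keys, the ";" separator for the classic string, and the choice of UTF-8 encoded JSON as the binary payload. It is not how any Collect widget currently behaves.

```python
import json


def widget_outputs(points):
    """Return (classic_string, binary_payload) for a list of recorded points.

    Each point is assumed to be a dict with 'lat', 'lon', 'alt', 'acc'
    and a free-form 'extra' dictionary.
    """
    # 1. Classic geotrace string: coordinates only, so that functions
    #    such as area() and distance() keep seeing the format they expect.
    classic = ";".join(
        f"{p['lat']} {p['lon']} {p['alt']} {p['acc']}" for p in points
    )
    # 2. Full extended geometry, serialized for a binary question.
    payload = json.dumps(points, sort_keys=True).encode("utf-8")
    return classic, payload
```

The two outputs describe the same points in the same order, so downstream tooling could rejoin them by index if needed.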

Good point, could be a good long term investment/strategy.

It makes sense to support this. My first guess would be to implement a GeoJSON data type, but I need to investigate a bit: in which part of the code would the new datatype be added? The same place as JavaRosa's GeoShapeData/GeoPointData?

Probably a good start is to grep for "geopoint", "geoshape" and "geotrace"; everywhere you get a hit, there's a good chance you'll have to replicate! :slight_smile: Also, you'll need to implement an equivalent new widget for Collect to output captured GPS data in the new format for your geopoint/geoshape/geotrace doppelgänger, together with whatever associated new metadata it requires [or, alternatively, edit the existing Collect widgets to detect their binding's type and change their output accordingly].

Unfortunately, adding a brand new datatype isn't trivial (times 3!). :disappointed: :disappointed: :disappointed: But I think in this case it may be the best (only?) option that will avoid breaking existing ODK geo* functions all over the place.

Yup. Before throwing your weight behind extending anything in ODK, I might suggest seeking some feedback and advice from a GIS community as to your particular usecase and what folks might recommend in terms of formats to best go about it, and perhaps try to build a consensus around a suitable format (that extends a more established geospatial data format than geopoint/geoshape/geotrace...).

Failing all that, and if you desperately just need an ODK workaround/hack, then another option might be to make your own custom fork of Collect and maybe add a new appearance to tell the geo* widgets to spit out geopoint/geoshape/geotrace strings with your custom additions (in whatever format you like, since it's basically all your own one-off custom build...). I don't think the changes you would need to make to ensure area() and distance() (in https://github.com/opendatakit/javarosa/blob/master/src/org/javarosa/core/util/GeoUtils.java) still work would be too great, because they are already ignoring the 3rd (altitude) and 4th (accuracy) elements of each geopoint, so ignoring more would be a trivial change. And since presumably you are the only folks consuming this custom format downstream, only you'd need to make sure to handle the extra data in your tooling.
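
To show what "ignoring more" could look like, here is a Python sketch of a path-length calculation that reads only the first two elements of each point and skips everything after them. It is not the GeoUtils implementation: the haversine formula and the 6371000 m Earth radius are stand-ins chosen for the example, the function names are hypothetical, and the sketch assumes the custom trailing data never contains a ";".

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, for illustration


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trace_length_m(geotrace: str) -> float:
    """Sum segment lengths, using only latitude and longitude per point."""
    coords = []
    for chunk in geotrace.split(";"):
        tokens = chunk.split()
        if len(tokens) < 2:
            continue  # skip empty chunks, e.g. after a trailing ';'
        # Altitude, accuracy and any custom additions are ignored.
        coords.append((float(tokens[0]), float(tokens[1])))
    return sum(
        haversine_m(*coords[i], *coords[i + 1]) for i in range(len(coords) - 1)
    )
```

With this approach the same function accepts both the classic strings and strings carrying extra per-point data, as long as the first two elements stay latitude and longitude.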

Just throwing some ideas out there... Unfortunately, I do think this is probably too much an isolated usecase, in general, with too great a potential impact on existing ODK functions and consumers, for any changes (to geopoint, geoshape, geotrace format) to likely make it into the core ODK codebase. But again, that's just my $0.02.

We forked ODK and JavaRosa over the last 2 years and it's working fine, except that this option doesn't create synergies. ODK already allows external apps to be called through an intent to do a specific widget's job. This incentivizes companies like us to focus on building advanced features (Intellectual Property) that contribute to the ODK ecosystem, and to contribute back to the open source project.

The use case is indeed very specific, but it would be quite a good strategic move for ODK to have a more standardized and more flexible internal geo model. There is currently strong demand in many developing countries for more advanced mapping capabilities (at a low cost), either for monitoring natural resources or simply to survey and document land ownership. Doing those efficiently involves displaying multiple layers of data and linking the new points with some context. So supporting only a coordinates-only geo model won't cut it.

In addition, here is a list of usecases where data collection would be much simpler if the map allowed capturing more than one geometry and more contextual data:

  • mapping a parcel, then within this parcel adding points for all the trees with their ages
  • mapping a parcel and for each segment providing the name of the neighbours
  • mapping a forest which has one or more areas that have recently been cleared