Extract FQD path of blob from Aggregate

TerryMud · April 12, 2019, 2:23pm

Working with images stored in Aggregate
I am running ODK-Aggregate-v1.5.0-Linux-x64.run on PostgreSQL10.x Ubuntu 14:04 / 16:04 tomcat8.

When pulling from briefcase or export forms to csv I get a path for the images i.e. http://server:8080/project_name/view/binaryData?blobKey=FORM_A[%40version%3Dnull+and+%40uiVersion%3Dnull]%2FPROJECT_FORM_A_points[%40key%3Duuid%3A971977f8-537c-43b0-8406-d7ac4e02175e]%2Fimage_a
which when used in html I can display and amend display through CSS codes in QGIS Atlases, Reports and Layouts.
I have developed a method for myself which runs SQL scripts through CRON jobs on the aggregate schema which does the prep and manipulates of the field collected data and then moves it to public schema for use in QGIS and other front end tools... This runs really well and keeps the ODK data separate and constantly updates the layers in the databases public schema both spatial and non-spatial data making it current or as near real time as possible. WHY? a big saving is space as the image is only stored once in the odk schema but referenced many times for visualization. It reduces manual tasks in my workflow of downloading csv files and briefcase malarkey. The missing link to full automation is how to get the path as demonstrated above from the aggregate side. I'm not clever enough to reverse engineer the path as used on the aggregate site on 8080, but I know its available, I just need help to get it out so I can include the path in the tuple for referencing

PLEASE HELP

yanokwa · April 17, 2019, 10:48pm

I'm not sure I understand what you mean by "from the Aggregate side", but it sounds like you want the media URLs for each submission?

The API that Briefcase uses is documented here. You first have to get the list of submissions, then get each submission, then in that response, you can find the downloadUrl for the media files.

You could also use the JSON publisher and have a listener/server that grabs the links from each submission. I don't love this option because if the listener is down, then you miss a submission.

You could also consider tweaking Briefcase's command line mode to write the links instead of the image files.

ggalmazor · April 18, 2019, 5:55am

Hi, @TerryMud!

Aggregate uses the SubmissionKey class to encode xpath-style identification of server-side data. You could buld one following these guidelines, if you want:

First, let's take a look to the actual submission key for blobs:

blobKey=FORM_A[%40version%3Dnull+and+%40uiVersion%3Dnull]%2FPROJECT_FORM_A_points[%40key%3Duuid%3A971977f8-537c-43b0-8406-d7ac4e02175e]%2Fimage_a

becomes:

blobKey=FORM_A[@version=null and @uiVersion=null]/PROJECT_FORM_A_points[@key=uuid:971977f8-537c-43b0-8406-d7ac4e02175e]/image_a

You decode it (this website is great for that).

As you can see, the blobKey is composed of different parts:

The form's name FORM_A
Some version information that looks like it's irrelevant (nul versions)
The top level group name that holds your image field (or the form's name if the field is outside any group) PROJECT_FORM_A_points
The UID key of the top level group entry that holds your image
The image field's name image_a

Of course, this submission key structure can vary depending on the form's structure, but you should be able to identify its parts the same way once you url-decode whatever you see on the links Aggregate produces when exporting to CSV your forms, or by inspecting the image urls.

TerryMud · April 18, 2019, 7:15am

Thanks very much, I was discussing it with my brother yesterday and we were inspecting it through google its a good tool to have, and it all suddenly dawned on me the only part that is dynamic to the record from the form is the uuid, the string for the rest is standardized as you have noted above, and all related to the form. So in my select query I concatenated the sting around the sub-string of the uuid and hey bingo my schema for odk_prod is now processed through sql's via cron jobs and updates my public schema with no manual processes in the workflow. My data in QGIS, GeoNode and Google Data sStudio has become more timely and all my processing non manual.

TerryMud · April 18, 2019, 7:21am

Thanks Yanokwa., sometimes you have to be in my head I build my databases so that the odk_prod schema and the public schema where I store the processed data from odk are on the same db aggregate side means the odk_prod schema. It was the url for each submission that I wanted to form through sql I managed to work it out yesterday very much using the approach that ggalmazor recommends below. I wanted to (and have now achieved) remove any manual processes in processing the data from odk_prod. this was the last barrier and now all is working as I wanted and I can process submissions from the field in real time.