Send submissions via SMS

Sometimes enumerators don't have a cell data connection but do have SMS. It'd be nice if we could send some data to an SMS endpoint. SMS is an unreliable protocol and there is still no good way to get SMSes into ODK Aggregate where the rest of the data lives, but I don't think we should let perfect be the enemy of good...

There is existing code in JavaRosa for an SMS transmission API. There is also an existing issue on Collect that summarizes the great work that Medic Mobile has done on their fork of Collect.

I've tried to summarize the above in the proposal below.

Proposal

A form designer specifies an SMS prefix (essentially a form ID), delimiter (defaults to space) and SMS number in the XLSForm. The designer also optionally specifies a short name (also known as a tag) for each question whose answer will be sent via SMS. This short name would be a new column in the XLSForm.

For example, the designer could have a form (prefix: F123, number: 555-5555) that asks for name (tag: +N), gender (tag: +G), and age (tag: +A). Enumerators fill this out like any other form in ODK Collect.

When the enumerator finalizes the form, Collect splits the finalized form into chunks of 160 characters. When the enumerator hits send, Collect sends every chunk over SMS to 555-5555. Once Collect is sure all the SMSs are sent, it marks the form as sent via SMS.
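To make the splitting step concrete, here is a minimal Python sketch, assuming the message is plain text and that counting characters is good enough. The function name `split_into_chunks` and the `chunk_size` parameter are made up for illustration and are not part of Collect; the real per-message limit depends on the character encoding and, as noted later in the thread, possibly on the carrier.

```python
from typing import List


def split_into_chunks(message: str, chunk_size: int = 160) -> List[str]:
    """Split a message into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]


# Hypothetical long submission built by repeating the example answers.
message = ("F123 " + "+N JOHN DOE +G M +A 35 " * 10).strip()
chunks = split_into_chunks(message)
for index, chunk in enumerate(chunks, start=1):
    print(index, len(chunk))
```

Making `chunk_size` a parameter keeps the door open for a smaller limit such as 120 characters.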

The submissions will look like this.
F123 +N JOHN DOE +G M +A 35
F123 +N JANE SMITH +G F +A 26
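
As a rough illustration of how such a message could be assembled, here is a small Python sketch. It assumes the prefix, tags and delimiter come from the form definition; `encode_submission` is a hypothetical helper, not an existing Collect or JavaRosa function.

```python
def encode_submission(prefix, answers, delimiter=" "):
    """Join the form prefix and (tag, value) pairs into one SMS body."""
    parts = [prefix]
    for tag, value in answers:
        parts.append(tag)
        parts.append(str(value))
    return delimiter.join(parts)


body = encode_submission("F123", [("+N", "JOHN DOE"), ("+G", "M"), ("+A", 35)])
print(body)  # F123 +N JOHN DOE +G M +A 35
```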

Note that the data will not go to Aggregate! Instead, they will go to an SMS endpoint (perhaps set up via TextIt) to receive that data. And from there, users can do whatever they want with that data (perhaps send it to Google Sheets via Zapier).

Administrators will be able to override the SMS number in the form by entering a different SMS number in the settings. And like all administrator settings, they'll be able to hide it via an admin password.

Open questions
My expectation is that if there was an SMS number in the form, that would be the only transport option. But I can also imagine situations where you would have an SMS number in the form, but would want the cell network to be the default and SMS to be the backup. Or situations where you want the enumerator to choose the transport at the time of submission. How should we handle that?

The issue of lack of data connectivity and identifying options is a constant battle in most of the field work I have done. ODK has always been preferred and the phones set to sync all finalized forms when a wifi network is connected (i.e. work offline the whole day and sync when back in range) - that is how we have solved the issue in the past. I think sms would be a great addition and is worth testing.

I have one addition to your proposal:
Let the admin specify the default number and the size of the SMS chunks in the same admin area where the URL is defined. This is because not all areas/carriers have the same maximum character length; for example, with my carrier it is 120 characters.

For the open questions:

In my opinion Collect should not be limited to one transport option. My first thought on how to integrate this feature: when defining two languages in an XLS form, we use columns that define what Collect should display based on the language the user selects. Why not include the SMS feature in the same way? Define a new column for SMS-tagged variables, and when building a form enter 'yes' in that column for each variable to send. At the end of the form, if the user chooses to send by SMS, only the variables with 'yes' in the SMS column are sent; similarly, if they choose to send by both SMS and network, the form is sent to both.
I suggest this because I have been in a situation where (due to poor data networks) we were using an SMS API for real time data reporting (few variables) and Collect for deeper analysis (many variables) but some of the same variables were being collected between the 2 methods - I always wished there was a better way to integrate them because data cleaning was a nightmare.
As for using SMS for backup: I guess this depends on how the portal receiving the messages is set up. From experience, messages sent by SMS are deleted after a period (determined by the API provider), so I would not suggest that they be used for long-term backup. Furthermore, I noticed that with SMS the messages come in as strings and must be split during data cleaning, or can be cleaned when echoed to a database, but it has been such a messy process in the past. Would be glad to hear if anybody else has had experience with that.

All in all I think this is an excellent feature to add.

I'm really keen on having this feature included in ODK. I'm involved in a study where I am collecting data using ODK over an extended period of time as part of a monitoring programme. Mobile connectivity varies a lot across the geographical areas where we're working so in some instances it's not possible to upload data for a number of days.

The outlined proposal looks like a good way to solve this problem. One suggested amendment would be that the option of amending the SMS phone number would be included in Collect (perhaps in the Admin settings). When using a local SIM, it's often the case that the phone number isn't known too far in advance, or faults in the SIM may arise resulting in the number changing mid-activity.

In terms of the open questions, I would like the enumerator to have the option of selecting the transport option, but for this option to be hidden using the Admin settings if not required. I also like the suggestion that there is the option to send only a subset of the form via SMS, although would this cause an issue with duplicate entries if the enumerator later chooses to resend the data using the cell network?

A few additional considerations that I have:

  • When an enumerator sends data via SMS will they receive any assurances that the data have been received by the number they send it to? Perhaps this is something to do within Zapier/IFTTT i.e. the phone receiving the SMS will automatically send a reply SMS to the enumerator's smartphone

  • If an enumerator tries to send the data by the cell network and it fails, if the SMS option is enabled could/should they receive a prompt to try using SMS?

Thanks Yaw for starting this discussion! I'm really excited to see this feature in action.

Wow, this is really amazing feedback. THANK YOU!

I've got some quick questions for you, @Leangelindiku17. What carrier did you use that had a 120 character limit? What phone or application did you use to send that message? Was it a plain text SMS or did it have special characters (e.g., accents, emoji)?

As to the feedback on the specification...

  • I agree that form designers should be able to specify which data goes over SMS. In the spec, when the designer specifies a short name, I was imagining that will be in a new column and only questions that have that column filled out will be sent over SMS. I have updated the spec accordingly.

  • I agree that we should have a way to override the SMS number in the settings and I will update the spec accordingly. And like all settings, you'll be able to hide it via an admin password.

I agree that we should have multiple transport options. That may result in duplicate data, but I think that's a good tradeoff because there is no easy way to guarantee that a message has been received with SMS.

Let's take @Shell's suggestion as an example. An enumerator sends the data to the server and it arrives. The server sends a reply to the enumerator, but that message isn't received or it's delayed for a few hours. What happens next? Does the enumerator keep sending until they get a reply? And what if the submission the enumerator is sending spans two SMSes and only the second half arrives at the server? Does the server request just the first one? And what if that request isn't received?

Long story short, SMS is not a great foundation for data transport. We can be sure that messages are sent, but we can't be sure they are received. And that means that with SMS we risk data loss. Given that, we probably should have a way to send data via a more reliable transport even if we could generate duplicate records.

Further, in this spec, the data would be going to separate places so you could do things like do your real-time reporting with SMS on TextIt and then at the end of the campaign do a deeper analysis with your Cell data from Aggregate. So I'm imagining you are generally looking at the SMS data or the Cell data. And if you really needed to look at both, you could add a pseudo unique ID (e.g., name of head of household) to both submissions to make it easier to de-duplicate. Of course, this would make your SMS bigger. And that SMS that has the ID could be the one that was lost. You see where this is going...

One reasonable way forward might be to have a default transport setting on the device (Cell/WiFi, SMS, Both). This could be hidden via the admin password. And a form designer can override that default setting within the form's design. How does this sound?
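
To show what I mean by a device default that a form can override, here is a minimal Python sketch. The `Transport` enum and `resolve_transport` function are hypothetical names for illustration only; nothing like this exists in Collect yet.

```python
from enum import Enum
from typing import Optional


class Transport(Enum):
    INTERNET = "Cell/WiFi"
    SMS = "SMS"
    BOTH = "Both"


def resolve_transport(device_default: Transport,
                      form_override: Optional[Transport] = None) -> Transport:
    """Return the form's transport if the designer set one, else the device default."""
    return form_override if form_override is not None else device_default


# The device is set to Cell/WiFi, but this form's design asks for SMS.
print(resolve_transport(Transport.INTERNET, Transport.SMS).value)  # SMS
# No override in the form, so the device default applies.
print(resolve_transport(Transport.BOTH).value)  # Both
```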

@yanokwa, this was all plain text. When using a shared short code (running on MTN Uganda), we found that if an SMS exceeded 120 characters it would be split into 2 separate messages. Our dedicated short codes on Safaricom Kenya and Airtel Malawi did not have a similar issue. I then just restricted the character limit to 120 for all messages for uniformity and provided a template, to be used when sending from our office MIS (bespoke) to user phones, that kept it at 120.
All users used Android devices with no less than 2GB RAM running KitKat or better.
We were using an API provided by Infobip: https://dev.infobip.com/getting-started although I doubt it was the cause since we used it across the board.

From this: Long story short, SMS is not a great foundation for data transport. We can be sure that messages are sent, but we can't be sure they are received. And that means that with SMS we risk data loss. Given that, we probably should have a way to send data via a more reliable transport even if we could generate duplicate records.
My reply is this: I would imagine there will be a platform for collecting all traffic information, which would include the sender, receiver and status of the message sent (Delivered, Pending, Not Delivered). This would be the log of activity and would not necessarily provide a 'message status' to sender or receiver. It would however provide a starting point for data cleaners to sort duplicate entries.

Having as many settings as possible that can only be changed after entering the admin password sounds good to me. It is a small measure towards data protection if we can limit human error as much as possible at the time of form submission.

I would argue that sending forms via SMS has lots of applications, although it may be very expensive in some areas.

There are many remote areas without WiFi coverage where people can only send a message if they want to submit their forms. And we all know that ODK Collect aims to collect data in some special fields such as wildlife surveys or disease investigations in some remote areas. So the SMS feature will be very useful in these cases.

Besides, I think ODK Collect mainly consists of two parts:

  • Data collection

  • Data transmission

Data collection basically works on local devices without a network. Data transmission involves both local devices and remote servers. Transferring data via SMS, Bluetooth, NFC and other approaches all have advantages and disadvantages, so we can categorize them into roughly two kinds:

  1. Communicate over short distances (NFC, Bluetooth and WiFi-P2P)

  2. Communicate over any distance (WiFi and SMS)

So we can create an options menu so users can decide which approach works better in a given case. The menu contains two options (short-distance mode or long-distance mode). If a WiFi signal exists, we can automatically use WiFi-P2P in short-distance mode or WiFi to the server in long-distance mode. If no WiFi is available, they can use SMS if they need a remote commit. Anyway, those details can be discussed later.

But we have to make sure our app can deliver these data to individuals and servers successfully.
What's your opinion about that?

We created our own Collect branch primarily to send forms over SMS, so I am glad to see discussion going on about mainstreaming this feature. The way you've proposed it makes good sense. Here are just a couple of specific thoughts/questions:

  • Collect splits the finalized form into chunks of 160 characters. [...] Once Collect is sure all the SMSs are sent, it marks the form as sent via SMS.

    Might be too low-level of a question for now, but are you thinking of using the Android API method of dividing a message into parts and sending as a multipart SMS, or re-implementing this and checking that each part was received individually?

  • [...] situations where you want the enumerator to choose the transport at the time of submission. How should we handle that?

    I just clarified this in the Collect issue (URL was missing in the original post)... we chose to have both options shown to enumerators with side by side buttons: Send via Web and Send via SMS. The SMS button only showed if there was a phone number listed in the settings, and left it to the enumerator (not form designer) to decide what method would work best when it was time to send.

  • Would you consider having Collect checking for a response SMS from whatever SMS server you submitted the report? The response could be a human readable response, since it will show in the SMS inbox, along with a short confirmation code to be read by Collect tying it back to the submission. For example, you could receive "Your submission was well received [F123_97628]", using the form code and a partial hash.
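
To illustrate the confirmation code idea, here is a minimal Python sketch. It assumes the server and the device both derive the five-digit part from a SHA-256 hash of the SMS body; that choice, the bracketed code format, and the names `confirmation_code` and `reply_confirms` are all assumptions for illustration, not anything Medic Mobile or Collect actually implements.

```python
import hashlib
import re

# Matches a code such as [F123_97628] anywhere in a human-readable reply.
CODE_PATTERN = re.compile(r"\[([A-Za-z0-9]+)_(\d{5})\]")


def confirmation_code(prefix, sms_body):
    """Build '<prefix>_<5 digits>' from a partial hash of the submitted SMS body."""
    digest = hashlib.sha256(sms_body.encode("utf-8")).hexdigest()
    partial = int(digest, 16) % 100_000
    return f"{prefix}_{partial:05d}"


def reply_confirms(reply_text, prefix, sms_body):
    """Return True if the reply contains the code expected for this submission."""
    match = CODE_PATTERN.search(reply_text)
    if match is None:
        return False
    return f"{match.group(1)}_{match.group(2)}" == confirmation_code(prefix, sms_body)


body = "F123 +N JOHN DOE +G M +A 35"
reply = f"Your submission was well received [{confirmation_code('F123', body)}]"
print(reply_confirms(reply, "F123", body))
```

A five-digit code is only a weak check, so it would tie a reply to a submission without proving the content arrived intact.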

This all sounds great.

Regarding the selection of the method of sending the data, I think that @yabbyad's suggestion of enabling the enumerator to choose the method in which the message is sent is a useful one. Perhaps within the device there should be a default setting as @yanokwa suggests, plus the option of enabling the enumerator to override this default setting when submitting their data?

Once the data has been sent by SMS, how should the entry then appear in ODK? I understand the issues of duplicate entries (plus the extra characters a unique identifier uses in the SMS), but perhaps the data should remain in the Send Finalized Form area until it is sent via the mobile network to account for the SMS route being the least reliable?

In terms of acknowledging the receipt of the SMS, I think having an automated "We received your SMS, ID xxxx" which can be read by the enumerator should be enough, rather than incorporating the receipt of the SMS within Collect. I'm not sure how Collect should treat the form once it has been sent by SMS, however. Should it still allow the enumerator to send that form by SMS again (potentially resulting in duplicate entries) or should the option be disabled once it has been sent once?

@abbyad, I took a quick glance at https://developer.android.com/reference/android/telephony/SmsManager.html and it looks like we can split, send the message, and wait for the message sent intent to fire. I propose we wait for 5 minutes for the sending (inspired by Android’s messaging app https://android.googlesource.com/platform/packages/apps/Messaging/+/master/src/com/android/messaging/util/BugleGservicesKeys.java#69) and say the sending has failed.
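
On Android the splitting and sending would go through `SmsManager` (`divideMessage` and `sendMultipartTextMessage`, which takes a sent intent per part). The bookkeeping around it can be sketched in plain Python; this is only a model of the idea, with a hypothetical `MultipartSendTracker` class and the 5 minute timeout taken from the proposal above.

```python
from dataclasses import dataclass, field

SEND_TIMEOUT_SECONDS = 5 * 60  # proposed wait before declaring failure


@dataclass
class MultipartSendTracker:
    total_parts: int
    started_at: float  # seconds, from whatever clock the caller uses
    sent_parts: set = field(default_factory=set)

    def mark_sent(self, part_index: int) -> None:
        """Record that the 'message sent' callback fired for one part."""
        if not 0 <= part_index < self.total_parts:
            raise ValueError("unknown part index")
        self.sent_parts.add(part_index)

    def status(self, now: float) -> str:
        """Return 'sent', 'failed' (timed out) or 'sending'."""
        if len(self.sent_parts) == self.total_parts:
            return "sent"
        if now - self.started_at >= SEND_TIMEOUT_SECONDS:
            return "failed"
        return "sending"


tracker = MultipartSendTracker(total_parts=2, started_at=0.0)
tracker.mark_sent(0)
print(tracker.status(now=10.0))   # sending
tracker.mark_sent(1)
print(tracker.status(now=20.0))   # sent
```

Note that this only tracks "sent", not "delivered", which matches the point made earlier that we can be sure messages are sent but not that they are received.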

I think checking for a response SMS from the server is a bit too much to add at this stage. To use that data, the end point, which we don't control, has to send something back that Collect can parse. Let’s do without at first and it’s something that can be added if there’s demand for it.

As far as open questions, I have some proposed answers...

How we support sending the same form over multiple transports
I propose we don’t for this first pass because it makes for a messy implementation.

Instead, I propose we add a new Server type for SMS only campaigns. You can set a device specific number in this setting which can be overridden in your form (same as Aggregate server URL). Also we can add a “Get Forms From” option (choose from Aggregate or Google or other) for folks to fetch their form. Note that this Server type will only allow submissions of forms that specify an SMS prefix.

With this proposal, an enumerator can switch between sending over SMS and data by changing the server type in the settings (if that isn’t locked down by an administrator). This is less easy than in Medic Mobile’s implementation where both send options are available on the send screen. I think that’s OK because with Medic Mobile, they control the endpoint and so letting an enumerator choose a transport seems fine because the data ends up in the same place. In our case, the data doesn’t really end up in the same place, so it seems like it’s not a decision you want to push to your enumerator.

If someone wants to send one or more forms over data after they send over SMS, then that’s still possible. On the device, you can change the Server type, go into Send Finalized Forms, Change View to show sent forms, and send all the data via WiFi/Cell.

Would that work for you, @Shell?

How do we show successful transmission to users?
Same way as we do with WiFi/Cell. If we get a successful delivery message, we show it as sent.

I love the proposed implementation. What if the submission was compressed instead of/in addition to being formatted as described earlier? @yanokwa's proposed format is already pretty slim, but especially with long forms, free text, and repeating responses compression can save you a lot of texts that need to be sent.

This would just mean that texts have to be decompressed again, but some processing is needed anyway to turn F123 +N JOHN DOE +G M +A 35 into tabular format again.
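
For reference, that processing step could look something like the Python sketch below. It assumes every tag starts with `+` and that answers never contain a word starting with `+`; `parse_submission` is a hypothetical helper, not part of any existing endpoint.

```python
def parse_submission(text, delimiter=" ", tag_marker="+"):
    """Split an SMS body into its prefix and a {tag: answer} mapping."""
    tokens = text.split(delimiter)
    prefix, rest = tokens[0], tokens[1:]
    words_by_tag = {}
    current_tag = None
    for token in rest:
        if token.startswith(tag_marker):
            # A new tag starts; following tokens belong to its answer.
            current_tag = token
            words_by_tag[current_tag] = []
        elif current_tag is not None:
            words_by_tag[current_tag].append(token)
        # Tokens that appear before the first tag are ignored.
    answers = {tag: delimiter.join(words) for tag, words in words_by_tag.items()}
    return prefix, answers


prefix, row = parse_submission("F123 +N JOHN DOE +G M +A 35")
print(prefix, row)  # F123 {'+N': 'JOHN DOE', '+G': 'M', '+A': '35'}
```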

Multipart SMS

https://developer.android.com/reference/android/telephony/SmsManager.html and it looks like we can split, send the message, and wait for the message sent intent to fire. I propose we wait for 5 minutes for the sending (inspired by Android’s messaging app https://android.googlesource.com/platform/packages/apps/Messaging/+/master/src/com/android/messaging/util/BugleGservicesKeys.java#69) and say the sending has failed.

Seems like a sensible approach!

Confirmation SMS

I think checking for a response SMS from the server is a bit too much to add at this stage. To use that data, the end point, which we don't control, has to send something back that Collect can parse. Let’s do without at first and it’s something that can be added if there’s demand for it.

Agreed. This can be added later as needed.

Multiple Transports

[...] an enumerator can switch between sending over SMS and data by changing the server type in the settings (if that isn’t locked down by an administrator).

I understand why offering a single transport method is a simpler implementation, but I'm surprised that we'd be the only ones routing the submissions to the same place. I would have thought that others using this would eventually route the messages to the same place, even if it has to go via another pathway (eg Zapier, IFTTT, custom gateway, or manual import). In these cases it would seem beneficial to offer a fallback transport if the primary (eg mobile data) is not available.

@yanokwa what do you think of offering in the settings an "Alternate Server", which could be a different type? If an alternate server is set then the enumerator could choose to send with the primary or secondary pathway, or if you want to get fancy only offer the option to send via secondary transport if the primary fails.

Would this be useful for others interested in this feature too?

I like the proposal so far. I think we should always include a unique identifier for a single submission in all SMS messages, similar to the instanceID uuid() but possibly shorter or compressed, which will be available in both the SMS and HTTP transmission of the same submission. Relying on user data like family name may not always be ideal, and the unique identifier allows more natural meshing of the data collected via the different transport mechanisms, with less magic required in server-side implementations.

This is tricky because a true UUID is 36 characters and we can't send 36 characters because that would use up too much of the 160 characters in a single SMS.

Base 62 (0-9,A-Z,a-z) is used by bit.ly and tinyurl because they are unique and human readable. If we assume case-sensitivity, 5 characters would give us 62^5 which would give us ~ 1 billion IDs.

If we assume that the endpoint will know the phone number of the device that sent the message, then we only need to uniquely identify submissions from a single device and a billion IDs seems enough. If we can't assume that then we can go up to 6 characters and get ~ 60 billion IDs.
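
As a quick check of those numbers, here is a small Python sketch that computes the ID space and draws a random base 62 ID. The helper `random_base62_id` is hypothetical, and drawing IDs at random is just one possible approach; random IDs can collide well before the space is used up.

```python
import secrets
import string

# 0-9, A-Z, a-z: 62 symbols in total.
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_base62_id(length=5):
    """Return a random ID of the given length over the base 62 alphabet."""
    return "".join(secrets.choice(BASE62_ALPHABET) for _ in range(length))


print(62 ** 5)  # 916132832, roughly 1 billion
print(62 ** 6)  # 56800235584, roughly 57 billion
print(random_base62_id())
```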

What say you @Ukang_a_Dickson?

That will be good enough for me.

This is getting a bit technical, but for the ID, it might be wise for us to include this as a custom XPath function (base62) in the XForms spec and implement it in JavaRosa? So the XML block would look like this:

<smsblock xmlns="http://openrosa.org/smsdata" prefix="F123">
    <name tag="+N"/>
    <gender tag="+G"/>
    <age tag="+A"/>
    <meta>
    	<instanceID/>
    </meta>
</smsblock>
...
<bind calculate="concat('base62:', base62())" nodeset="/smsblock/meta/instanceID" readonly="true()" type="string"/>

@LN Does this look like something that'd be appropriate for the spec?

@Tino_Kreutzer as for your proposal, I think we can use a compression algorithm to compress form data before sending. My suggestion is to use a Huffman code in this case.
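
To give a feel for what a Huffman code would involve, here is a minimal Python sketch. It builds the code table from the message itself and produces a string of '0'/'1' characters; packing those bits into SMS-safe characters and sharing the code table with the receiver are left out, and all function names are made up for illustration.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Build a Huffman code table {symbol: bit string} for the given text."""
    if not text:
        return {}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1 and merge them.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # codes are prefix-free, so the first match is right
            out.append(reverse[current])
            current = ""
    return "".join(out)


message = "F123 +N JOHN DOE +G M +A 35"
codes = build_codes(message)
bits = encode(message, codes)
print(len(message) * 8, "bits as 8-bit text vs", len(bits), "bits encoded")
```

In practice the table would probably have to be fixed in advance on both sides, otherwise sending it eats the savings on short messages.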

Hey @yanokwa loving the proposal so far! Has this spec been finalized? I would love to give this a go!

@Mickys0918 I think compression might be the kind of thing that we take on after this initial implementation. It's work on the back end to decode which means that it's even less turnkey for users.

@Tino_Kreutzer Yes, we do still have to process the texts, but they are still human readable and easier to process than what you'd need to do with true compression.

You are killing me, @abbyad! Or rather, thank you for such great feedback. :laughing:

What about for each server type (Aggregate/Google), we add a dropdown that allows users to choose the transport. So for Aggregate, the settings would look like this...

ODK Aggregate settings

  • URL
  • Username
  • Password

SMS settings

  • Phone number

Transport settings

  • GPRS/WiFi | SMS | Both

If Both is set, then when you hit Send Selected in Send Finalized Forms it pops up a dialog that lets you choose the transport mechanism.

Hi @joeldean! I think the TSC will discuss and finalize on Wednesday, but I think there is enough here to start thinking through an initial implementation and seeing where the gaps in the spec are. Are there any gaps that you can see so far?