Add XPath substring-before() and substring-after() functions

What is the general goal of the feature?
Add the ability to 'parse' elements out of a string based on a delimiter, eg splitting "123.45,67.8" at the comma.

What are some example use cases for this feature?
Presently you can only extract components from a string using the substr(string,start,end) function, which requires knowing the precise start (and end) location within the string. This is only workable if you know a priori the index of the delimiter, eg a known fixed prefix like "From:" where the delimiter is always going to be the 5th character. However, this is not the case in general, and there is no equivalent index(string,char) XPath function which could otherwise be used to determine the index position of the delimiter.

XPath 2 defines the substring-before() and substring-after() functions for this purpose (where a single character delimiter is just the simplest case). These are already supported by Enketo; they would need to be implemented for javaRosa (and libxml2).

[Note: you can actually exploit the existing ODK selected-at() function to effectively parse a non-select string result based on a space delimiter. But this only works for spaces, and is a bit of a kludge in any case.]

What can you contribute to making this feature a reality?
I can do the C implementation for libxml2, and probably the Java one for javaRosa too.

I'm OK with this especially since you'll be writing the code!

@martijnr what has our stance been about selectively pulling in functions from XPath 2?

I am also very much in favor of this! Will be particularly useful for querying external data.

I think this would be from XPath 1.0, right? (i.e. no changes to those functions in XPath 2, right?)

@martijnr what has our stance been about selectively pulling in functions from XPath 2?

We love it! Anything from XForms, XPath 1, 2, 3 is great if we can use it.

I think the XPath 1.0 definitions are probably sufficient to accomplish the desired result; specifically, no XPath 2.0 collations. [more so if I gotta implement it... :slight_smile: ]

From the XPath 1.0 spec:

Function: string substring-before(string, string)

The substring-before function returns the substring of the first argument string that precedes the first occurrence of the second argument string in the first argument string, or the empty string if the first argument string does not contain the second argument string. For example, substring-before("1999/04/01","/") returns 1999.

Function: string substring-after(string, string)

The substring-after function returns the substring of the first argument string that follows the first occurrence of the second argument string in the first argument string, or the empty string if the first argument string does not contain the second argument string. For example, substring-after("1999/04/01","/") returns 04/01, and substring-after("1999/04/01","19") returns 99/04/01.
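For anyone following along, here is a minimal Java sketch of the semantics quoted above. It is only an illustration, not the code from the javaRosa PR: the class name XPathSubstringSketch and its method names are made up for this example, and it assumes plain String.indexOf matching with no collations.

```java
// Hypothetical helper class, for illustration only; not the actual javaRosa code.
final class XPathSubstringSketch {

    private XPathSubstringSketch() {
    }

    // Part of 'source' that precedes the first occurrence of 'delimiter',
    // or "" if 'source' does not contain 'delimiter'.
    static String substringBefore(String source, String delimiter) {
        int idx = source.indexOf(delimiter);
        return idx < 0 ? "" : source.substring(0, idx);
    }

    // Part of 'source' that follows the first occurrence of 'delimiter',
    // or "" if 'source' does not contain 'delimiter'.
    static String substringAfter(String source, String delimiter) {
        int idx = source.indexOf(delimiter);
        return idx < 0 ? "" : source.substring(idx + delimiter.length());
    }

    public static void main(String[] args) {
        // The examples from the spec text quoted above.
        System.out.println(substringBefore("1999/04/01", "/"));
        System.out.println(substringAfter("1999/04/01", "/"));
        System.out.println(substringAfter("1999/04/01", "19"));
    }
}
```

With the string from the original request, substringBefore("123.45,67.8", ",") gives the part before the comma and substringAfter("123.45,67.8", ",") gives the part after it. Note that the delimiter can be longer than one character, as in the "19" example.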

So, unless this raises @LN from the dead, is this a go? :grin:

So I just finished implementing both these functions today - for libxml2 - only to find during subsequent testing that my new XPath extension functions wouldn't register correctly. Why? Because libxml2 already implements them natively! Haha... Lesson #1: RTFM! :slight_smile:

I'll start in on the java version now for javaRosa, and will post a PR when ready.

Finished code changes and unit test. PR opened.

Tested using this simple form:

substr.xls (5.5 KB)
substr.xml (2.2 KB)

against the examples listed here for substring-before() and substring-after(), as well as a few other manual ones. Note: Enketo already supports these XPath functions so - if you can avoid Validate (eg KoboToolbox) - you can already try out this form with the new functions under Enketo.

Closing out this feature request as the javaRosa PR has been merged (thanks @ggalmazor & @dcbriccetti). So these new XPath functions should become available in a subsequent ODK Collect (and Validate) build. In the meantime, again, they already work under Enketo if you want to play around before then.

I'll submit a corresponding PR to update the docs accordingly.
