New number-or-zero() XPath function


(Dr. Gareth S. Bestor) #1

What is the general goal of the feature?
A new ODK XPath function that returns a valid number for arithmetic calculations even when the input is null (or invalid), in which case it returns 0.

What are some example use cases for this feature?
Presently, null inputs have to be converted to 0 - to avoid the dreaded 'NaN' problem - using either coalesce(${x},0) or if(${x}!='',${x},0). Both are rather verbose and so subject to error, and both involve evaluating XPath expressions containing multiple function arguments. It's also unclear, from a strictly XPath data type perspective, what datatype these should return. It would be more convenient, and somewhat faster, to have a specific XPath function for this common use case, along the lines of number-or-zero(${x}), which would perform the equivalent of: if(number(${x}) != NaN, number(${x}), 0)
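A minimal Python sketch of the proposed semantics, for illustration only. The helper names `xpath_number` and `number_or_zero` are hypothetical, and the regex only approximates XPath 1.0 string-to-number conversion. One caveat the sketch makes explicit: NaN never compares equal to anything, itself included, so a literal `!= NaN` test is always true and a real implementation would need an explicit NaN check.

```python
import math
import re

# Approximation of the XPath 1.0 numeric format: optional minus, digits with an
# optional fraction. No exponent and no leading '+'.
_XPATH_NUMBER = re.compile(r"^\s*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)\s*$")


def xpath_number(value: str) -> float:
    """Rough stand-in for XPath 1.0 number() applied to a string value."""
    return float(value) if _XPATH_NUMBER.match(value) else math.nan


def number_or_zero(value: str) -> float:
    """Proposed behaviour: the numeric value, or 0 when conversion yields NaN."""
    n = xpath_number(value)
    # Comparing against NaN with == or != cannot detect it; use isnan().
    return 0.0 if math.isnan(n) else n


print(number_or_zero("12.5"))  # answered question
print(number_or_zero(""))      # unanswered question
print(number_or_zero("abc"))   # invalid input
```

Under these assumptions, both the unanswered and the invalid input come back as 0, so a later calculation no longer collapses to NaN.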

What can you contribute to making this feature a reality?
I can implement it for libxml :slight_smile:


(Dr. Gareth S. Bestor) #2

bump.

Any interest in this, as a (simpler) alternative for dealing with the not uncommon NaN problem in arithmetic calculations?


(Yaw Anokwa) #3

I don't have an opposition to the feature. Perhaps this can be your foray into contributing to JavaRosa and the spec?


(Dr. Gareth S. Bestor) #4

I'll just go dust off my "Java for Dummies"... :slight_smile:

[Aside, but as it happens, I only just this morning finally finished implementing if(cond,true,false) for libxml2... And in case yer wondering how in the heck I managed without if() till now, you can actually accomplish it with just the basic XPath 1.0 math & string functions, although it's gawd-awful messy! lol]


(Dr. Gareth S. Bestor) #5

(Yaw Anokwa) #6

@martijnr I think we should have had a spec discussion before we got to the PR stage, but we shouldn't let perfect be the enemy of good!

Would this number-or-zero function be acceptable to you as an addition to the spec? Any namespace considerations we should be thinking of?


(Dr. Gareth S. Bestor) #7

Actually, "number-or-zero" was @martijnr's idea - I had my heart set on "numberOr0", but I acquiesced to his better judgement [apparently he doesn't like camels...] :laughing:


(Guillermo) #8

Sorry to be nitpicky, but adding a function for a specific application of coalesce sounds a little bit off.

  • coalesce is already a broadly used verb for this operation in the context of data. In any case, I think that it should be called coalesce-zero
  • I don't get how number-or-zero(${x}) is less verbose than coalesce(${x}, 0) (it's longer!).
  • If the context is arithmetic operations, why zero? Why not one instead? If you're going to multiply numbers, you need a one. Other aggregation operations in other contexts could require other neutral terms, which makes this one look weirdly specific.

If the problem we're trying to solve is handling null values while aggregating nodesets, we could change those sum(), max(), min(), etc. functions to gracefully handle null values with proper neutral values in the context of the operation we're trying to do e.g. Integer.MIN_VALUE in the context of max().


(Martijn van de Rijdt) #9

Thanks for posting this. Sorry, I somehow overlooked it.

it's longer!

I think it may only make sense when it covers 'NaN' and any non-numbers which if(number(${x}) != NaN, number(${x}), 0) does and coalesce doesn't. Whether it is useful enough to deserve its own shortcut function, I don't feel qualified to comment on. Would be good to hear from users as well.
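To make the difference concrete, here is a small Python sketch (hypothetical helper names, and only an approximation of XPath 1.0 number conversion) comparing the two approaches on an answered, an unanswered, and an invalid value. It assumes coalesce() returns the first non-empty argument.

```python
import math
import re

_XPATH_NUMBER = re.compile(r"^\s*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)\s*$")


def xpath_number(value: str) -> float:
    """Rough stand-in for XPath 1.0 number() applied to a string value."""
    return float(value) if _XPATH_NUMBER.match(value) else math.nan


def coalesce(a: str, b: str) -> str:
    """First non-empty argument, or '' when both are empty."""
    return a if a != "" else b


def number_or_zero(value: str) -> float:
    """Proposed behaviour: 0 for anything that does not convert to a number."""
    n = xpath_number(value)
    return 0.0 if math.isnan(n) else n


for raw in ["7", "", "n/a"]:
    via_coalesce = xpath_number(coalesce(raw, "0")) + 1
    via_proposal = number_or_zero(raw) + 1
    print(repr(raw), via_coalesce, via_proposal)
```

In this sketch the two agree for "7" and for the empty value, but for "n/a" the coalesce route still produces NaN because the non-empty, non-numeric string is passed straight through.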

we could change those sum(), max(), min(),

I would not be in favor of changing those. We'd be changing native XPath functions. It can be very useful for sum() to return NaN until all nodes in a node-set have value.
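As an illustration of that behaviour, a rough Python sketch of how XPath 1.0 sum() treats a node-set containing an empty value: each string-value is converted to a number and added, so a single unanswered node turns the total into NaN. The helper names are hypothetical and the conversion is an approximation.

```python
import math
import re

_XPATH_NUMBER = re.compile(r"^\s*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)\s*$")


def xpath_number(value: str) -> float:
    """Rough stand-in for XPath 1.0 number() applied to a string value."""
    return float(value) if _XPATH_NUMBER.match(value) else math.nan


def xpath_sum(string_values):
    """Add up the numeric conversion of every node's string-value."""
    total = 0.0
    for value in string_values:
        total += xpath_number(value)  # NaN propagates through addition
    return total


print(xpath_sum(["3", "4"]))      # 7.0
print(xpath_sum(["3", "", "4"]))  # nan
```

The NaN result works as a signal that at least one node is still unanswered, which is the property that would be lost if sum() silently substituted 0.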

he doesnt like camels

Indeed! :slight_smile:


(Dr. Gareth S. Bestor) #10

Not nitpicky at all - your comments are most welcome. Basically, the 'problem' this is specifically trying to solve is the quite common mistake newbie (XLSForm) writers make where they - somewhat naturally - assume (numeric) questions that are not answered will be zero when used in any subsequent calculations. Addressing this can certainly be accomplished with coalesce(${foo},0) [which in turn is arguably redundant since if(${foo}!='',${foo},...) can do anything coalesce() does...]. But for someone living in XLS-land - who is used to just dealing with 'variables' like ${foo}, and has little or no concept of XML nodesets and XPath - the concept behind coalesce() is pretty obscure...

coalesce(arg, arg)
Returns first non-empty value of the two args. Returns an empty string if both are empty or non-existent.

"huh?"

I think it'll be a bit more self-explanatory, for XLSForm writers, to just say if you want to include values in a calculation that might be unanswered/null, you should put number-or-zero(${foo}) instead of just ${foo}.

A minor technical advantage of number-or-zero(x) over coalesce(x,y) is that the latter can require two XPath argument evaluations, whereas the former only ever requires one. It's a minor efficiency advantage, but when, say, a summation calculation involves lots of operands, every one may need to be recalculated until the calculation quiesces.

If the consensus is that number-or-zero() is too specific for general use and unnecessarily redundant, then I'm happy to withdraw the proposal. However, I suspect the opposite may in fact be true - pretty much the only time I ever see coalesce() being used is for this exact purpose, and if a simpler alternative like number-or-zero() were available it would probably get used instead (and we may hardly see coalesce() popping up anymore...). Which I guess you could consider an argument in favor of introducing it.


(Dr. Gareth S. Bestor) #11

Agreed. We don't want to make existing forms now return different results, nor change the behavior of existing W3C XPath functions unless there is an extremely compelling reason to do so (round() was borderline IMHO... :slight_smile:)


(Guillermo) #12

OK, that makes sense to me now :slight_smile:

I think what I still don't get is why a fixed zero as a default value. Are we only supporting sums?

Maybe overloading number() to admit a second optional argument would work:

  • It's shorter
  • We're not adding a new function, nor changing current behavior
  • We're not deciding a fixed value for our users, supporting other scenarios for free, like multiplication, or any other thing users want
  • number(a, b) calls number(a), which somehow makes more sense and is more cohesive.
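A hedged Python sketch of what a two-argument number() could look like: the one-argument form behaves as before, and the optional second argument replaces NaN. The names `number` and `xpath_number` and the default-handling details are assumptions for illustration, not a spec.

```python
import math
import re
from typing import Optional

_XPATH_NUMBER = re.compile(r"^\s*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)\s*$")


def xpath_number(value: str) -> float:
    """Rough stand-in for the existing one-argument XPath 1.0 number()."""
    return float(value) if _XPATH_NUMBER.match(value) else math.nan


def number(value: str, default: Optional[float] = None) -> float:
    """number(a) keeps current behaviour; number(a, b) returns b instead of NaN."""
    n = xpath_number(value)
    if math.isnan(n) and default is not None:
        return default
    return n


answers = ["2", "", "5"]  # the middle question is unanswered

# Neutral element 0 for addition, 1 for multiplication.
print(sum(number(a, 0.0) for a in answers))        # 7.0
print(math.prod(number(a, 1.0) for a in answers))  # 10.0
print(number(""))                                  # nan, unchanged one-argument form
```

The caller picks the neutral value, so sums and products are both covered without a fixed zero.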

(Dr. Gareth S. Bestor) #13

Interesting idea @ggalmazor, I hadn't thought of it that way!

Although it involves more arguments (and is therefore less efficient), I do certainly see the appeal of a number(${foo},x) function that attempts to perform a number(${foo}) but now allows you to specify a result instead of NaN if it can't (which will probably be 0 in 99% of cases). It's also better than coalesce(${foo},0) in its handling of NaN, plus the name "number" is less obscure to a newbie than "coalesce".

I'm agreeable to this change. @martijnr? @yanokwa?


(Yaw Anokwa) #14

There's something very elegant about number(${foo}, default) and we can use that idea elsewhere. I'm OK with it. @martijnr, is this OK for you?


(Dr. Gareth S. Bestor) #15

In my defense, 99.9% of the time NaNs seem to blow up calculations 'cause somebody assumed an unanswered question would be 0... :slight_smile:


(Martijn van de Rijdt) #16

I can see overriding number() with a second argument is the most attractive solution for users, so it's fine with me.

It's a little problematic for Enketo developers (and perhaps @Xiphware?) that try to leverage a native XPath evaluator as much as possible (for a ~100x performance improvement), but I'm sure those poor folks can figure out a way, right @alxndrsn?


(Dr. Gareth S. Bestor) #17

Good point! I'll need to see how (or if!? :fearful:) I can catch libxml2 errors, when the built-in number() function fails due to an 'invalid' number of arguments, so that I can launch mine.


(Martijn van de Rijdt) #18

right, or perhaps a regex replace for 2 arg usage to if(number([1]) != NaN, number([1]), [2]) before sending to the evaluator


(Dr. Gareth S. Bestor) #19

Presumably Enketo's XPath support is already somehow handling these 'extensions' to existing XPath functions, since you have to do something similar to handle ODK's round(number, places), which extends the base XPath 1.0 round(number) function to add an optional 2nd parameter?


(Dr. Gareth S. Bestor) #20

Ewww.... yuck! [I can't imagine the regex necessary to correctly deal with substituting different flavors of nested number() calls... :face_vomiting: ].
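For what it's worth, the nesting problem is easier to handle with a small scanner than with a single regex. Below is a rough Python sketch (hypothetical, not Enketo's or libxml2's approach) that rewrites two-argument number(a, b) calls before evaluation, tracking parentheses, brackets and quoted strings. It emits `number(a) = number(a)` as the NaN test, relying on NaN never being equal to itself, and it does not handle whitespace between the function name and its opening parenthesis.

```python
import re

_NAME_CHAR = re.compile(r"[\w.:-]")  # characters that can precede '(' in a longer name


def _split_call(expr: str, start: int):
    """From the index just after '(', return (top-level args, index after ')')."""
    args, depth, quote, begin = [], 0, None, start
    i = start
    while i < len(expr):
        ch = expr[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            if ch == ")" and depth == 0:
                args.append(expr[begin:i])
                return args, i + 1
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(expr[begin:i])
            begin = i + 1
        i += 1
    raise ValueError("unbalanced parentheses")


def rewrite_number_calls(expr: str) -> str:
    """Replace number(a, b) with if(number(a) = number(a), number(a), b)."""
    out, i, quote = [], 0, None
    while i < len(expr):
        ch = expr[i]
        if quote:  # copy string literals through untouched
            out.append(ch)
            if ch == quote:
                quote = None
            i += 1
        elif ch in "\"'":
            quote = ch
            out.append(ch)
            i += 1
        elif expr.startswith("number(", i) and (i == 0 or not _NAME_CHAR.match(expr[i - 1])):
            args, end = _split_call(expr, i + len("number("))
            args = [rewrite_number_calls(a.strip()) for a in args]  # innermost first
            if len(args) == 2:
                a, b = args
                out.append(f"if(number({a}) = number({a}), number({a}), {b})")
            else:
                out.append("number(" + ", ".join(args) + ")")
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(rewrite_number_calls("number(${a}, 0) + number(${b})"))
print(rewrite_number_calls("number(number(${a}, 1), 0)"))
```

Note that the first argument ends up being evaluated up to three times in the rewritten expression, which is the efficiency cost mentioned earlier in the thread.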

So, it looks like libxml2 will fail to register an XPath extension having the same name as an existing function (as opposed to gracefully failing over, which I was rather hoping for...), and there is no obvious way to intercept the resulting error (in order to insert my valid value) before the entire XPath expression evaluation aborts. :slightly_frowning_face: [So I'm not even sure I can implement support for ODK's round(number,places)! :sob: ]

I think the only way to accomplish this (in libxml2) would be to have the two functions in distinct namespaces. Which begs the question, is it actually strictly legitimate - wrt the W3C XPath spec - to have different versions of the same XPath function name in the same namespace? I poked around the W3C specs a bit and didn't find anything explicitly indicating one way or the other; @martijnr do you happen to know?
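The distinct-namespaces idea can be pictured with a toy function registry keyed on (namespace, name). This is a hypothetical Python model, not libxml2's API, and the namespace URI is a made-up placeholder. A second registration under the same key is refused, while the same local name in another namespace is accepted.

```python
import math


class FunctionRegistry:
    """Toy model: functions are looked up by (namespace URI, local name)."""

    def __init__(self):
        self._functions = {}

    def register(self, namespace, name, func):
        key = (namespace, name)
        if key in self._functions:
            raise KeyError(f"function already registered: {key}")
        self._functions[key] = func

    def call(self, namespace, name, *args):
        return self._functions[(namespace, name)](*args)


def core_number(value):
    """Stand-in for the built-in one-argument number()."""
    try:
        return float(value)
    except ValueError:
        return math.nan


def ext_number(value, default):
    """Hypothetical extension: number() with a fallback instead of NaN."""
    n = core_number(value)
    return default if math.isnan(n) else n


EXT_NS = "http://example.org/odk-ext"  # placeholder URI, an assumption

registry = FunctionRegistry()
registry.register(None, "number", core_number)   # core functions: no namespace
registry.register(EXT_NS, "number", ext_number)  # same name, different namespace

print(registry.call(EXT_NS, "number", "", 0.0))
```

In this model the two functions never collide, but a form author would have to write a prefixed name for the two-argument version.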