Collecting biometric data is pretty cool and has lots of potential for applications in ODK. One of those is to link data across time.
This is a brief presentation of work done by our team at LSHTM with @seadowg, @dr_michaelmarks and others, to make a system for capturing fingerprints during ODK data collection with ODK Collect.
The primary motivation here was to be able to do successive waves of data collection in the field and then later (at the desk) to check (using fingerprint data) that data which are supposed to come from the same individual across time match up at the biometric level.
Although it could have real potential in the entity-based data collection future of ODK, we definitely don't want to do fingerprint matching in the field (i.e. getting ODK Collect to pull up someone's data) at this stage. This is partly because of the many security issues that would come with it, and partly because entity-based data collection needs to precede such development.
Code base
Code for this open source project can be found here https://github.com/LSHTM-ORK/ODK_Biometrics
System design of the Biometrics Framework
The novel biometrics system consists of two components. The first component is “Keppel”, a smartphone app designed to run on the Google Android operating system. This app provides an I/O interface between the ODK Collect app and an ANSI INCITS 378-2004 compliant electronic fingerprint reader/sensor device. The app has to be sideloaded (it isn't on the Play Store yet).
A really important point here is that the system is not simply taking photographs of fingerprints. The data are stored as concise code which has a very 'lite' impact on the size of the data stored in ODK and also requires no use of attachments. The fingerprint data are captured as plain text that is stored and encrypted along with other ODK data.
The Keppel Smartphone app was designed using Android Studio and Software Development Kit (SDK) https://developer.android.com/studio. The initial version of the app works only with the low cost (<£50) Mantra MFS100 Biometric C-Type Fingerprint Scanner (Mantra Softech Inc. www.mantratec.com), functionality for which was based on code templates provided within the Mantra MFS100 Software Development Kit (https://download.mantratecapp.com/).
The app was designed with a view to making the addition of further biometric sensors relatively simple. A software ‘demo’ scanner is also included, and this allows users to test their fingerprint supported ODK forms without having a scanner connected.
Using Keppel App to capture fingerprint templates
The app integrates with ODK Collect's External app widget using the uk.ac.lshtm.keppel.android.SCAN intent. An example XML form can be found here and an XLS Form version can be found here.
To capture all the fingers of one hand, your form would look like this.
and on the screen
Clicking 'launch' opens the external app
Pressing 'capture' then activates the scanner.
and once the template has been captured, the data are returned to ODK Collect as plain text
N.B. Here I'm using the dummy scanner
This whole process is pretty quick. Each scan takes just a couple of seconds.
Matching fingerprint templates
The second component of the system is the Keppel Command Line Interface (CLI), a Java/Kotlin application designed to run on the command line of a desktop or laptop computer. The application is able to compare any two ANSI INCITS 378-2004 fingerprint templates and to generate a simple score which describes the overall similarity between the two templates. The Keppel CLI was based on code provided in the Mantra MFS100 Software Development Kit. Calls to the CLI take the form
keppel match -p [template1] [template2]
where [template1] and [template2] are the fingerprint templates of interest, given either as plain text (flag -p) or as standalone files (no flag).
To use this part of the system, you'd download the CSV file from ODK Central, extract the data from the columns relating to the fingerprinting and run the CLI once for each pair of templates.
Comparing two templates takes around one second of compute time and, for purposes of scaling, the CLI call can be handled by other software tools such as R, Python and C++ as an embarrassingly parallel workload.
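As a rough illustration of the workflow above, here is a minimal Python sketch that builds one `keppel match -p` call per pair of templates and runs several at once. It assumes the `keppel` executable is on your PATH, and the column names used in the usage note (`fp_wave1`, `fp_wave2`) are hypothetical, so swap in whatever your form calls its fingerprint questions. The sketch returns whatever the CLI prints without trying to interpret it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def keppel_command(template_one, template_two):
    """Build the documented call: keppel match -p [template1] [template2]."""
    # Assumption: the CLI is installed and reachable as "keppel".
    return ["keppel", "match", "-p", template_one, template_two]


def run_keppel(template_one, template_two):
    """Run the CLI once and return its printed output, stripped of whitespace."""
    result = subprocess.run(
        keppel_command(template_one, template_two),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with an error
    )
    return result.stdout.strip()


def pairs_from_rows(rows, column_one, column_two):
    """Pull the two template columns out of each row of the exported CSV.

    rows can be a csv.DictReader over the file downloaded from ODK Central.
    The column names are whatever your form uses; none are assumed here.
    """
    return [(row[column_one], row[column_two]) for row in rows]


def compare_pairs(pairs, runner=run_keppel, workers=4):
    """Compare every (template_one, template_two) pair, several at a time.

    Each comparison is independent of the others, so a thread pool is
    enough: the threads simply wait on separate keppel processes.
    Results come back in the same order as the input pairs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: runner(*pair), pairs))
```

In use, you would open the exported CSV with `csv.DictReader`, call `pairs_from_rows(reader, "fp_wave1", "fp_wave2")` and pass the result to `compare_pairs`. The `runner` argument is only there so the logic can be tried out without a scanner or the CLI installed.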
When templates are passed as files (the default, with no flag), each template must be stored on a single line of its own text file.
From version 0.3, the following options are available:
-p
Treats TEMPLATE_ONE and TEMPLATE_TWO as plain text rather than files. This option is very useful for scripted analysis from R or Python.
Example [templates truncated]
keppel match -p 464d520020323000000001080000013c016200c500c5... 464d520020323000000000f00000013c016200c500c...
-ms
Return whether templates match along with score like "match_210.124"
-m
Return whether templates match (either "match" or "mismatch")
-t FLOAT
Threshold (score) to be used to determine whether templates are a match or mismatch
-h, --help
Show this message and exit
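If you script against the -ms option, the result needs splitting back into its two parts. Here is a minimal Python sketch of that, based only on the "match_210.124" example above. It assumes a non-matching pair is reported the same way with a "mismatch" label (e.g. "mismatch_12.5"), which I haven't confirmed here, so check it against real output before relying on it.

```python
def parse_ms_output(text):
    """Split a '-ms' style result such as 'match_210.124' into (is_match, score).

    Assumption: mismatches are printed as 'mismatch_<score>', mirroring the
    documented 'match_<score>' example.
    """
    label, _, score = text.strip().partition("_")
    if label not in ("match", "mismatch") or not score:
        raise ValueError(f"unexpected keppel output: {text!r}")
    return label == "match", float(score)
```

Anything that doesn't fit the expected shape raises an error rather than being silently treated as a mismatch, which seems the safer default when linking records.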
Real world tests
We're currently in the process of doing a formal evaluation of how well this system works for linking data from different time points across a longitudinal survey. Hopefully we'll be publishing this in the next few months, but here's a sneak peek of the results.
In this study we asked 200 people to scan each finger of their right hand twice. This allows us to compare the first and second scan to see how the system performs. It also allows us to test fingers from person A to fingers from person B. Overall that makes it possible for us to investigate how often we'll see false positive matches (in the mismatched pairs) and false negative results (in the matched pairs).
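To make the two kinds of comparison concrete, here is a small Python sketch of one way to build the pairs for a single finger. This is only an illustration and not necessarily how we assembled the pairs in the study: it assumes a hypothetical `scans` dictionary keyed by person, and it builds mismatched pairs from first scans only.

```python
from itertools import combinations


def build_pairs(scans):
    """Build matched and mismatched template pairs for one finger.

    scans maps a person ID to (first_scan, second_scan) for that finger.
    This layout is hypothetical, chosen just for the example.
    """
    # Matched pairs: the two scans of the same finger from the same person.
    matched = [(first, second) for first, second in scans.values()]
    # Mismatched pairs: first scans from every pair of different people.
    mismatched = [
        (scans[a][0], scans[b][0])
        for a, b in combinations(sorted(scans), 2)
    ]
    return matched, mismatched
```

Built this way, 200 people give 200 matched pairs and 19,900 mismatched pairs per finger, so the mismatched group is much larger than the matched one.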
Key finding 1 is that the quality of the match goes down as you move along the hand. Matched pairs of scans from the thumb perform best, whilst the ones from the pinkie are the worst. In short, if you scan any finger, it may be best to choose the thumb.
Having said that, they're all pretty good and there's fairly good separation between the distribution of scores in the matched and mismatched pairs of scans (see chart below).
This looks great, but false positives and negatives are happening here (see how the distributions overlap a little at the black dots [outliers] at the low end of the matched group and the top end of the mismatched group). This could cause problems even if the rate at which those occur is pretty low. In a study of 10,000 people, a false negative rate of 1% adds up to 100 cases where you'd get a false negative result.
Key finding 2 is that you get a much lower false negative / false positive rate if you combine the scores for multiple fingers. In the chart below we see that when we scan the thumb, index and middle finger, then add up the scores, we can get a much better result. Here we called anything with a combined score above 75 a positive. In this small study, combining the scores from three fingers gave us false negative and false positive rates that were zero. No system is really perfect, so there are still going to be a few problems, but the conclusion is that if you capture three fingerprints and combine scores, you can get a very good diagnostic on whether any two data records collected with ODK Collect actually come from the same individual.
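The combined-score rule is simple enough to sketch in a few lines of Python. This assumes you already have the per-finger scores from the CLI for each pair of records. The function names are made up for the example, and the threshold of 75 is just the one we used in this small study, so treat it as a starting point rather than a recommended value.

```python
def combined_score(scores):
    """Add up the per-finger scores (e.g. thumb, index and middle finger)."""
    return sum(scores)


def is_same_person(scores, threshold=75.0):
    """Call a positive when the combined score is above the threshold."""
    return combined_score(scores) > threshold


def error_rates(matched, mismatched, threshold=75.0):
    """Return (false_negative_rate, false_positive_rate).

    matched:    per-finger score lists for pairs known to be the same person
    mismatched: per-finger score lists for pairs known to be different people
    """
    # A false negative is a known-matched pair that falls at or below the threshold.
    false_neg = sum(not is_same_person(s, threshold) for s in matched)
    # A false positive is a known-mismatched pair that scores above the threshold.
    false_pos = sum(is_same_person(s, threshold) for s in mismatched)
    return false_neg / len(matched), false_pos / len(mismatched)
```

Running `error_rates` over a range of thresholds on your own test scans is a quick way to see how sensitive the result is to the cut-off you pick.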
Future directions
We're keen to expand the range of devices that this system works with. In theory it should be fairly easy to add new fingerprint scanners and the framework should allow for things like iris scanners to be added, though we'd need to add new functions to the CLI to handle different types of biometrics. The fingerprint templates follow an ANSI/ISO standard, so many reader devices will spit out data that are compatible with the existing fingerprint CLI.
I think that there's also scope to add in Bluetooth connectivity and functions for reader devices for things like RFID / PIT chips (@Florian_May!)
As an open source project we are of course very keen for others to get involved.
Funding & Ethics
This work was funded by the UK Department of Health and Social Care using UK Aid funding managed by the NIHR (PR-OD-1017-20001). Ethical permission for the study was granted by the London School of Hygiene & Tropical Medicine Observational Research Ethics Committee (Ref. 22562).