Data cleaning using Stata

1. What is the problem? Be very detailed.
I would like to know of any resources or tips for cleaning data after downloading it in CSV format from ODK Aggregate. Can I use Excel to clean data for Stata? How? Is there anything useful for a Stata user?
2. What app or server are you using and on what device and operating system? Include version numbers.
ODK Aggregate v1.7.3
3. What have you tried to fix the problem?
I looked into some online resources, but nothing is clear or specific to CSV data.
4. What steps can we take to reproduce the problem?

5. Anything else we should know or have? If you have a test form or screenshots or logs, attach below.

No, thank you!

odkmeta (https://ideas.repec.org/c/boc/bocode/s457767.html) is a Stata module that imports and labels your CSVs, which is very handy. It doesn't actually clean your data, though: you would have to write your own code for validation rules, cleaning, etc., since that work is specific to your dataset.
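
As a rough illustration of what such dataset-specific cleaning code might look like, here is a minimal do-file sketch. The file name `my_form.csv` and the variables `respondent_id`, `age`, and `interview_date` are hypothetical, as are the `-99` placeholder code and the 0 to 120 age range. It also assumes `age` was imported as numeric and `interview_date` as a `YYYY-MM-DD` string, which may not match your export.

```stata
* Minimal cleaning sketch. File and variable names are hypothetical.
clear all
import delimited using "my_form.csv", varnames(1) clear

* 1. Duplicates: report them, then drop rows that are exact copies.
duplicates report
duplicates drop

* 2. Confirm the ID uniquely identifies rows (stops with an error if not).
isid respondent_id

* 3. Recode a placeholder code to missing (assumes -99 meant "don't know").
replace age = . if age == -99

* 4. Validation rule: list out-of-range values for manual review.
list respondent_id age if !missing(age) & !inrange(age, 0, 120)

* 5. Convert a string date to a Stata date (assumes YYYY-MM-DD strings).
generate interview_dt = date(interview_date, "YMD")
format interview_dt %td

save "my_form_clean.dta", replace
```

Keeping these steps in a do-file rather than editing the CSV by hand means the cleaning can be rerun whenever a fresh export is downloaded.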


odkmeta imports ODK CSV data into Stata, completing several tasks that should make working in Stata easier, including labeling variables and values, formatting date and time variables, and merging repeat groups. Note that odkmeta works with data exported from ODK Briefcase, which you can connect to your Aggregate server. However, I'm not sure it works with data exported directly from within Aggregate.
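
A typical odkmeta workflow might look like the sketch below. It follows the syntax in the odkmeta help file as I recall it, so please confirm the options with `help odkmeta` before relying on it. All file names are hypothetical: `survey.csv` and `choices.csv` are assumed to be the XLSForm "survey" and "choices" sheets saved as CSV, and `my_form.csv` is assumed to be the data exported via Briefcase.

```stata
* Sketch only: file names are hypothetical; check -help odkmeta- for options.
ssc install odkmeta

* odkmeta does not import the data itself; it writes a do-file that does.
odkmeta using "import.do", csv("my_form.csv") survey("survey.csv") choices("choices.csv")

* Running the generated do-file imports and labels the data.
do "import.do"
```

Because the output is an ordinary do-file, you can open it and add your own cleaning steps after the generated import code.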


I am the author of odk2stata (https://github.com/PMA-2020/odk2stata). This is a tool I created recently to help our team generate some simple Stata do-files for our ODK forms. The list of data cleaning tasks is displayed on the GitHub page. We needed more flexibility and ease of use than odkmeta provides, hence this project. So far our experience has been positive! Give it a try and let me know if you like it and how it could be improved!
