I have a project that has gathered a massive number of form submissions — 100k+ so far. Each submission also carries 4 or more images. On disk this adds up to over 300 GB of data, and it keeps growing. The problem is that this volume is now pushing past the limits of a lot of resources. I have been steadily increasing server CPU and memory (on cloud), but it has reached a point where Tomcat is giving up.
I need advice on the following challenges:
- How do I manage such a huge amount of data going forward? Even opening Aggregate takes an eternity because of image loading.
- Can I offload the already-collected data somewhere to free up space for new incoming data and make the server lighter? I usually take backups via Briefcase, but Briefcase only gives me a backup; I still need to put the data somewhere I can actually work with it (export CSVs, visualize on a map). Maybe a separate, standalone offline Aggregate?
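  For context, this is roughly the Briefcase CLI workflow I mean when I say "take a backup and extract CSVs" — the jar version, storage paths, URL, credentials, and form ID below are placeholders for my setup, and the flag names are as I understand them from the Briefcase CLI docs:

  ```shell
  # Pull all submissions for one form from the live Aggregate into a local
  # Briefcase storage directory (placeholder URL, credentials, and form ID).
  java -jar ODK-Briefcase-v1.18.0.jar --pull_aggregate \
    --storage_directory /data/briefcase \
    --form_id my_form \
    --aggregate_url https://my-aggregate.example.com \
    --odk_username admin --odk_password secret

  # Export the pulled submissions to CSV; media files are written alongside
  # the export directory, so they can be archived or mapped offline.
  java -jar ODK-Briefcase-v1.18.0.jar --export \
    --storage_directory /data/briefcase \
    --form_id my_form \
    --export_directory /data/exports \
    --export_filename my_form.csv
  ```

  The pull step is the slow, disk-heavy part here, since it downloads every image along with the submission XML.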
- Extending the point above: is it possible to push the data from Briefcase to another Aggregate server that does not have the URL of the main project (since the primary URL points to the main server, of course)?
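  If this is supported the way I hope, I assume it would look something like the following — `--push_aggregate` is the flag I see in the Briefcase CLI help, and the archive-server URL and credentials here are purely placeholders:

  ```shell
  # Push locally stored submissions from the Briefcase storage directory to a
  # second Aggregate instance (placeholder host and credentials), so the
  # archive server never needs to share the primary project's URL.
  java -jar ODK-Briefcase-v1.18.0.jar --push_aggregate \
    --storage_directory /data/briefcase \
    --form_id my_form \
    --aggregate_url https://archive-aggregate.example.com \
    --odk_username admin --odk_password secret
  ```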
- Is there any load-balancing mechanism I can put in place? At which level (Aggregate, Tomcat, etc.)? I am using AWS.
And the final question: can ODK Central help with this kind of massive data handling? I have been putting off learning ODK Central for some time now, but if it is the way forward, then now is the time for me to do it.