Google Sheet Publisher Issue - Somehow DB entities for publisher got into problem state

What is the problem? Please be detailed.

I'm a water resources engineer using ODK Collect and Aggregate for a citizen science project in Nepal called SmartPhones4Water. Our Aggregate URL is www.s4w-nepal.appspot.com

Roughly 1 week ago I noticed that our Google Sheet for our form S4W-Nepal (id: S4W-Nepal_v1.01) wasn't being updated with new data. When a signed into Aggregate and navigated to Form Management > Published Data and selected S4W-Nepal, there was a red warning next to the publisher in question. I restarted the publisher but it continued to go to a "paused", then something like "trying" and so on like it was in some sort of infinite loop.

Then I tried to delete the publisher and create a new publisher. The new publisher published the first 732 rows and has stopped completely (there are a total of 9000 records to be published).

Now, when I select S4W-Nepal from Form Management > Published Data, I get an error message that says:

s4w-nepal appspot com says:

Error: Problem persisting data or accessing data (Somehow DB entities for publisher got into problem state)

After spending some time on Google Cloud Platform looking at error logs, it appear that the issue is with UploadSubmissionsWorkerImpl.java:135: org.opendatakit.aggregate.task.UploadSubmissionsWorkerImpl uploadAllSubmissions: Persistence layer problem: Somehow DB entities for publisher got into problem state (UploadSubmissionsWorkerImpl.java:135)

What ODK tool and version are you using? And on what device and operating system version?

ODK Aggregate v1.4.15 running on the Google Cloud.

What steps can we take to reproduce the problem?

I can provide login credentials so you can see the cloud console and/or ODK Aggregate so you can see the problem first hand.

What you have you tried to fix the problem?

I tried to delete the problematic publisher and create a new one. I'm thinking about redeploying ODK Aggregate as a last stitch effort, but I'm sure you guys will have a better idea than that!

Anything else we should know or have? If you have a test form or screenshots or logs, attach here.

Click here to view a link to the log viewer.

Just in case the link to the log viewer doesn't work, here is a screenshot of what I think is the problem.

Just as a quick update, since the app seems to be in a loop our datastore read costs are piling up. I just had to pay $100 dollars in fees, so any help would be greatly appreciated! My only solution to stop the charges at this point would be to delete the form, and since we have 9000+ submission, I don't want to lose all that data. Please let me know if I need to add any details to get a response. Thanks so much for your time and I look forward to collaborating with this group in the future!

Hi @Jeff_Davids. I'm sorry that you are having problems! Can you please add yanokwa@nafundi.com as an owner on the Google App Engine account. Once I have access, I can take a quick look and see what is going wrong.

Hello @yanokwa. Thanks so much for being willing to look at this! I just added you as an owner on the Google App Engine account. I look forward to seeing what the issue is. Best.

A post was split to a new topic: Aggregate publisher problem

@Jeff_Davids, I'd recommend to back up your data using Briefcase as soon as possible. That way, you could even remove the form and submit it again, and then push the submissions back to the server

@yanokwa could you search for a full stacktrace on the logs? Also, if @Jeff_Davids can point you to a submission that isn't on the google sheets, could you locate it on the Data Store and see if there's any structural problem with it? (comparing to a submission that got to the google sheets)

So, I've traced the exception in the code to these lines in FormServiceCursor.constructExternalService(FormServiceCursor fsc, IForm form, CallingContext cc):

try {
  // ... some stuff
  return new GoogleSpreadsheet(fsc, form, cc);
  // ... more stuff
} catch (Exception e) {
  throw new ODKEntityNotFoundException("Somehow DB entities for publisher got into problem state", e);
}

Unfortunately, we can't pinpoint the source of this exception at this moment because upstream try blocks lose the cause of this exception.

Digging a little deeper, I can identify two broad possible causes for the exception:

  1. Problems with Google Sheet API
    This could be due to many things like exceeding quotas, network problems, expiration of credentials... you name it. It would be helpful to know if you have experienced any problem at all while working with Google tools lately.

    Worst case scenario, it could be due to irregularities in Google's service. I don't have specific prior experience with Google Cloud services but I've dealt with this kind of situations on Amazon and other cloud services and normally this is solved within 24h...

  2. Problems with data in the Data Store
    This could be due to exceeding quotas as well, but you mentioned that you've activated the billing plan. Worst case scenario, we can detect anomalies in the data itself that could justify an error.

    A quick test for this could be to pull submissions using Briefcase. If this works it's a sign that there's no problem in the Data Store

Hi @ggalmazor thanks so much for working on this! I'm pulling the intances right now to my computer using Briefcase so at least we will have a backup. Right now I'm at 40 or 9000+ so it might be a while... So far so good, but I'll let you know how it finishes.

I have several other forms that are properly publishing to both Fusion Tables and Google Sheets, so that makes me think the problem isn't with Google Sheets API, but I definitely will defer to you guys on this one! In fact, I just looked, and the Google Fusion Table publisher for the same form that we are getting the error on (S4W_Nepal_v1.01) is still being updated successfully.

Sounds like worse comes to worse, I'll have to (1) finish the Briefcase backup, (2) delete the form, (3) reload the form, (4) push the submissions from Briefcase, and (5) restart new publishers? Does that sound about right?

@ggalmazor and @yanokwa I'll defer to your judgement with regards to when it makes sense to try those steps vs. doing additional troubleshooting.

@Jeff_Davids Briefcase has an experimental pull option in the settings that can drastically speed up pulls. I'd encourage you to use that setting. If you don't see in it settings, update to the latest version of Briefcase: ODK Briefcase v1.9.

I agree with @ggalmazor that this problem is either an issue with the data store or the API. I've looked at the data store and it looks fine to me, so it's very likely the API (or how we handle intermittent errors in the API).

I don't think you'll have to do anything as intensive as delete all the data and re-upload. The back up is just so we know the data is safe on multiple servers. I'm optimistic that this is an issue that can be fixed without touching the data. We'll know more as we keep troubleshooting...

1 Like

Hi @Jeff_Davids, @yanokwa!

I wanted to write my insights after taking care of this problem for some days.

  • We found that a duplicated image was causing a crash on the service that makes the uploads for the publishers
  • There was a glitch on GAE/DataStore that would prevent me to find the duplicated rows. @yanokwa reports that this has happened to him as well with large datasets ¯\_(ツ)_/¯
  • We created a pair or publishers so that:
    1. one would publish only new submissions
    2. the other would publish old and new submissions.
  • We verified that the second publisher wouldn't go further than 700ish rows. We checked that we were getting crash reports on the logs whenever the publisher would retry uploading submissions.
  • We removed the oldest duplicated row
  • We verified that the second publisher now would continue its work until all submissions (8000ish) were uploaded to the Google Spreadsheet.
  • We also detected some TaskLock error messages but checked that those wouldn't prevent the publishers to complete their tasks

Insights:

@Jeff_Davids, could you confirm that everything goes well now?

Also, we can transfer you the ownership of the Google Spreadsheets we have created. That way you will save yourself the effort of repeating the process (it took more than 24hours).

1 Like

Hi @ggalmazor!

Thanks for all your work on this. Indeed, your summary and insights looks spot on.

I just took a look at the publishers and everything seems to still be working. If you could transfer ownership of the Google Sheet that has the "BOTH Upload Existing & Stream New Submission Data" action to s4w.nepal@gmail.com that would be great. Let me know if you need me to do anything for this process.

Hi @Jeff_Davids!

I transferred the ownership of the spreadsheet some days ago, could you confirm?

Hi @ggalmazor ! Yes, the spreadsheet is now available in our s4w.nepal@gmail.com account, but in ODK Aggregate it is still showing you as the owner. This isn't a big deal, but I wanted to see if this is how it will remain:

image

Hmmmm, I'll take a look. Maybe we can simply edit that information on DataStore to update the email address that's shown. It feels like that email there is just for show once the publisher has created the Google Spreadsheet...

OK, I have updated the field and now it shows your email.

1 Like

Sounds good. If it is just a cosmetic thing then it isn’t a big deal really.

1 Like

We are running into the same issue which explains the high charges in our other ticket.

The crash is happening so often that I can't even open the published forms pane for the particular form on the ODK Aggregate server.

I believe the part of your solution that fixes this is to 'remove the oldest duplicated row'. Is that all that is needed for the fix? This will need to be done in the GCP DataStore.

Also, when we publish data in the future, what can we do to prevent the issue from happening again?

Thanks!
Erik

Hi, @esurface! Yes, we can only provide this workaround so far. Unfortunately, we still haven't been able to determine how and why these duplicates are created, although it looks like AppEngine deployments using big forms with many repeat groups tend to be the ones that suffer this the most.

Hello there. Sorry to hear esurface that you are having this same issue. Yes, my recollection was that you delete the oldest duplicate record and that took care of it.

Ultimately, we ended up moving to a JSON publisher and built our own custom webapp with Python and MongoDB to avoid these issues. I'm not sure what the long term solution is for you.

Good luck!

jeff