Aggregate publisher problem

Hi @yanokwa, I am also having this same problem and found the same error message in my error logs on the Google console. I've been streaming to Google Drive with very few problems since April. This has just cropped up in the last two weeks with one of my new forms, and also with one of the forms that has been working since April. I am not seeing any error message on the publisher page of ODK Aggregate (bad credentials, or paused, or active-retry). It says the publisher is active. At the same time, I have other forms that are streaming with no problems. We are using Aggregate v1.4.13 on Google Cloud. Once you've been able to solve Jeff's issue, would you let me know what the solution is so I can fix mine? (Or I can also add you as an owner on my App Engine, whichever you would prefer. I can also open a new thread if that is more appropriate.) Thank you very much!

@Christy_Marie I've split out the topic because it's not clear this is the same problem. Can you please answer the questions below?

What is the problem? Please be detailed.

What ODK tool and version are you using? And on what device and operating system version?

What steps can we take to reproduce the problem?

What have you tried to fix the problem?

Anything else we should know or have? If you have a test form or screenshots or logs, attach here.

Hi Yaw,
Thank you for your offer of help!

What is the problem? Please be detailed.
Aggregate is no longer streaming properly to Google Drive. I've been streaming several of my forms since April with few issues. In the last two weeks, two have stopped working properly: one form that is new (but was initially working) and one that has been streaming without problems for months. When I view the publisher page, the publishers say they are active, but there is data on the server that is not in my Google Sheets. Yesterday, the only error I had in the previous 24 hours was "Persistence layer problem: Somehow DB entities for publisher got into problem state (UploadSubmissionsWorkerImpl.java:135)." (And when I googled that, I ended up on the last thread.) I looked for both error and critical error logs on GE application, background, and default service. However, today, there are a slew of out of memory errors. I also saw this note in one of the errors: "The API call user.GetOAuthUser() took too long to respond and was cancelled", which I think is how Aggregate talks to Google Drive? The one error from yesterday that ended with the persistence layer message now has more logs and ends with the out of memory error message (as of now).

What ODK tool and version are you using? And on what device and operating system version?
We are using Aggregate 1.4.13, hosted on Google Cloud. I am using Windows 10 and accessing Aggregate through Google Chrome.

What steps can we take to reproduce the problem?
Not sure.

What have you tried to fix the problem?
For one of the forms, I tried creating a new publisher and that worked initially. It pulled all data already uploaded to the server, and when I sent a new test submission, it was written out to Google Drive immediately. But I came in the next day and there was new data on the server that was not in Google Sheets.

Anything else we should know or have? If you have a test form or screenshots or logs, attach here.
We have activated billing, so we shouldn't have problems with quotas or space. The one form that has been streaming since April has a lot of data sitting on the server but the other new form does not. (And, I have other forms with way more data sitting on the server that are streaming fine.)

Thank you again!

(I tried to upload two snips of errors. Not sure that worked or is helpful.)

The two snips did load but they don't have a space between them. So everything you are seeing is not sequential.

Hi @Christy_Marie! In your case, I see you've already located a possible cause that explains the problem. A timeout while authenticating Aggregate in Google's API could produce the error.

I've written up an analysis of your problem and @Jeff_Davids' problem here: Google Sheet Publisher Issue - Somehow DB entities for publisher got into problem state - #9 by ggalmazor

Even though we have a good lead here with the timeout, I'd recommend running the test I suggested to @Jeff_Davids. Could you try to pull the form and all its submissions using Briefcase? If that goes well, it tells us that there are probably no problems in Aggregate's data.

Hi @ggalmazor, Thanks so much for looking into this! Briefcase pulls with no problem (and has been throughout this issue). Also of note, and maybe helpful, is that I can publish uploaded data fine to googlesheets. It is when I am streaming for two of my datasets that I am having a problem. Meanwhile, other forms are streaming fine, and everything else on google drive seems to be functioning fine (although I try to download all the folders in my google drive 1-2 times per week as a backup and two of the last three times, it has taken a long time to zip the files, which has been weird because I've not added anything substantial - I've started to wonder if I am trying to download at the same time that aggregate is stuck in this loop and so google is stuck because it can't download a file at the same time another operation has it locked for editing - this is total supposition on my part as I am not a programmer though).

For the two forms that are giving me problems right now, I can launch new forms in a few weeks when we do our annual training and I could just see if this issue arises with the new forms/publishers. One of them has to be changed anyway, and I can change the other at the same time. I can manage things as they are for the time being. But, I am a little worried that this could happen at any time with any of my forms since it popped up on both a new and an old form and it is not always possible to switch forms out so easily. So if you have time to look into this and can figure out what is causing the problem and/or how to fix, that would be great. But I also know you all get lots of requests for help and since this is not a crippling problem for me, it can be back-burnered until I see if it pops up again with new forms.

Thanks again!
Christy

Hi Christy! Thanks for your detailed reply with lots of valuable info :slight_smile:

When third-party services are involved, solving this kind of issue can be tricky. We'll continue investigating... At the very least, we could propose some changes in Aggregate to improve our logs when dealing with third parties.
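
To give a rough idea of what better logging around third-party calls could look like, here is a minimal sketch. It is written in Python only for brevity (Aggregate itself is Java), and it is not Aggregate's actual code: `call_with_logging`, the `third_party` logger name, and the service and operation labels are all hypothetical names made up for illustration. The idea is simply to record which external service was called, how long the call took, and what kind of error ended it, so that a timeout shows up clearly in the logs.

```python
import logging
import time

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("third_party")


def call_with_logging(service, operation, func, *args, **kwargs):
    """Run func and log the external service, the duration, and the outcome.

    service and operation are free-form labels chosen by the caller.
    Any exception raised by func is logged and then re-raised unchanged.
    """
    start = time.monotonic()
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        elapsed = time.monotonic() - start
        # Include the exception class so timeouts are easy to tell apart
        # from credential or quota errors when reading the logs.
        logger.error(
            "%s.%s failed after %.2fs: %s: %s",
            service, operation, elapsed, type(exc).__name__, exc,
        )
        raise
    elapsed = time.monotonic() - start
    logger.info("%s.%s succeeded in %.2fs", service, operation, elapsed)
    return result
```

A caller would wrap each outbound request, for example `call_with_logging("sheets", "append_rows", send_rows, batch)`, where `send_rows` and `batch` stand in for whatever actually performs the upload.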