Recover from EndlessRetryStrategy

PhilKershaw · August 28, 2020, 5:15pm

How would one recover from this error:

ERROR io.zeebe.util.retry.EndlessRetryStrategy - Catched exception class java.lang.IllegalStateException with message Error while processing workflow. Workflow with 2251799813730718 is not deployed, will retry...

Zeebe Standalone broker version 0.24.1

Please let me know what, if any, further detail is required to help resolve this.
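As a small diagnostic aid, the key in the log message can be decoded. This is a minimal sketch, assuming the key layout we understand Zeebe to use in this version: the partition id sits in the bits above bit 51 (the `KEY_BITS` constant in `io.zeebe.protocol.Protocol`), and the remaining low bits are the partition-local counter. `decode_key` is a hypothetical helper, not part of any Zeebe client API; verify the bit layout against your broker version.

```python
from typing import Tuple

# Assumption: Zeebe stores the partition id above KEY_BITS (51),
# mirroring io.zeebe.protocol.Protocol; check this for your version.
KEY_BITS = 51


def decode_key(key: int) -> Tuple[int, int]:
    """Hypothetical helper: split a Zeebe key into (partition_id, local_key)."""
    partition_id = key >> KEY_BITS                 # high bits: partition id
    local_key = key & ((1 << KEY_BITS) - 1)        # low 51 bits: local counter
    return partition_id, local_key


if __name__ == "__main__":
    workflow_key = 2251799813730718  # key from the error message in this thread
    partition, local = decode_key(workflow_key)
    print(f"partition={partition} local_key={local}")  # partition=1 local_key=45470
```

For the key in this thread it yields partition 1 and local key 45470. That only tells you where the key was issued, not why the workflow is reported as not deployed.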

Philipp_Ossler · August 31, 2020, 3:06am

Hi @PhilKershaw,

welcome to the Zeebe community

How did you end up in this situation?

Please describe the steps you took before you saw this error message.
Can you reproduce this failure in a new setup?

Best regards,
Philipp

arjunkumar09 · August 31, 2020, 10:02am

Hey @PhilKershaw, can you describe in more detail how you ended up in this situation and where you got the error?

PhilKershaw · August 31, 2020, 5:41pm

@philipp.ossler Thank you for the welcome!

@philipp.ossler / @arjunkumar09 I’ll outline the architecture as a first pass and see where it takes us if that’s ok?

We have a number of services running in AWS in a Fargate configuration within the same cluster. Zeebe is one of the said services. At present - and I’m sure this will be the primary target for scrutiny/advice - we’re running Zeebe in a stand-alone configuration, i.e. broker and gateway running in a single container.

To keep a persistent filesystem across Zeebe upgrades, etc., the container mounts an EFS volume. Ironically, perhaps, this is where problems tend to show up. If we run Zeebe from a clean slate - empty EFS volume, no indexes in ES, etc. - it’s perfectly fine. We can even redeploy a number of times (as yet unmeasured) before we encounter this problem.

To get things back up and running, we manually “clean the slate” on the EFS volume and delete the ES indexes. When Zeebe restarts, all is well.

The only contributing factor I can suggest is, perhaps, the behaviour of the EFS mount during the shutdown process.

rohanjoshi0894 · September 15, 2020, 3:15pm

This could be the right answer to the same issue.

system · January 31, 2024, 10:08am