Broker version: 0.24.3
We run Zeebe on Kubernetes with 3 brokers and a gateway (Helm chart), using the Elasticsearch exporter.
Prior to 0.24.3 we experienced a lot of problems with our Zeebe setup: brokers would restart and not come back for a long time (days). We believe this is related to https://github.com/zeebe-io/zeebe/issues/5135.
We think the restarts were caused by Kubernetes deleting the pods because of memory usage (4 GiB limit). On restart the brokers then hit the issue above, leaving the system down for all practical purposes. This has been reported in the Zeebe Slack channel earlier.
With 0.24.3, restarts have been quick, but we don't yet know how the system will respond once the pods get restarted by Kubernetes, which will happen soon due to memory usage. Here is the Grafana output for one of the brokers:
Some snapshots from the cluster:
Current resource usage:
Uptime:
Currently we are running one workflow, with approximately 500-1000 instances per 24 hours.
The issue looks similar to "Zeebe broker 0.20.0 memory usage is too high".
However, given that issue's age and severity, I would have hoped it had been addressed by now.
I will update the issue once the pods are restarted and describe the behaviour we see (whether they restart quickly or cause downtime like before).
Cheers,
Lars