RESOURCE_EXHAUSTED: Reached maximum capacity of requests handled

I am running Zeebe 0.21.1 (community edition) and running into the exception above, with these details:

Exception in thread "main" io.zeebe.client.api.command.ClientStatusException: Reached maximum capacity of requests handled
at io.zeebe.client.impl.ZeebeClientFutureImpl.transformExecutionException(ZeebeClientFutureImpl.java:93)
at io.zeebe.client.impl.ZeebeClientFutureImpl.join(ZeebeClientFutureImpl.java:50)
at SimpleHttpBasedProcessApp.main(SimpleHttpBasedProcessApp.java:48)
Caused by: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: Reached maximum capacity of requests handled
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at io.zeebe.client.impl.ZeebeClientFutureImpl.join(ZeebeClientFutureImpl.java:48)
... 1 more

I am running a cluster of 3 nodes using the docker/cluster sample. There are currently no workflow instances in flight. I get this error when attempting to run a single workflow, and it occurs on every subsequent attempt to execute the same workflow.

If I run a single broker instance instead of the cluster, I do not seem to run into this problem. What could I be doing wrong?
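For reference, my client boils down to the minimal pattern below (a sketch: the gateway address and process ID are placeholders, not my exact values):

    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.api.response.WorkflowInstanceEvent;

    public class SimpleHttpBasedProcessApp {

      public static void main(String[] args) {
        // Contact point of the standalone gateway from the docker/cluster sample
        try (ZeebeClient client =
            ZeebeClient.newClientBuilder().brokerContactPoint("127.0.0.1:26500").build()) {

          // This join() is the call that fails with RESOURCE_EXHAUSTED
          final WorkflowInstanceEvent event =
              client
                  .newCreateInstanceCommand()
                  .bpmnProcessId("my-process") // placeholder; the real process ID differs
                  .latestVersion()
                  .send()
                  .join();

          System.out.println("Created workflow instance " + event.getWorkflowInstanceKey());
        }
      }
    }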

@klaus.nji That sounds like the new backpressure feature kicking in… can you help us reproduce the problem?
It might be a configuration issue…

Look here: https://github.com/zeebe-io/zeebe/pull/3035/files#diff-36a3b3365a45cabcdc35975f05a1bcabR140 (found via the release notes).

If you set the log level to trace, you will see whether it is broker backpressure.
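As a client-side mitigation in the meantime: backpressure is transient, so retrying RESOURCE_EXHAUSTED with a backoff usually gets the request through once the broker catches up. A rough sketch (this retry helper is my own suggestion, not part of the 0.21 client API):

    import io.grpc.Status;
    import io.grpc.StatusRuntimeException;
    import java.util.function.Supplier;

    public final class BackpressureRetry {

      // Retry a client command while the broker signals RESOURCE_EXHAUSTED (backpressure).
      public static <T> T sendWithRetry(
          Supplier<T> command, int maxAttempts, long initialBackoffMs) throws InterruptedException {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
          try {
            return command.get();
          } catch (RuntimeException e) {
            if (attempt >= maxAttempts || !isBackpressure(e)) {
              throw e; // not backpressure, or out of retries: propagate
            }
            Thread.sleep(backoffMs);
            backoffMs = Math.min(backoffMs * 2, 5_000); // exponential backoff, capped at 5s
          }
        }
      }

      // Walk the cause chain looking for the gRPC RESOURCE_EXHAUSTED status
      private static boolean isBackpressure(Throwable t) {
        for (Throwable cause = t; cause != null; cause = cause.getCause()) {
          if (cause instanceof StatusRuntimeException
              && ((StatusRuntimeException) cause).getStatus().getCode()
                  == Status.Code.RESOURCE_EXHAUSTED) {
            return true;
          }
        }
        return false;
      }
    }

You would then wrap the failing call, e.g. sendWithRetry(() -> client.newCreateInstanceCommand().bpmnProcessId("my-process").latestVersion().send().join(), 5, 100).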

OK, more hints:

  1. You can get that exception if your brokers are running out of memory or are being killed for some reason; the exception might also appear while partition leaders are being elected.
  2. How many workers do you have? Are you creating tons of workers?
  3. Can you share the logs of the brokers?

Let’s work together to find the root cause of the problem.

Thanks for the quick responses. I also suspect a configuration problem, but here are some specifics:

I am attempting to run the cluster on a single box, a MacBook Pro. My hardware specs are as follows:

Model Name: MacBook Pro
Model Identifier: MacBookPro15,1
Processor Name: Intel Core i7
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 9 MB
Memory: 32 GB
Boot ROM Version: 220.230.16.0.0 (iBridge: 16.16.2542.0.0,0)

I also read somewhere that a cluster needs to run on as many physical processors (or cores?) as there are brokers, so I do not know if this is the problem here.

@salaboy, here are some answers to your questions:

  1. You can get that exception if your brokers are running out of memory or are being killed for some reason; the exception might also appear while partition leaders are being elected.

I doubt this, but I will verify again. I usually prune the Docker volumes before starting a new experiment.

  2. How many workers do you have? Are you creating tons of workers?

Not really. I am manually running a single workflow.

  3. Can you share the logs of the brokers?

How do I extract the broker logs?

@klaus.nji Before nuking your Docker containers, try docker logs <Container ID>: https://docs.docker.com/engine/reference/commandline/logs/

Regarding running a single workflow instance… the instance count doesn’t really matter if you have 1000 workers polling for jobs. So how many Zeebe workers do you have running?
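For reference, each call like the one below counts as one worker polling the cluster (a sketch; the job type and handler are placeholders):

    import io.zeebe.client.ZeebeClient;
    import io.zeebe.client.api.worker.JobWorker;

    public class WorkerApp {

      public static void main(String[] args) {
        final ZeebeClient client =
            ZeebeClient.newClientBuilder().brokerContactPoint("127.0.0.1:26500").build();

        // Each open() registers one worker that keeps polling the brokers for jobs
        final JobWorker worker =
            client
                .newWorker()
                .jobType("http-request") // placeholder job type
                .handler((jobClient, job) ->
                    jobClient.newCompleteCommand(job.getKey()).send().join())
                .open();

        // ... keep the worker (and client) open while jobs should be processed,
        // and close them on shutdown
      }
    }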

HTH

@salaboy, when this happened, I did not have many worker instances running. Perhaps 10 max. Certainly not in the thousands.

Can you please try to reproduce and share the logs?

@salaboy, if I get into that state again, I will be sure to capture the logs. Strangely enough, I have not been able to reproduce the error since.

But is it acceptable to run a Zeebe cluster on a single box? I have one physical CPU with 6 cores.

@klaus.nji Yes, I think it is… as long as you don’t expect it to scale massively. This is for dev purposes, right?