ClientStatusException

Hello Camunda experts!

From time to time we receive the following error:

 Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: io.camunda.zeebe.client.api.command.ClientStatusException: deadline exceeded after 9.999959195s. [closed=[], open=[[buffered_nanos=322067, remote_addr=4b174f97-af57-4958-becc-ee431acca8a3.bru-2.zeebe.camunda.io/34.111.221.194:443]]]] with root cause

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 9.999959195s. [closed=[], open=[[buffered_nanos=322067, remote_addr=4b174f97-af57-4958-becc-ee431acca8a3.bru-2.zeebe.camunda.io/34.111.221.194:443]]]
	at io.grpc.Status.asRuntimeException(Status.java:539) ~[grpc-api-1.54.2.jar!/:1.54.2]
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487) ~[grpc-stub-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:576) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:757) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:736) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

How could we fix it? We are using the Camunda Cloud (SaaS) solution, and our servers and Camunda's are in different European regions.
Should we consider moving our servers and the Camunda cluster into the same region?

Hi @saltex - what operation are you doing when this error occurs? Does it always occur, or is it intermittent? What version of Camunda is your cluster running?

Hello @nathan.loding
The code was trying to start a Camunda process and received this error after 10 s.
It doesn't happen every time; for example, the next process started successfully about 10 minutes later.

version: Camunda 8.1.14

@saltex - can you share the relevant parts of your code? For instance, are you calling the CreateProcessInstanceWithResult RPC command?

@nathan.loding we do it in the following way:

private final ZeebeClient client;
...
ProcessInstanceEvent processInstance = client.newCreateInstanceCommand()
            .bpmnProcessId("file_name.bpmn")
            .latestVersion()
            .variables(variables)
            .send()
            .join();

@saltex - making sure, since it wasn’t explicitly stated: when you receive that error, the processes are not starting, correct? The issue is that processes are not starting and you get this error, rather than processes are starting but you don’t get a success response.

@nathan.loding

when you receive that error, the processes are not starting, correct?

yes, exactly

The issue is that processes are not starting. How could we prevent such an issue?

@saltex - my first thought is some issue with networking between your client and the SaaS cluster. The “deadline exceeded” error can be caused by almost anything between the client and the server, so it unfortunately doesn’t narrow the possibilities much. Because the processes are not starting, the request isn’t making it to your cluster.

Do you have a virtual network for your environment? Do you have a firewall or proxy between your client and the internet? Do you notice any pattern to when the timeouts occur?

Unfortunately I don’t believe this is an issue with Camunda; it is likely specific to the environment your client is deployed to, so I can’t offer much else.
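
One thing worth knowing while you investigate: the ~10 s in the error matches the Java client’s default request timeout, so raising it can help distinguish “slow but eventually successful” requests from requests that never reach the cluster at all. A minimal sketch, assuming you build the ZeebeClient yourself (auth/credentials configuration omitted; the 30 s value is arbitrary):

import java.time.Duration;

import io.camunda.zeebe.client.ZeebeClient;
import io.camunda.zeebe.client.api.response.ProcessInstanceEvent;

// Raise the default request timeout for every command sent by this client.
ZeebeClient client = ZeebeClient.newClientBuilder()
        .gatewayAddress("<cluster-id>.bru-2.zeebe.camunda.io:443") // your cluster address
        .defaultRequestTimeout(Duration.ofSeconds(30))
        .build();

// Or override the timeout for a single command only.
ProcessInstanceEvent processInstance = client.newCreateInstanceCommand()
        .bpmnProcessId("file_name.bpmn")
        .latestVersion()
        .variables(variables)                      // same variables map as in your snippet
        .requestTimeout(Duration.ofSeconds(30))
        .send()
        .join();

If the request still times out with the longer deadline and the instance never starts, that points more firmly at a connectivity problem rather than a slow cluster.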

@nathan.loding

Do you have a virtual network for your environment? Do you have a firewall or proxy between your client and the internet?

We are using an AWS EKS cluster in the eu-central-1 region. We don’t use a WAF or any other app-level firewalls.

Do you notice any pattern to when the time outs occur?

There doesn’t seem to be any pattern.

We are receiving almost the same exception from workers:

Failed to activate jobs for worker ____ and job type ____

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 19.999963891s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[buffered_nanos=44015272, remote_addr=99550c0e-2cfb-491d-9584-13e5a9833875.bru-2.zeebe.camunda.io/34.111.221.194:443]]]
	at io.grpc.Status.asRuntimeException(Status.java:537) ~[grpc-api-1.60.0.jar!/:1.60.0]
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:481) ~[grpc-stub-1.60.0.jar!/:1.60.0]
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:574) ~[grpc-core-1.60.0.jar!/:1.60.0]
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:72) ~[grpc-core-1.60.0.jar!/:1.60.0]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:742) ~[grpc-core-1.60.0.jar!/:1.60.0]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723) ~[grpc-core-1.60.0.jar!/:1.60.0]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.60.0.jar!/:1.60.0]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.60.0.jar!/:1.60.0]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

and sometimes the following:

Failed to activate jobs for worker ____ and job type ____

io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP status code 502
invalid content-type: text/html
headers: Metadata(:status=502,date=Mon, 08 Apr 2024 12:12:48 GMT,content-type=text/html,strict-transport-security=max-age=63072000; includeSubDomains,content-length=150)
DATA-----------------------------
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>

	at io.grpc.Status.asRuntimeException(Status.java:539) ~[grpc-api-1.54.2.jar!/:1.54.2]
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487) ~[grpc-stub-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:576) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:757) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:736) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.54.2.jar!/:1.54.2]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

Maybe this will help to understand the issue better?

@saltex - that still looks like a networking error to me. Are you on the Starter or Enterprise SaaS plan? If so, I’d recommend reaching out to the support team.
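
If the failures stay rare and transient (an occasional 502 from the ingress, or a DEADLINE_EXCEEDED), one pragmatic client-side mitigation in the meantime is to retry the command with a short backoff. This is only a rough sketch, not an official recommendation; the attempt count, delay, and method name are made up for illustration:

import java.util.Map;

import io.camunda.zeebe.client.api.command.ClientStatusException;
import io.camunda.zeebe.client.api.response.ProcessInstanceEvent;

// Retry the create-instance call a few times on transient gRPC failures
// (DEADLINE_EXCEEDED, UNAVAILABLE) before giving up.
ProcessInstanceEvent startWithRetry(Map<String, Object> variables) throws InterruptedException {
    int maxAttempts = 3;          // arbitrary
    long backoffMillis = 2_000;   // arbitrary
    ClientStatusException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return client.newCreateInstanceCommand()   // same client field as in your snippet
                    .bpmnProcessId("file_name.bpmn")
                    .latestVersion()
                    .variables(variables)
                    .send()
                    .join();
        } catch (ClientStatusException e) {
            last = e;                                  // join() surfaces the gRPC status here
            Thread.sleep(backoffMillis);
        }
    }
    throw last;
}

That won’t fix the underlying network issue, but it can keep rare timeouts from failing whole business operations while support investigates.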

I work with @saltex.
What networking errors do you mean? We have a pretty simple infrastructure: a container on AWS-managed EKS. No firewalls, no extra rules.

Today we faced this issue:

Caused by: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 9.999966153s. [closed=[], open=[[buffered_nanos=225825, remote_addr=4b174f97-af57-4958-becc-ee431acca8a3.bru-2.zeebe.camunda.io/34.111.221.194:443]]]
        at io.grpc.Status.asRuntimeException(Status.java:539) ~[grpc-api-1.54.2.jar!/:1.54.2]
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:487) ~[grpc-stub-1.54.2.jar!/:1.54.2]
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:576) ~[grpc-core-1.54.2.jar!/:1.54.2]
        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70) ~[grpc-core-1.54.2.jar!/:1.54.2]
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:757) ~[grpc-core-1.54.2.jar!/:1.54.2]
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:736) ~[grpc-core-1.54.2.jar!/:1.54.2]
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.54.2.jar!/:1.54.2]
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.54.2.jar!/:1.54.2]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
        ... 1 common frames omitted

2024-04-09T11:35:24.488Z  WARN 1 --- [nio-8080-exec-7] o.s.b.a.health.HealthEndpointSupport     : Health contributor io.camunda.zeebe.spring.client.actuator.ZeebeClientHealthIndicator (zeebeClient) took 10002ms to respond

Are you sure there are no issues on the Cloud side?

We also faced these errors (below).
Are there any debugging tools on the cluster’s side that we can use?

io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP status code 502
invalid content-type: text/html
headers: Metadata(:status=502,date=Tue, 09 Apr 2024 11:21:28 GMT,content-type=text/html,strict-transport-security=max-age=63072000; includeSubDomains,content-length=150)
DATA-----------------------------
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>


io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 19.999977227s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[buffered_nanos=172557, remote_addr=52386cd0-b3f8-45c0-811b-a61fe1c0c83a.bru-2.zeebe.camunda.io/34.111.221.194:443]]]

Hi @sartemo - typically those gRPC errors indicate the client was unable to establish a connection with the server. There are no platform-wide issues with SaaS (and there haven’t been since @saltex’s first post), but I don’t have visibility into your cluster itself. Do you have a Starter or Enterprise SaaS plan?
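
If you want a quick sanity check from inside your EKS pod, a topology request is a lightweight way to confirm that the client can reach the gateway and authenticate, and that the brokers and partitions are healthy. A minimal sketch with the standard Java client (reusing the client from your snippet):

import io.camunda.zeebe.client.api.response.Topology;

// Ask the gateway for the cluster topology; a fast, successful response means
// basic connectivity and authentication are working end to end.
Topology topology = client.newTopologyRequest().send().join();
topology.getBrokers().forEach(broker ->
        System.out.println(broker.getAddress() + " partitions=" + broker.getPartitions().size()));

If that call also times out intermittently from your pod but works fine from elsewhere, that again points at something on the network path rather than at the cluster itself.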

Hi @nathan.loding. I’m not sure that a 502 error points to connectivity issues; the connection was established and then refused. Or are you talking about my first response?

We are on Starter plan.

Thanks

@sartemo - the Starter Plan comes with 8x5 technical support, I would recommend reaching out to the support team. This is a community support forum, not an official support channel; and though I work for Camunda, I don’t have access to your cluster(s) so I can’t perform the level of debugging the support team can.


Thank you @nathan.loding . Will do that

@nathan.loding, sorry for the dumb question: how do I create a support ticket? Maybe I don’t have enough permissions, but I don’t see any links for that in my Camunda dashboard.

@sartemo - there are some links here, if you scroll about three-quarters of the way down: https://camunda.com/services/support/

@nathan.loding thank you - we opened a support ticket.