Help with deploying a workflow using the zeebe-full Helm chart

Hello. This is my first time using Helm!

Up to now I’ve mostly used the downloadable binary, and once I used Docker Compose.

Now I’m stuck: when I try to deploy my workflow, the deployment fails. Here’s what I’ve done so far…

I’ve followed the instructions to install zeebe/zeebe-full with Helm into the monitoring namespace of our cluster (rough commands are below the pod listing). Using kubectl get pods I can see that all the pods are running:

% kubectl get pods --namespace monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                   1/1     Running   0          111m
elasticsearch-master-1                                   1/1     Running   0          111m
elasticsearch-master-2                                   1/1     Running   0          111m
zb-demo-nginx-ingress-controller-58655d578c-dpvr7        1/1     Running   0          111m
zb-demo-nginx-ingress-default-backend-5497cc797b-586mh   1/1     Running   0          111m
zb-demo-operate-5bddff76b5-n727d                         1/1     Running   1          111m
zb-demo-zeebe-0                                          1/1     Running   0          111m
zb-demo-zeebe-1                                          1/1     Running   0          111m
zb-demo-zeebe-2                                          1/1     Running   0          111m
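
For reference, the install and port-forward were roughly like this (Helm 3 syntax; the repo URL is what I recall from the docs, and the service name for the port-forward may be different in your chart version, e.g. a separate gateway service):

% helm repo add zeebe https://helm.zeebe.io
% helm repo update
% helm install zb-demo zeebe/zeebe-full --namespace monitoring
% kubectl port-forward svc/zb-demo-zeebe 26500:26500 --namespace monitoring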

After setting up the port-forward and leaving it running, I can query the cluster status with zbctl:

% bin/zbctl status
Cluster size: 3
Partitions count: 3
Replication factor: 3
Brokers:
  Broker 0 - zb-demo-zeebe-0.zb-demo-zeebe.monitoring.svc.cluster.local:26501
    Partition 1 : Follower
    Partition 3 : Follower
  Broker 1 - zb-demo-zeebe-1.zb-demo-zeebe.monitoring.svc.cluster.local:26501
    Partition 1 : Follower
    Partition 2 : Follower
    Partition 3 : Follower
  Broker 2 - zb-demo-zeebe-2.zb-demo-zeebe.monitoring.svc.cluster.local:26501
    Partition 2 : Follower
    Partition 3 : Leader

However, I am unable to deploy a workflow. I know the workflow itself is fine, as I’ve previously deployed it to a stand-alone download of the broker.

I can even open Operate by pointing my browser at the external IP I see when I run kubectl get svc --namespace monitoring.

However, when I attempt to deploy it, I get an error:

% bin/zbctl deploy /path/to/PksExperiment/pks-resize.bpmn
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Usage:
  zbctl deploy <workflowPath> [flags] ...

When I tail the log of pod zb-demo-zeebe-1, I see this:

2019-12-03 21:23:52.507 [io.zeebe.gateway.impl.broker.BrokerRequestManager] [zb-demo-zeebe-1.zb-demo-zeebe.monitoring.svc.cluster.local:26501-zb-actors-0] ERROR io.zeebe.gateway - Error handling gRPC request
io.grpc.StatusRuntimeException: INTERNAL: Unexpected error occurred during the request processing
	at io.grpc.Status.asRuntimeException(Status.java:524) ~[grpc-api-1.24.0.jar:1.24.0]
	at io.zeebe.gateway.EndpointManager.convertThrowable(EndpointManager.java:291) ~[zeebe-gateway-0.21.1.jar:0.21.1]
	at io.zeebe.gateway.EndpointManager.lambda$sendRequest$3(EndpointManager.java:269) ~[zeebe-gateway-0.21.1.jar:0.21.1]
	at io.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequest$3(BrokerRequestManager.java:130) ~[zeebe-gateway-0.21.1.jar:0.21.1]
	at io.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequestInternal$5(BrokerRequestManager.java:161) ~[zeebe-gateway-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:32) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:76) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:127) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:107) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:91) [zeebe-util-0.21.1.jar:0.21.1]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:195) [zeebe-util-0.21.1.jar:0.21.1]
Caused by: io.zeebe.transport.RequestTimeoutException: Request timed out after PT15S
	at io.zeebe.transport.impl.sender.OutgoingRequest.timeout(OutgoingRequest.java:143) ~[zeebe-transport-0.21.1.jar:0.21.1]
	at io.zeebe.transport.impl.sender.Sender.onTimerExpiry(Sender.java:291) ~[zeebe-transport-0.21.1.jar:0.21.1]
	at org.agrona.DeadlineTimerWheel.poll(DeadlineTimerWheel.java:344) ~[agrona-1.0.7.jar:1.0.7]
	at io.zeebe.transport.impl.sender.Sender.processTimeouts(Sender.java:100) ~[zeebe-transport-0.21.1.jar:0.21.1]
	... 6 more
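
(In case it matters, I tailed the log with: % kubectl logs -f zb-demo-zeebe-1 --namespace monitoring)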

Please help!

Thank you 🙂

Hey,

It seems you have no leader for partition 1, which is why you can’t deploy any resources: partition 1 is the deployment partition. Could you verify whether all pods are ready? How soon after the cluster started did you try to deploy the resource? Could you retry it once all nodes are ready?
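
Something like this should show whether everything is ready and whether partition 1 has a leader again (same namespace and zbctl call as in your output above):

% kubectl get pods --namespace monitoring
% bin/zbctl status

Wait until every pod reports READY 1/1 and the status shows a Leader for partition 1, then try the deployment again.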

Greets
Chris

Hi Chris,

I deleted all three Zeebe pods so the StatefulSet would recreate them, roughly like this:
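
% kubectl delete pod zb-demo-zeebe-0 zb-demo-zeebe-1 zb-demo-zeebe-2 --namespace monitoring

After several minutes I saw this: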

% bin/zbctl status --insecure
Cluster size: 3
Partitions count: 3
Replication factor: 3
Brokers:
  Broker 0 - zb-demo-zeebe-0.zb-demo-zeebe.monitoring.svc.cluster.local:26501
    Partition 1 : Follower
    Partition 2 : Follower
    Partition 3 : Follower
  Broker 1 - zb-demo-zeebe-1.zb-demo-zeebe.monitoring.svc.cluster.local:26501
    Partition 1 : Leader
    Partition 2 : Follower
    Partition 3 : Leader
  Broker 2 - zb-demo-zeebe-2.zb-demo-zeebe.monitoring.svc.cluster.local:26501
    Partition 1 : Follower
    Partition 2 : Follower
    Partition 3 : Follower

And then I successfully deployed my workflow! So we always need a leader for partition 1, but not necessarily for the other partitions? Right now, for example, partition 2 has no leader. How does a broker come to be designated as the leader for a partition? And was deleting the pods (to force their recreation) a good move?

Thank you,
Kimberly.

@kwalker17 That sounds like a bug to me… each partition should always have a leader.

It depends. A restart can take longer due to reprocessing, so when a broker restarts without a clean state it may take a while before the partition has a leader again.

@Zelldon Just to be clear… if there is a new leader after a while, it should be fine?

Yes, this would be expected.