Zeebe Low Performance

@Zelldon

I have attached the screenshot since the group is not allowing me to post multiple photos. It is a single big screenshot that has all the metrics-related details for the load run.

Please let me know if you have some pointers.

Please download the image and zoom in to see all the info.

@Zelldon

Please let me know if you have some pointers.

Hey @arpitagarwal78

could you share your benchmark setup? That is, how do you start/create the workflow instances etc.?
How many job workers do you have? What is the activation count?
It is expected that a lot of requests are dropped under higher load. I’m wondering why you see no difference when you use a different partition count.

You wrote that you’re using the Helm charts; could you share which Helm chart version you use?
Maybe you can also share your values file.

Greets
Chris

@Zelldon

Thanks for all your support!

Current Setup
3 brokers, each with 5 vCPU and 12 GB RAM

Start / Create instances

zeebeClientLifecycle
    .newPublishMessageCommand()
    .messageName("PhoneContact")
    .correlationKey("contactId")
    .messageId(String.valueOf(UUID.randomUUID))
    .variables(Map("contactId" -> UUID.randomUUID().toString).asJava)
    .send()

Job Worker
1 worker with a thread count of 8

What is the activation count?
Not sure about this

Helm configurations
https://github.com/zeebe-io/zeebe-cluster-helm

Custom chart file

global:
    virtualservice:
        enabled: true
        ingressGateway: default-gateway-internal.istio-system.svc.cluster.local
        host: "zeebe-istio.devus1.<something>.com"

# This profile is adapted from https://github.com/zeebe-io/zeebe-helm-profiles/blob/master/zeebe-core-team.yaml
zeebe:
    image:
        tag: 0.22.3
    # ZEEBE CFG

    clusterSize: 3
    partitionCount: 3
    replicationFactor: 3
    cpuThreadCount: 4
    ioThreadCount: 4

    prometheus:
        serviceMonitor:
            enabled: true

    # tolerations:
    tolerations:
    - effect: NoExecute
      key: role
      operator: Equal
      value: zeebe
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values: 
                    - zeebe
    
   
    # JavaOpts:
    # DEFAULTS

    JavaOpts: |
        -XX:+UseParallelGC 
        -XX:MinHeapFreeRatio=5
        -XX:MaxHeapFreeRatio=10
        -XX:MaxRAMPercentage=25.0 
        -XX:GCTimeRatio=4 
        -XX:AdaptiveSizePolicyWeight=90
        -XX:+PrintFlagsFinal
        -Xmx4g
        -Xms4g
        -XX:+HeapDumpOnOutOfMemoryError
        -XX:HeapDumpPath=/usr/local/zeebe/data
        -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log
    # RESOURCES

    resources:
        limits:
            cpu: 5
            memory: 12Gi
        requests:
            cpu: 5
            memory: 12Gi

    # PVC

    pvcAccessMode: ["ReadWriteOnce"]
    pvcSize: 128Gi
    #storageClassName: "ssd"        

    # ELASTIC

    elasticsearch:
        replicas: 3
        minimumMasterNodes: 2

        volumeClaimTemplate:
            accessModes: [ "ReadWriteOnce" ]
            #storageClassName: "ssd"
            resources:
                requests:
                    storage: 100Gi

        esJavaOpts: "-Xmx4g -Xms4g"
        # tolerations:
        tolerations:
        - effect: NoExecute
          key: role
          operator: Equal
          value: elasticsearch
  
        resources:
            requests:
                cpu: 3
                memory: 8Gi
            limits:
                cpu: 3
                memory: 8Gi

Hey @arpitagarwal78

thanks for providing more insights in your setup.

One thing you should do is increase the standalone gateway threads; by default it uses one thread, which can be a bottleneck.


gateway:
  replicas: 1
  logLevel: debug
  env:
    - name: ZEEBE_GATEWAY_MONITORING_ENABLED
      value: "true"
    - name: ZEEBE_GATEWAY_THREADS_MANAGEMENTTHREADS
      value: "4"

For more details you can have a look at our benchmark setup https://github.com/zeebe-io/zeebe/blob/develop/benchmarks/setup/default/zeebe-values.yaml

How do you configure your job worker? I’m not sure whether one worker is sufficient. You have one worker which can work on 8 jobs concurrently. Depending on the activation count, those 8 threads will also compete for the jobs available in the worker.
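For reference, here is a minimal sketch of what these knobs map to when a worker is opened directly with the Zeebe Java client. You are using Spring Zeebe, so this is only to illustrate the relationship between execution threads and maxJobsActive; the contact point and the numbers are placeholders.

import java.time.Duration
import io.zeebe.client.ZeebeClient

object WorkerSketch extends App {
  // numJobWorkerExecutionThreads controls how many activated jobs are worked on concurrently.
  val client = ZeebeClient.newClientBuilder()
    .brokerContactPoint("localhost:26500") // placeholder contact point
    .usePlaintext()
    .numJobWorkerExecutionThreads(8)
    .build()

  // maxJobsActive is the "activation count": how many jobs this worker keeps
  // activated (fetched from the broker) at any time. If it is too low, the
  // 8 execution threads starve while waiting for new jobs to be activated.
  client.newWorker()
    .jobType("MyJob")
    .handler((jobClient, job) => {
      // complete immediately; a real handler would do work here
      jobClient.newCompleteCommand(job.getKey).send()
      ()
    })
    .maxJobsActive(32)
    .timeout(Duration.ofMinutes(5))
    .open()

  // client.close() omitted in this sketch
}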

I assume your workflow has a message start event and one task? Or what does your workflow look like?

Greets
Chris

Hi @Zelldon

Appreciate all your support :heart:

Currently we use the Spring Zeebe library. Thus we configure the job worker using

@ZeebeWorker(`type` = "MyJob", name = "MyJob", maxJobsActive = 200)

which has a basic implementation as

client
      .newCompleteCommand(job.getKey)
      .variables(
        Map(
          "contactId" -> job.getVariablesAsMap.get("contactId"),
          "result" -> headers.get("expression")
        ).asJava
      )
      .send()
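For completeness, the annotation and the complete command sit together in a handler roughly like this. This is only a sketch: the class and method names are illustrative, the package names assume the io.zeebe.spring packages of spring-zeebe 0.x, and we read headers from the job’s custom headers.

import io.zeebe.client.api.response.ActivatedJob
import io.zeebe.client.api.worker.JobClient
import io.zeebe.spring.client.annotation.ZeebeWorker
import org.springframework.stereotype.Component
import scala.jdk.CollectionConverters._ // Scala 2.13; use scala.collection.JavaConverters._ on 2.12

@Component
class MyJobHandler {

  @ZeebeWorker(`type` = "MyJob", name = "MyJob", maxJobsActive = 200)
  def handle(client: JobClient, job: ActivatedJob): Unit = {
    // custom headers defined on the service task in the BPMN model
    val headers = job.getCustomHeaders

    client
      .newCompleteCommand(job.getKey)
      .variables(
        Map(
          "contactId" -> job.getVariablesAsMap.get("contactId"),
          "result" -> headers.get("expression")
        ).asJava
      )
      .send()
  }
}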

Currently we have our workflow as depicted in the image below

FYI
Because we are using the old charts and version 0.22.3, the standalone gateway was not supported at the time. We have 3 Zeebe broker nodes acting as gateway nodes as well.

Regards
Arpit

Hey @arpitagarwal78

thanks for providing more details.

Any reason why you stick with the old version? I would encourage you to update, since we fixed some issues and made further improvements to the charts and also to Zeebe.

I think after you have upgraded you could try to play around with the cluster size and partition count. For example, you could increase your cluster size to 5 nodes and 20 partitions with replication factor 3. This would mean you would get the following matrix, which should help to spread the work better.

Distribution (L = leader, F = follower):
P\N|	N 0|	N 1|	N 2|	N 3|	N 4
P 0|	L  |	F  |	F  |	-  |	-  
P 1|	-  |	L  |	F  |	F  |	-  
P 2|	-  |	-  |	L  |	F  |	F  
P 3|	F  |	-  |	-  |	L  |	F  
P 4|	F  |	F  |	-  |	-  |	L  
P 5|	L  |	F  |	F  |	-  |	-  
P 6|	-  |	L  |	F  |	F  |	-  
P 7|	-  |	-  |	L  |	F  |	F  
P 8|	F  |	-  |	-  |	L  |	F  
P 9|	F  |	F  |	-  |	-  |	L  
P 10|	L  |	F  |	F  |	-  |	-  
P 11|	-  |	L  |	F  |	F  |	-  
P 12|	-  |	-  |	L  |	F  |	F  
P 13|	F  |	-  |	-  |	L  |	F  
P 14|	F  |	F  |	-  |	-  |	L  
P 15|	L  |	F  |	F  |	-  |	-  
P 16|	-  |	L  |	F  |	F  |	-  
P 17|	-  |	-  |	L  |	F  |	F  
P 18|	F  |	-  |	-  |	L  |	F  
P 19|	F  |	F  |	-  |	-  |	L  

Partitions per Node:
N 0: 12
N 1: 12
N 2: 12
N 3: 12
N 4: 12

You should then also increase the resources for the nodes to match the partition count per node.
I would also suggest using more workers and increasing the threads of the gateway, as sketched below.
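As a rough sketch, the corresponding changes in your values file could look like the following. This is based on the keys you already use; the thread counts and pod resources are placeholders you would still need to tune for roughly 12 partitions per node.

zeebe:
  clusterSize: 5
  partitionCount: 20
  replicationFactor: 3
  # more partitions per node usually also means more processing/IO threads and more CPU/memory
  cpuThreadCount: 4
  ioThreadCount: 4

gateway:
  env:
    - name: ZEEBE_GATEWAY_THREADS_MANAGEMENTTHREADS
      value: "4"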

I hope this helps, but also be aware that we currently have some performance issues which might affect your results. You can find related issues here https://github.com/zeebe-io/zeebe/issues?q=is%3Aopen+is%3Aissue+label%3A"Impact%3A+Performance"

Greets
Chris

@arpitagarwal78 I just saw that you commented out #storageClassName: "ssd" in your values file. You should make sure that Zeebe runs on a fast disk like an SSD, since Zeebe does a lot of IO-heavy work, and using a normal hard disk, for example, will degrade the performance a lot.
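For example, you could re-enable the storage class lines that are currently commented out in your values file (the actual storage class name depends on what your cluster provides):

zeebe:
    pvcSize: 128Gi
    storageClassName: "ssd"   # must match an SSD-backed StorageClass in your cluster

    elasticsearch:
        volumeClaimTemplate:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "ssd"
            resources:
                requests:
                    storage: 100Gi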

Hey @Zelldon

We are upgrading the Zeebe instance now as suggested by you and will post the findings.

Thanks for all your help

Hey @Zelldon, thank you for answering questions here.
I have a couple related questions:

  • Do exporters directly impact broker performance?
  • Can I expect a broker without exporters to work faster?

Hi @Zelldon

Hope you are doing good.

We finally migrated our zeebe to the latest 0.23.3 broker.

As a default case

First, we ran a load test with the following configuration:

  • zeebe_gateway_threads 1
  • cluster size 3
  • replicas 3
  • partitions 3
  • 2 workers (taste and guess), each with 8 threads

Request fired = 300371
Request Processed = 88451 (Camunda Operate)
Request dropped = 250600 (Grafana)

Observations
The system was clear after 12 minutes once the request firing was stopped. Earlier we used to wait for almost an hour. Yes, backpressure was higher and more requests were dropped, but it is way better than the previous run.

Second, as suggested by you, the following configuration was used:

  • 5 nodes and 16 partitions, 4 CPU threads and 4 IO threads
  • resources:
        limits:
            cpu: "5"
            memory: 12Gi
        requests:
            cpu: "5"
            memory: 12Gi
  • 8 vCPU and 32 GB RAM on all broker nodes

After the configuration change, the client is unable to connect to the Zeebe broker and fails with the following exception:

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 59.999797437s. [remote_addr=someaddress/173.20.111.1:26500]
	at io.grpc.Status.asRuntimeException(Status.java:533)
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:449)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
	at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
	at io.grpc.internal.ClientCallImpl$1CloseInContext.runInContext(ClientCallImpl.java:416)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)

We are not sure why we are facing this issue.

The configuration which we are using on the client side is

contactPoint = ourbroker,
    port = 26500,
    plaintextEnabled = true,
    workerName = "default-worker",
    workerThreads  = "8",
    workerMaxJobsActive = "32",
    jobTimeout = "300000", // 5 minutes
    messageTimeToLive = "3600000", // 1 hour
    requestTimeout = "20000", // 20 seconds
    caCertificatePath = "", // Empty string to prevent NPE in Env.getOrElse
    keepAlive = "45000" // 45 seconds
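Just to show how we read these settings, they roughly correspond to the Zeebe Java client builder calls below. This is a sketch only: spring-zeebe wires this up from its own properties, and the assumed mapping and exact builder methods depend on the client version.

import java.time.Duration
import io.zeebe.client.ZeebeClient

// Assumed mapping of the settings above onto the underlying Java client builder.
val client = ZeebeClient.newClientBuilder()
  .brokerContactPoint("ourbroker:26500")           // contactPoint + port
  .usePlaintext()                                  // plaintextEnabled = true
  .defaultJobWorkerName("default-worker")          // workerName
  .numJobWorkerExecutionThreads(8)                 // workerThreads
  .defaultJobWorkerMaxJobsActive(32)               // workerMaxJobsActive
  .defaultJobTimeout(Duration.ofMinutes(5))        // jobTimeout
  .defaultMessageTimeToLive(Duration.ofHours(1))   // messageTimeToLive
  .defaultRequestTimeout(Duration.ofSeconds(20))   // requestTimeout
  .keepAlive(Duration.ofSeconds(45))               // keepAlive
  .build()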

we are using

"io.zeebe.spring" % "spring-zeebe-starter" % zeebeVersion

Please provide us with some pointers @Zelldon

JFI
We wanted to know how many instances/sec can be created and how much a single worker with 8 threads can handle. This will help us know what changes we should make according to our requirements.

On changing the management thread config back to 1, it started working.

But we are not sure why that created an issue.

Maybe you can give more insights; that would be great.

We ran a round of tests on
Zeebe broker 0.23.3

All with the following configuration:

  • zeebe_gateway_threads 1
  • cluster size 3
  • replicas 3
  • partitions 3
  • 2 workers (taste and guess), each with 8 threads

Scenario 1

  • 100 request / sec
  • request fired = 29997
  • request processed = 28144 (Camunda operate)
  • dropped = 1853

Scenario 2

  • 70 request / sec
  • request fired = 20979
  • request processed = 20454 (Camunda operate)
  • dropped = 525

Scenario 3

  • 60 request / sec
  • request fired = 18016
  • request processed = 17507 (Camunda operate)
  • dropped = 509

Scenario 4

  • 40 request / sec
  • request fired = 12102
  • request processed = 12027 (Camunda operate)
  • dropped = 75

Scenario 5

  • 30 request / sec
  • request fired = 9274
  • request processed = 9262 (Camunda operate)
  • dropped = 12

Scenario 6

  • 26 request / sec
  • request fired = 8741
  • request processed = 8726 (Camunda operate)
  • dropped = 15

Not sure why there are always dropped instances! Is it something we need to be concerned about?

We ran a round of tests on
Zeebe broker 0.23.3

All with the following configuration:

  • 5 nodes and 16 partitions, 4 CPU threads and 1 IO thread
  • resources:
        limits:
            cpu: "5"
            memory: 12Gi
        requests:
            cpu: "5"
            memory: 12Gi
  • 8 vCPU and 32 GB RAM on all broker nodes

Scenario 1
request = 30 req/sec
request fired = 9274
request processed = 9274 (Camunda operate)
dropped = 0 (in grafana)
No drops, no delay

Scenario 2
request = 1000 req/sec
request fired = 300363
request processed = Operate crashed! (Camunda operate)
dropped = Operate crashed!
request seen in grafana = 996000
dropped = 499000

Since Operate crashed, all subsequent load runs have data captured from Grafana.

What we noticed was that the Grafana values are not in sync with the total requests fired from the Gatling side, but Operate has the exact values, at least for the number of instances running/completed.

But still, for reference, we are adding all our observations.

Scenario 3
request = 250 req / sec
request fired = 75059
request seen in grafana = 234200
dropped = 15760

Scenario 4
request = 150 req / sec
request fired = 45025
request seen in grafana = 14330
dropped = 7980

Scenario 5
request = 80 req / sec
request fired = 23999
request seen in grafana = 73700
dropped = 1779

Scenario 6
request = 70 req / sec
request fired = 20978
request seen in grafana = 70000
dropped = 874

Scenario 7
request = 40 req / sec
request fired = 12102
request seen in grafana = 43900
dropped = 55

We have added all the observations here. Since we are new to Zeebe, we would like some pointers.

Target
The target we want to hit is 1000 req/sec (1000 instances created and processed per second) with 1 worker and 8 threads, with no dropped requests, and all instances should be created and completed with no delay.

Since we are not adding any delay in the job worker, we wanted to hit this target.

Basic workflow which we are running for benchmark results

Hey @arpitagarwal78,
any success in reaching the target performance?

I’m also very concerned about the performance.
I cannot achieve more than 40 starts/sec with 1 broker, 4 partitions, 4 CPU threads and 2 IO threads.

Hey @jetdream

the exporters and the processing are decoupled, so I would say that, given enough resources, they should not have much impact. If you don’t have enough resources (CPU etc.), then of course they will compete with each other. I have no proof for that, but we always test with the Elasticsearch exporter.

You can expect that the broker will delete data faster without exporters; this is a safe assumption to make.

Greets
Chris

Hey @arpitagarwal78

sorry for the late response I was on vacation.

After you changed the configuration to 5 nodes you mentioned deadline exceeded errors. How often do you see this? Do you use a standalone gateway then?

It would help to understand the setup better if you always share your complete configuration, e.g. the values file, the helm version etc.

I revisited the previous posts and saw that you mentioned you are starting workflow instances via messages. Is this still the case? Are you always using the same correlationKey? If you always use the same correlationKey, then it will be published to the same partition.
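In the publish snippet you shared earlier, .correlationKey("contactId") passes the literal string "contactId" for every message, so every message ends up with the same key. A rough sketch of using a per-instance value instead (variable names are just illustrative):

import java.util.UUID
import scala.jdk.CollectionConverters._ // Scala 2.13; use scala.collection.JavaConverters._ on 2.12

val contactId = UUID.randomUUID().toString

zeebeClientLifecycle
    .newPublishMessageCommand()
    .messageName("PhoneContact")
    .correlationKey(contactId) // varies per message, so publishing spreads across partitions
    .messageId(UUID.randomUUID().toString)
    .variables(Map("contactId" -> contactId).asJava)
    .send()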

The job workers have 8 threads, but only 32 max active jobs, is that the case? Maybe you can increase that number as well.

In your scenario descriptions, what do you mean by “request processed = 20454 (Camunda operate)”?
Do you see that many instances in Operate?

What we noticed was that the Grafana values are not in sync with the total requests fired from the Gatling side, but Operate has the exact values, at least for the number of instances running/completed.

I’m not sure what you mean by this. Do you have a screenshot of the corresponding panel?

Not for now.

Hey @jetdream

what does your deployment look like? Is it local? Do you use an SSD?

Greets
Chris