Zeebe Low Performance

@Zelldon

I have attached the screenshot since the group is not allowing me to post multiple photos. It is a single big screenshot that has all the metrics-related details for the load run.

Please let me know if you have some pointers.

Please download the image and zoom in to see all the info.

@Zelldon

Please let me know if you have some pointers.

Hey @arpitagarwal78

could you share your benchmark setup? That is, how do you start/create the workflow instances etc.?
How many job workers do you have? What is the activation count?
It is expected that a lot of requests are dropped under higher load. I’m wondering why you see no difference when you use a different partition count.

You wrote that you’re using the Helm charts; could you share which Helm chart version you use?
Maybe you can also share your values file.

Greets
Chris

@Zelldon

Thanks for all your support!

Current Setup
3 brokers, each with 5 vCPU and 12 GB RAM

Start / Create instances

zeebeClientLifecycle
    .newPublishMessageCommand()
    .messageName("PhoneContact")
    .correlationKey("contactId")
    .messageId(String.valueOf(UUID.randomUUID))
    .variables(Map("contactId" -> UUID.randomUUID().toString).asJava)
    .send()

Job Worker
1 worker with a thread count of 8

What is the activation count?
Not sure about this

Helm configurations
https://github.com/zeebe-io/zeebe-cluster-helm

Custom chart file

global:
    virtualservice:
        enabled: true
        ingressGateway: default-gateway-internal.istio-system.svc.cluster.local
        host: "zeebe-istio.devus1.<something>.com"

# This profile is adapted from https://github.com/zeebe-io/zeebe-helm-profiles/blob/master/zeebe-core-team.yaml
zeebe:
    image:
        tag: 0.22.3
    # ZEEBE CFG

    clusterSize: 3
    partitionCount: 3
    replicationFactor: 3
    cpuThreadCount: 4
    ioThreadCount: 4

    prometheus:
        serviceMonitor:
            enabled: true

    # tolerations:
    tolerations:
    - effect: NoExecute
      key: role
      operator: Equal
      value: zeebe
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values: 
                    - zeebe
    
   
    # JavaOpts:
    # DEFAULTS

    JavaOpts: |
        -XX:+UseParallelGC 
        -XX:MinHeapFreeRatio=5
        -XX:MaxHeapFreeRatio=10
        -XX:MaxRAMPercentage=25.0 
        -XX:GCTimeRatio=4 
        -XX:AdaptiveSizePolicyWeight=90
        -XX:+PrintFlagsFinal
        -Xmx4g
        -Xms4g
        -XX:+HeapDumpOnOutOfMemoryError
        -XX:HeapDumpPath=/usr/local/zeebe/data
        -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log
    # RESOURCES

    resources:
        limits:
            cpu: 5
            memory: 12Gi
        requests:
            cpu: 5
            memory: 12Gi

    # PVC

    pvcAccessMode: ["ReadWriteOnce"]
    pvcSize: 128Gi
    #storageClassName: "ssd"        

    # ELASTIC

    elasticsearch:
        replicas: 3
        minimumMasterNodes: 2

        volumeClaimTemplate:
            accessModes: [ "ReadWriteOnce" ]
            #storageClassName: "ssd"
            resources:
                requests:
                    storage: 100Gi

        esJavaOpts: "-Xmx4g -Xms4g"
        # tolerations:
        tolerations:
        - effect: NoExecute
          key: role
          operator: Equal
          value: elasticsearch
  
        resources:
            requests:
                cpu: 3
                memory: 8Gi
            limits:
                cpu: 3
                memory: 8Gi

Hey @arpitagarwal78

thanks for providing more insights in your setup.

One thing you should do is increase the standalone gateway threads; by default it uses one thread, which can be a bottleneck.


gateway:
  replicas: 1
  logLevel: debug
  env:
    - name: ZEEBE_GATEWAY_MONITORING_ENABLED
      value: "true"
    - name: ZEEBE_GATEWAY_THREADS_MANAGEMENTTHREADS
      value: "4"

For more details you can have a look at our benchmark setup https://github.com/zeebe-io/zeebe/blob/develop/benchmarks/setup/default/zeebe-values.yaml

How do you configure your job worker? I’m not sure whether one worker is sufficient. You have one worker which can work on 8 jobs concurrently. Depending on the activation count, those 8 threads will also compete for the jobs available in the worker.
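For reference, here is a minimal sketch of what these knobs map to when a worker is opened directly with the Zeebe Java client. You are using Spring Zeebe, so this is only to illustrate the relationship between execution threads and maxJobsActive; the contact point and the numbers are placeholders.

import java.time.Duration
import io.zeebe.client.ZeebeClient

object WorkerSketch extends App {
  // numJobWorkerExecutionThreads controls how many activated jobs are worked on concurrently.
  val client = ZeebeClient.newClientBuilder()
    .brokerContactPoint("localhost:26500") // placeholder contact point
    .usePlaintext()
    .numJobWorkerExecutionThreads(8)
    .build()

  // maxJobsActive is the "activation count": how many jobs this worker keeps
  // activated (fetched from the broker) at any time. If it is too low, the
  // 8 execution threads starve while waiting for new jobs to be activated.
  client.newWorker()
    .jobType("MyJob")
    .handler((jobClient, job) => {
      // complete immediately; a real handler would do work here
      jobClient.newCompleteCommand(job.getKey).send()
      ()
    })
    .maxJobsActive(32)
    .timeout(Duration.ofMinutes(5))
    .open()

  // client.close() omitted in this sketch
}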

I assume your workflow has a message start event and one task? Or what does your workflow look like?

Greets
Chris

Hi @Zelldon

Appreciate all your support :heart:

Currently we use the Spring Zeebe library. Thus we configure the job worker using

@ZeebeWorker(`type` = "MyJob", name = "MyJob", maxJobsActive = 200)

which has a basic implementation as

client
      .newCompleteCommand(job.getKey)
      .variables(
        Map(
          "contactId" -> job.getVariablesAsMap.get("contactId"),
          "result" -> headers.get("expression")
        ).asJava
      )
      .send()
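For completeness, the annotation and the complete command sit together in a handler roughly like this. This is only a sketch: the class and method names are illustrative, the package names assume the io.zeebe.spring packages of spring-zeebe 0.x, and we read headers from the job’s custom headers.

import io.zeebe.client.api.response.ActivatedJob
import io.zeebe.client.api.worker.JobClient
import io.zeebe.spring.client.annotation.ZeebeWorker
import org.springframework.stereotype.Component
import scala.jdk.CollectionConverters._ // Scala 2.13; use scala.collection.JavaConverters._ on 2.12

@Component
class MyJobHandler {

  @ZeebeWorker(`type` = "MyJob", name = "MyJob", maxJobsActive = 200)
  def handle(client: JobClient, job: ActivatedJob): Unit = {
    // custom headers defined on the service task in the BPMN model
    val headers = job.getCustomHeaders

    client
      .newCompleteCommand(job.getKey)
      .variables(
        Map(
          "contactId" -> job.getVariablesAsMap.get("contactId"),
          "result" -> headers.get("expression")
        ).asJava
      )
      .send()
  }
}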

Currently we have our workflow as depicted in the image below

FYI
Because we are using the old charts and version 0.22.3, the standalone gateway was not supported at the time. We have 3 Zeebe broker nodes acting as gateway nodes as well.

Regards
Arpit

Hey @arpitagarwal78

thanks for providing more details.

Any reason why you stick with the old version? I would encourage you to update, since we fixed some issues and made further improvements to the charts and also to Zeebe.

I think after you have upgraded you could try to play around with the cluster size and partition count. For example, you could increase your cluster size to 5 nodes and 20 partitions with replication factor 3. This would mean you would get the following matrix, which should help to spread the work better.

Distribution (L = leader, F = follower):
P\N|	N 0|	N 1|	N 2|	N 3|	N 4
P 0|	L  |	F  |	F  |	-  |	-  
P 1|	-  |	L  |	F  |	F  |	-  
P 2|	-  |	-  |	L  |	F  |	F  
P 3|	F  |	-  |	-  |	L  |	F  
P 4|	F  |	F  |	-  |	-  |	L  
P 5|	L  |	F  |	F  |	-  |	-  
P 6|	-  |	L  |	F  |	F  |	-  
P 7|	-  |	-  |	L  |	F  |	F  
P 8|	F  |	-  |	-  |	L  |	F  
P 9|	F  |	F  |	-  |	-  |	L  
P 10|	L  |	F  |	F  |	-  |	-  
P 11|	-  |	L  |	F  |	F  |	-  
P 12|	-  |	-  |	L  |	F  |	F  
P 13|	F  |	-  |	-  |	L  |	F  
P 14|	F  |	F  |	-  |	-  |	L  
P 15|	L  |	F  |	F  |	-  |	-  
P 16|	-  |	L  |	F  |	F  |	-  
P 17|	-  |	-  |	L  |	F  |	F  
P 18|	F  |	-  |	-  |	L  |	F  
P 19|	F  |	F  |	-  |	-  |	L  

Partitions per Node:
N 0: 12
N 1: 12
N 2: 12
N 3: 12
N 4: 12

You should then also increase the resources for the nodes to match the partition count per node.
I would also suggest using more workers and increasing the threads of the gateway, as sketched below.
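As a rough sketch, the corresponding changes in your values file could look like the following. This is based on the keys you already use; the thread counts and pod resources are placeholders you would still need to tune for roughly 12 partitions per node.

zeebe:
  clusterSize: 5
  partitionCount: 20
  replicationFactor: 3
  # more partitions per node usually also means more processing/IO threads and more CPU/memory
  cpuThreadCount: 4
  ioThreadCount: 4

gateway:
  env:
    - name: ZEEBE_GATEWAY_THREADS_MANAGEMENTTHREADS
      value: "4"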

I hope this helps, but also be aware that we currently have some performance issues which might affect your results. You can find related issues here https://github.com/zeebe-io/zeebe/issues?q=is%3Aopen+is%3Aissue+label%3A"Impact%3A+Performance"

Greets
Chris

@arpitagarwal78 I just saw that you commented out #storageClassName: "ssd" in your values file. You should make sure that Zeebe runs on a fast disk like an SSD, since Zeebe does a lot of IO-heavy work, and using a normal hard disk, for example, will degrade the performance a lot.
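For example, you could re-enable the storage class lines that are currently commented out in your values file (the actual storage class name depends on what your cluster provides):

zeebe:
    pvcSize: 128Gi
    storageClassName: "ssd"   # must match an SSD-backed StorageClass in your cluster

    elasticsearch:
        volumeClaimTemplate:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "ssd"
            resources:
                requests:
                    storage: 100Gi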

Hey @Zelldon

We are upgrading the Zeebe instance now as suggested by you and will post the findings.

Thanks for all your help

Hey @Zelldon, thank you for answering questions here.
I have a couple related questions:

  • Do exporters directly impact broker performance?
  • Can I expect a broker without exporters to work faster?

Hi @Zelldon

Hope you are doing good.

We finally migrated our zeebe to the latest 0.23.3 broker.

As a default case

First, we ran a load test with the following configuration:

  • zeebe_gateway_threads 1
  • cluster size 3
  • replicas 3
  • partitions 3
  • 2 workers (taste and guess), each with 8 threads

Request fired = 300371
Request Processed = 88451 (Camunda Operate)
Request dropped = 250600 (Grafana)

Observations
The system was clear after 12 minutes once the request firing was stopped. Earlier we used to wait for almost an hour. Yes, backpressure was higher and more requests were dropped, but it is way better than the previous run.

Second, as suggested by you, the following configuration was used:

  • 5 nodes and 16 partitions, 4 CPU threads and 4 IO threads
  • resources:
        limits:
            cpu: "5"
            memory: 12Gi
        requests:
            cpu: "5"
            memory: 12Gi
  • 8 vCPU and 32 GB RAM on all broker nodes

After the configuration change, the client is unable to connect to the Zeebe broker and fails with the following exception:

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 59.999797437s. [remote_addr=someaddress/173.20.111.1:26500]
	at io.grpc.Status.asRuntimeException(Status.java:533)
	at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:449)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
	at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
	at io.grpc.internal.ClientCallImpl$1CloseInContext.runInContext(ClientCallImpl.java:416)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)

We are not sure why we are facing this issue.

The configuration which we are using on the client side is

contactPoint = ourbroker,
    port = 26500,
    plaintextEnabled = true,
    workerName = "default-worker",
    workerThreads  = "8",
    workerMaxJobsActive = "32",
    jobTimeout = "300000", // 5 minutes
    messageTimeToLive = "3600000", // 1 hour
    requestTimeout = "20000", // 20 seconds
    caCertificatePath = "", // Empty string to prevent NPE in Env.getOrElse
    keepAlive = "45000" // 45 seconds
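Just to show how we read these settings, they roughly correspond to the Zeebe Java client builder calls below. This is a sketch only: spring-zeebe wires this up from its own properties, and the assumed mapping and exact builder methods depend on the client version.

import java.time.Duration
import io.zeebe.client.ZeebeClient

// Assumed mapping of the settings above onto the underlying Java client builder.
val client = ZeebeClient.newClientBuilder()
  .brokerContactPoint("ourbroker:26500")           // contactPoint + port
  .usePlaintext()                                  // plaintextEnabled = true
  .defaultJobWorkerName("default-worker")          // workerName
  .numJobWorkerExecutionThreads(8)                 // workerThreads
  .defaultJobWorkerMaxJobsActive(32)               // workerMaxJobsActive
  .defaultJobTimeout(Duration.ofMinutes(5))        // jobTimeout
  .defaultMessageTimeToLive(Duration.ofHours(1))   // messageTimeToLive
  .defaultRequestTimeout(Duration.ofSeconds(20))   // requestTimeout
  .keepAlive(Duration.ofSeconds(45))               // keepAlive
  .build()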

we are using

"io.zeebe.spring" % "spring-zeebe-starter" % zeebeVersion

Please provide us with some pointers @Zelldon

JFI
We wanted to know how many instances/sec can be created and how much a single worker with 8 threads can handle. This will help us know what changes we should make according to our requirements.

On changing the management thread config back to 1, it started working.

But we are not sure why that created an issue.

Maybe you can give more insights; that would be great.

We ran a round of tests on
Zeebe broker 0.23.3

All with the following configuration:

  • zeebe_gateway_threads 1
  • cluster size 3
  • replicas 3
  • partitions 3
  • 2 workers (taste and guess), each with 8 threads

Scenario 1

  • 100 request / sec
  • request fired = 29997
  • request processed = 28144 (Camunda operate)
  • dropped = 1853

Scenario 2

  • 70 request / sec
  • request fired = 20979
  • request processed = 20454 (Camunda operate)
  • dropped = 525

Scenario 3

  • 60 request / sec
  • request fired = 18016
  • request processed = 17507 (Camunda operate)
  • dropped = 509

Scenario 4

  • 40 request / sec
  • request fired = 12102
  • request processed = 12027 (Camunda operate)
  • dropped = 75

Scenario 5

  • 30 request / sec
  • request fired = 9274
  • request processed = 9262 (Camunda operate)
  • dropped = 12

Scenario 6

  • 26 request / sec
  • request fired = 8741
  • request processed = 8726 (Camunda operate)
  • dropped = 15

Not sure why there are always dropped instances! Is it something we need to be concerned about?

We ran a round of tests on
Zeebe broker 0.23.3

All with the following configuration:

  • 5 nodes and 16 partitions, 4 CPU threads and 1 IO thread
  • resources:
        limits:
            cpu: "5"
            memory: 12Gi
        requests:
            cpu: "5"
            memory: 12Gi
  • 8 vCPU and 32 GB RAM on all broker nodes

Scenario 1
request = 30 req/sec
request fired = 9274
request processed = 9274 (Camunda operate)
dropped = 0 (in grafana)
No drops, no delay

Scenario 2
request = 1000 req/sec
request fired = 300363
request processed = Operate crashed! (Camunda operate)
dropped = Operate crashed!
request seen in grafana = 996000
dropped = 499000

Since Operate crashed, all subsequent load runs have data captured from Grafana.

What we noticed was that the Grafana values are not in sync with the total requests fired from the Gatling side, but Operate has the exact values, at least for the number of instances running/completed.

But still, for reference, we are adding all our observations.

Scenario 3
request = 250 req / sec
request fired = 75059
request seen in grafana = 234200
dropped = 15760

Scenario 4
request = 150 req / sec
request fired = 45025
request seen in grafana = 14330
dropped = 7980

Scenario 5
request = 80 req / sec
request fired = 23999
request seen in grafana = 73700
dropped = 1779

Scenario 6
request = 70 req / sec
request fired = 20978
request seen in grafana = 70000
dropped = 874

Scenario 7
request = 40 req / sec
request fired = 12102
request seen in grafana = 43900
dropped = 55

We have added all the observations here. Since we are new to Zeebe, we would like some pointers.

Target
The target we want to hit is 1000 req/sec (1000 instances created and processed per second) with 1 worker and 8 threads, with no dropped requests, and all instances should be created and completed with no delay.

Since we are not adding any delay in the job worker, we wanted to hit this target.

Basic workflow which we are running for benchmark results

Hey @arpitagarwal78,
any success in reaching the target performance?

I’m also very concerned about the performance.
I cannot achieve more than 40 starts/sec with 1 broker, 4 partitions, 4 CPU threads and 2 IO threads.

Hey @jetdream

the exporters and the processing are decoupled, so I would say that, given enough resources, they should not have much impact. If you don’t have enough resources (CPU etc.), then of course they will compete with each other. I have no proof for that, but we always test with the Elasticsearch exporter.

You can expect that the broker will delete data faster without exporters; this is a safe assumption to make.

Greets
Chris

Hey @arpitagarwal78

sorry for the late response I was on vacation.

After you changed the configuration to 5 nodes you mentioned deadline exceeded errors. How often do you see this? Do you use a standalone gateway then?

It would help to understand the setup better if you always share your complete configuration, e.g. the values file, the helm version etc.

I revisited the previous posts and saw that you mentioned you are starting workflow instances via messages. Is this still the case? Are you always using the same correlationKey? If you always use the same correlationKey, then it will be published to the same partition.
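In the publish snippet you shared earlier, .correlationKey("contactId") passes the literal string "contactId" for every message, so every message ends up with the same key. A rough sketch of using a per-instance value instead (variable names are just illustrative):

import java.util.UUID
import scala.jdk.CollectionConverters._ // Scala 2.13; use scala.collection.JavaConverters._ on 2.12

val contactId = UUID.randomUUID().toString

zeebeClientLifecycle
    .newPublishMessageCommand()
    .messageName("PhoneContact")
    .correlationKey(contactId) // varies per message, so publishing spreads across partitions
    .messageId(UUID.randomUUID().toString)
    .variables(Map("contactId" -> contactId).asJava)
    .send()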

The job workers have 8 threads, but only 32 max active jobs, is that the case? Maybe you can increase that number as well.

In your scenario descriptions, what do you mean by “request processed = 20454 (Camunda operate)”?
Do you see that many instances in Operate?

What we noticed was that the Grafana values are not in sync with the total requests fired from the Gatling side, but Operate has the exact values, at least for the number of instances running/completed.

I’m not sure what you mean by this. Do you have a screenshot of the corresponding panel?

Not for now.

Hey @jetdream

what does your deployment look like? Is it local? Do you use an SSD?

Greets
Chris