Hi,
I have installed a Zeebe cluster in Kubernetes using Helm charts. The details are as follows:
Kubernetes client and server: v1.15.3
Platform: GNU/Linux
Helm version: v2.14.3
Zeebe image: camunda/zeebe:0.25.0
Other environment variables:
- name: ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT
  value: "3"
- name: ZEEBE_BROKER_CLUSTER_CLUSTERSIZE
  value: "3"
- name: ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR
  value: "2"
- name: ZEEBE_BROKER_THREADS_CPUTHREADCOUNT
  value: "2"
- name: ZEEBE_BROKER_THREADS_IOTHREADCOUNT
  value: "2"
Exporters configured: Kafka and Hazelcast. Elasticsearch is disabled.
Workers configured: http-worker
This all works well when the partition count is set to 3 (the same as the cluster size). But as soon as I set the partition count higher than the cluster size, the brokers do not come up. For example, I made the following changes to the environment variables:
- name: ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT
  value: "7"
- name: ZEEBE_BROKER_THREADS_CPUTHREADCOUNT
  value: "7"
(I am running this on an 8-core machine.)
After this, the brokers do not come up, and of course the gateway fails to recognize them.
There are no errors in the logs, though there are warnings from the Raft partition servers. There is enough disk space on the volume.
The following is the first part of the logs from broker-0:
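For comparison, the replica arithmetic behind the working and the failing configuration can be sketched in Python. This is an illustration under stated assumptions, not Zeebe's actual placement logic: it assumes every partition is a Raft group with `replicationFactor` replicas, that a broker holds at most one replica of a given partition, and that replicas are spread as evenly as possible over the brokers. The function name `replica_summary` is hypothetical.

```python
def replica_summary(partitions: int, replication_factor: int, cluster_size: int) -> dict:
    """Illustrative replica arithmetic for a cluster of Raft partitions.

    Assumptions (not Zeebe's actual placement logic): every partition has
    `replication_factor` replicas, a broker holds at most one replica of a
    given partition, and replicas are spread as evenly as possible.
    """
    if replication_factor > cluster_size:
        # With at most one replica per broker, there are not enough brokers.
        raise ValueError("replication_factor must not exceed cluster_size")
    total = partitions * replication_factor
    # Even spread: the first `extra` brokers carry one replica more than the rest.
    base, extra = divmod(total, cluster_size)
    per_broker = [base + 1 if i < extra else base for i in range(cluster_size)]
    # Raft elects a leader only with a strict majority of the partition's replicas.
    quorum = replication_factor // 2 + 1
    return {"total_replicas": total, "per_broker": per_broker, "quorum": quorum}


if __name__ == "__main__":
    print(replica_summary(3, 2, 3))  # the working setup
    print(replica_summary(7, 2, 3))  # the setup where the brokers do not come up
```

Under these assumptions, 3 partitions give 6 replicas (2 per broker) and 7 partitions give 14 replicas (5, 5 and 4 per broker). In both cases the quorum per partition is 2, so with a replication factor of 2 each partition needs both of its replicas to be reachable before it can elect a leader.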
2021-01-25 09:12:30.530 [] [main] INFO io.zeebe.broker.StandaloneBroker - Starting StandaloneBroker v0.25.0 on zeebe-broker-0 with PID 6 (/usr/local/zeebe/lib/zeebe-distribution-0.25.0.jar started by root in /usr/local/zeebe)
2021-01-25 09:12:30.591 [] [main] DEBUG io.zeebe.broker.StandaloneBroker - Running with Spring Boot v2.3.4.RELEASE, Spring v5.2.9.RELEASE
2021-01-25 09:12:30.592 [] [main] INFO io.zeebe.broker.StandaloneBroker - No active profile set, falling back to default profiles: default
2021-01-25 09:12:34.298 [] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 9600 (http)
2021-01-25 09:12:34.309 [] [main] INFO org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-0.0.0.0-9600"]
2021-01-25 09:12:34.310 [] [main] INFO org.apache.catalina.core.StandardService - Starting service [Tomcat]
2021-01-25 09:12:34.310 [] [main] INFO org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.39]
2021-01-25 09:12:34.516 [] [main] INFO org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2021-01-25 09:12:34.517 [] [main] INFO org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext - Root WebApplicationContext: initialization completed in 3821 ms
2021-01-25 09:12:35.613 [] [main] INFO org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor - Initializing ExecutorService 'applicationTaskExecutor'
2021-01-25 09:12:36.422 [] [main] INFO org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 4 endpoint(s) beneath base path '/actuator'
2021-01-25 09:12:36.501 [] [main] INFO org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-0.0.0.0-9600"]
2021-01-25 09:12:36.523 [] [main] INFO org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 9600 (http) with context path ''
2021-01-25 09:12:36.601 [] [main] INFO io.zeebe.broker.StandaloneBroker - Started StandaloneBroker in 7.099 seconds (JVM running for 10.052)
2021-01-25 09:12:36.626 [] [main] DEBUG io.zeebe.broker.system - Initializing system with base path /usr/local/zeebe
2021-01-25 09:12:36.713 [] [main] INFO io.zeebe.broker.system - Version: 0.25.0
2021-01-25 09:12:36.823 [] [main] INFO io.zeebe.broker.system - Starting broker 0 with configuration {
  "network" : {
    "host" : "0.0.0.0",
    "portOffset" : 0,
    "maxMessageSize" : "4MB",
    "advertisedHost" : "zeebe-broker-0.zeebe-broker.default.svc.cluster.local",
    "commandApi" : {
      "host" : "0.0.0.0",
      "port" : 26501,
      "advertisedHost" : "zeebe-broker-0.zeebe-broker.default.svc.cluster.local",
      "advertisedPort" : 26501,
      "advertisedAddress" : "zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26501",
      "address" : "0.0.0.0:26501"
    },
    "internalApi" : {
      "host" : "0.0.0.0",
      "port" : 26502,
      "advertisedHost" : "zeebe-broker-0.zeebe-broker.default.svc.cluster.local",
      "advertisedPort" : 26502,
      "advertisedAddress" : "zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26502",
      "address" : "0.0.0.0:26502"
    },
    "monitoringApi" : {
      "host" : "0.0.0.0",
      "port" : 9600,
      "advertisedHost" : "zeebe-broker-0.zeebe-broker.default.svc.cluster.local",
      "advertisedPort" : 9600,
      "advertisedAddress" : "zeebe-broker-0.zeebe-broker.default.svc.cluster.local:9600",
      "address" : "0.0.0.0:9600"
    },
    "maxMessageSizeInBytes" : 4194304
  },
  "cluster" : {
    "initialContactPoints" : [ "zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26502", "zeebe-broker-1.zeebe-broker.default.svc.cluster.local:26502", "zeebe-broker-2.zeebe-broker.default.svc.cluster.local:26502" ],
    "partitionIds" : [ 1, 2, 3, 4, 5, 6, 7 ],
    "nodeId" : 0,
    "partitionsCount" : 7,
    "replicationFactor" : 2,
    "clusterSize" : 3,
    "clusterName" : "zeebe-broker",
    "membership" : {
      "broadcastUpdates" : false,
      "broadcastDisputes" : true,
      "notifySuspect" : false,
      "gossipInterval" : "PT0.25S",
      "gossipFanout" : 2,
      "probeInterval" : "PT1S",
      "probeTimeout" : "PT2S",
      "suspectProbes" : 3,
      "failureTimeout" : "PT10S",
      "syncInterval" : "PT10S"
    }
  },
  "threads" : {
    "cpuThreadCount" : 7,
    "ioThreadCount" : 2
  },
  "data" : {
    "directories" : [ "/usr/local/zeebe/data" ],
    "logSegmentSize" : "512MB",
    "snapshotPeriod" : "PT15M",
    "logIndexDensity" : 100,
    "diskUsageMonitoringEnabled" : true,
    "diskUsageReplicationWatermark" : 0.99,
    "diskUsageCommandWatermark" : 0.97,
    "diskUsageMonitoringInterval" : "PT1S",
    "rocksdb" : {
      "columnFamilyOptions" : { }
    },
    "atomixStorageLevel" : "DISK",
    "freeDiskSpaceCommandWatermark" : 3220879442,
    "freeDiskSpaceReplicationWatermark" : 1073626481,
    "logSegmentSizeInBytes" : 536870912
  },
  "exporters" : {
    ……
  }
  "gateway" : {
    "network" : {
      "host" : "0.0.0.0",
      "port" : 26500,
      "minKeepAliveInterval" : "PT30S"
    },
    "cluster" : {
      "contactPoint" : "0.0.0.0:26502",
      "requestTimeout" : "PT15S",
      "clusterName" : "zeebe-cluster",
      "memberId" : "gateway",
      "host" : "0.0.0.0",
      "port" : 26502,
      "membership" : {
        "broadcastUpdates" : false,
        "broadcastDisputes" : true,
        "notifySuspect" : false,
        "gossipInterval" : "PT0.25S",
        "gossipFanout" : 2,
        "probeInterval" : "PT1S",
        "probeTimeout" : "PT2S",
        "suspectProbes" : 3,
        "failureTimeout" : "PT10S",
        "syncInterval" : "PT10S"
      }
    },
    "threads" : {
      "managementThreads" : 1
    },
    "monitoring" : {
      "enabled" : false,
      "host" : "0.0.0.0",
      "port" : 9600
    },
    "security" : {
      "enabled" : false,
      "certificateChainPath" : null,
      "privateKeyPath" : null
    },
    "longPolling" : {
      "enabled" : true
    },
    "initialized" : true,
    "enable" : false
  },
  "backpressure" : {
    "enabled" : true,
    "algorithm" : "VEGAS",
    "aimd" : {
      "requestTimeout" : "PT1S",
      "initialLimit" : 100,
      "minLimit" : 1,
      "maxLimit" : 1000,
      "backoffRatio" : 0.9
    },
    "fixed" : {
      "limit" : 20
    },
    "vegas" : {
      "alpha" : 3,
      "beta" : 6,
      "initialLimit" : 20
    },
    "gradient" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0
    },
    "gradient2" : {
      "minLimit" : 10,
      "initialLimit" : 20,
      "rttTolerance" : 2.0,
      "longWindow" : 600
    }
  },
  "experimental" : {
    "maxAppendsPerFollower" : 2,
    "maxAppendBatchSize" : "0MB",
    "disableExplicitRaftFlush" : false,
    "maxAppendBatchSizeInBytes" : 32768
  },
  "stepTimeout" : "PT5M",
  "executionMetricsExporterEnabled" : false
}
2021-01-26 04:34:04.385 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [1/13]: actor scheduler
2021-01-26 04:34:04.422 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [1/13]: actor scheduler started in 36 ms
2021-01-26 04:34:04.423 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [2/13]: membership and replication protocol
2021-01-26 04:34:04.490 [] [main] DEBUG io.zeebe.broker.clustering - Member 0 will contact node: zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26502
2021-01-26 04:34:04.497 [] [main] DEBUG io.zeebe.broker.clustering - Member 0 will contact node: zeebe-broker-1.zeebe-broker.default.svc.cluster.local:26502
2021-01-26 04:34:04.500 [] [main] DEBUG io.zeebe.broker.clustering - Member 0 will contact node: zeebe-broker-2.zeebe-broker.default.svc.cluster.local:26502
2021-01-26 04:34:04.985 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [2/13]: membership and replication protocol started in 562 ms
2021-01-26 04:34:04.986 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [3/13]: command api transport
2021-01-26 04:34:05.391 [] [main] DEBUG io.zeebe.broker.system - Bound command API to zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26501
2021-01-26 04:34:05.402 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [3/13]: command api transport started in 416 ms
2021-01-26 04:34:05.402 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [4/13]: command api handler
2021-01-26 04:34:05.505 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [4/13]: command api handler started in 103 ms
2021-01-26 04:34:05.505 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [5/13]: subscription api
2021-01-26 04:34:05.594 [] [main] DEBUG io.zeebe.broker.system - Bootstrap Broker-0 [5/13]: subscription api started in 88 ms
2021-01-26 04:34:05.594 [] [main] INFO io.zeebe.broker.system - Bootstrap Broker-0 [6/13]: cluster services
2021-01-26 04:34:08.379 [] [http-nio-0.0.0.0-9600-exec-1] INFO org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2021-01-26 04:34:08.379 [] [http-nio-0.0.0.0-9600-exec-1] INFO org.springframework.web.servlet.DispatcherServlet - Initializing Servlet 'dispatcherServlet'
2021-01-26 04:34:08.388 [] [http-nio-0.0.0.0-9600-exec-1] INFO org.springframework.web.servlet.DispatcherServlet - Completed initialization in 9 ms
After this point, warnings from the Raft servers start and the broker never catches up. The warnings look like this:
2021-01-26 04:34:10.882 [] [raft-server-0-raft-partition-partition-7] WARN io.atomix.raft.roles.FollowerRole - RaftServer{raft-partition-partition-7}{role=FOLLOWER} - Poll request to 1 failed: java.net.ConnectException: Expected to send a message with subject 'raft-partition-partition-7-poll' to member '1', but member is not known. Known members are '[Member{id=zeebe-broker-gateway-b759d674d-p4nl4, address=10.233.91.226:26502, properties={event-service-topics-subscribed=Af8fAQEDAWpvYnNBdmFpbGFibOU=}}, Member{id=0, address=zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26502, properties={}}]'.
2021-01-26 04:34:10.882 [] [raft-server-0-raft-partition-partition-7] WARN io.atomix.raft.roles.FollowerRole - RaftServer{raft-partition-partition-7}{role=FOLLOWER} - Poll request to 2 failed: java.net.ConnectException: Expected to send a message with subject 'raft-partition-partition-7-poll' to member '2', but member is not known. Known members are '[Member{id=zeebe-broker-gateway-b759d674d-p4nl4, address=10.233.91.226:26502, properties={event-service-topics-subscribed=Af8fAQEDAWpvYnNBdmFpbGFibOU=}}, Member{id=0, address=zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26502, properties={}}]'.
2021-01-26 04:34:11.298 [] [raft-server-0-raft-partition-partition-6] WARN io.atomix.raft.roles.FollowerRole - RaftServer{raft-partition-partition-6}{role=FOLLOWER} - Poll request to 1 failed: java.net.ConnectException: Expected to send a message with subject 'raft-partition-partition-6-poll' to member '1', but member is not known. Known members are '[Member{id=zeebe-broker-gateway-b759d674d-p4nl4, address=10.233.91.226:26502, properties={event-service-topics-subscribed=Af8fAQEDAWpvYnNBdmFpbGFibOU=}}, Member{id=0, address=zeebe-broker-0.zeebe-broker.default.svc.cluster.local:26502, properties={}}]'.
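To see at a glance which partitions are stuck and which members they cannot reach, warnings like these could be summarized with a small script. This is a hypothetical helper (`unknown_members_by_partition` is not part of Zeebe), and the regular expression is an assumption written against the FollowerRole message format quoted above; other Zeebe versions may word the message differently.

```python
import re
from collections import defaultdict

# Written against the FollowerRole warning format quoted in this post
# (Zeebe 0.25); other versions may phrase the message differently.
POLL_FAILURE = re.compile(
    r"RaftServer\{raft-partition-partition-(?P<partition>\d+)\}"
    r".*?Poll request to (?P<member>\S+) failed:"
    r".*?member is not known"
)


def unknown_members_by_partition(lines):
    """Map each partition id to the member ids its poll requests could not reach."""
    result = defaultdict(set)
    for line in lines:
        match = POLL_FAILURE.search(line)
        if match:
            result[int(match.group("partition"))].add(match.group("member"))
    return dict(result)
```

For example, after saving the broker output with `kubectl logs zeebe-broker-0 > broker-0.log` (the file name is only an example), passing the opened file to the function returns a mapping such as partition 7 to members 1 and 2, which matches the warnings above.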
Some more information from the StatefulSet, in case it helps:
replicas: 3
- name: JAVA_TOOL_OPTIONS
  value: -Xms4g -Xmx6g -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log
    -XX:+ExitOnOutOfMemoryError
Is there anything I am missing that would allow me to increase the partition count?
Thanks