Automatic system outage

kevin2020 · February 19, 2021, 1:04am

Our Zeebe cluster shuts down on its own, and there is no exception in the logs. Is there any way to avoid this, or to troubleshoot why it shuts down?

Zelldon · February 19, 2021, 7:30am

Hey @kevin2020

without more information it is hard to help you here, so please answer the following questions:

1. Which version are you using?
2. What does your deployment look like? Are you using Docker, Kubernetes, or anything else?
3. What does your configuration look like? Show us how you configured Zeebe.
4. Can you see any resource consumption problems, such as running out of memory? Maybe via metrics or logging at the OS level.

Greets
Chris

kevin2020 · March 15, 2021, 8:27am

1. I am using Zeebe 0.25.3.
2. The deployment runs on a VM with 8 CPU cores and 16 GB of memory.
3. Zeebe config:

zeebe:
  broker:
    gateway:
      # Enable the embedded gateway to start on broker startup.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_ENABLE.
      enable: true

      network:
        # Sets the port the embedded gateway binds to.
        # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_NETWORK_PORT.
        port: 26500
        commandApi:

      security:
        # Enables TLS authentication between clients and the gateway
        # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_SECURITY_ENABLED.
        enabled: false

    network:
      # Controls the default host the broker should bind to. Can be overwritten on a
      # per binding basis for client, management and replication
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_HOST.
      host: 10.18.58.239

    data:
      # Specify a list of directories in which data is stored.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DIRECTORIES.
      directories: [ data ]
      # The size of data log segment files.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_LOGSEGMENTSIZE.
      logSegmentSize: 512MB
      # How often we take snapshots of streams (time unit)
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_SNAPSHOTPERIOD.
      snapshotPeriod: 15m
      useMmap: true

    cluster:
      nodeId: 3
      # Specifies the Zeebe cluster size.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERSIZE.
      clusterSize: 5
      # Controls the replication factor, which defines the count of replicas per partition.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR.
      replicationFactor: 2
      # Controls the number of partitions, which should exist in the cluster.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT.
      partitionsCount: 10
      initialContactPoints: [ 10.18.58.236:26502, 10.18.58.237:26502, 10.18.58.238:26502, 10.18.58.239:26502, 10.18.64.239:26502 ]

    backpressure:
      enabled: false

    threads:
      # Controls the number of non-blocking CPU threads to be used.
      # WARNING: You should never specify a value that is larger than the number of physical cores
      # available. Good practice is to leave 1-2 cores for ioThreads and the operating
      # system (it has to run somewhere). For example, when running Zeebe on a machine
      # which has 4 cores, a good value would be 2.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_CPUTHREADCOUNT
      cpuThreadCount: 4
      # Controls the number of io threads to be used.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_IOTHREADCOUNT
      ioThreadCount: 4

    exporters:
      elasticsearch:
        className: io.zeebe.exporter.ElasticsearchExporter
        args:
          url: http://10.17.43.113:19200

4. Monitoring is still being set up, and nothing abnormal is visible at the operating system level.
5. After a restart the application continues to serve requests, but I don't know when it will stop on its own again.

kevin2020 · March 29, 2021, 5:39am

I found the reason why the process is killed automatically: the Linux OOM killer terminates it because of high memory usage. But how much memory should be allocated to Zeebe?
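
For anyone who wants to confirm the same cause on their own machine, here is a minimal sketch in Python that scans kernel log text for OOM kills. It assumes the log contains the fragment "Killed process <pid> (<name>)"; the wording around it differs between kernel versions, so only that fragment is matched. The function name and the sample line are made up for illustration.

```python
import re

# Matches the "Killed process <pid> (<name>)" fragment of a kernel OOM message.
# Assumption: the kernel on the host uses this wording; check a real log line first.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for every OOM kill found in log_text."""
    return [(int(pid), name) for pid, name in OOM_KILL.findall(log_text)]


# Illustrative sample line, not taken from the cluster discussed in this thread.
sample_log = (
    "[12345.678] Out of memory: Killed process 4321 (java) "
    "total-vm:1000kB, anon-rss:500kB"
)

for pid, name in find_oom_kills(sample_log):
    print(f"OOM killer terminated pid={pid} name={name}")
```

To use it on a real host, pass in the text of the kernel log (for example the output of `dmesg`, which may need root) instead of `sample_log`.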

jwulf · March 30, 2021, 1:01am

The short answer is: more. Depending on what you are doing (volume, snapshot timing), the memory requirement will differ. Last I checked, rebuilding a broker's state on restart required more memory than running it.

The best way to understand the memory requirement in your scenario is to run your scenario and profile the memory usage. This will quite possibly change between versions, so I would repeat the profiling every time you look at upgrading.
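
As a rough starting point for that kind of profiling on Linux, the sketch below samples the resident set size (VmRSS) of a process from `/proc/<pid>/status`. It is an illustration only, not a Zeebe tool: the function names, sample count, and interval are assumptions, and a proper profiler or metrics setup will give a fuller picture. Resident memory covers more than the JVM heap, so it is usually closer to what the kernel sees.

```python
import time


def parse_vm_rss_kb(status_text):
    """Extract VmRSS (resident memory, in kB) from /proc/<pid>/status content.

    Returns None if the field is missing (for example for kernel threads).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      123456 kB".
            return int(line.split()[1])
    return None


def sample_rss(pid, samples=5, interval_s=1.0):
    """Read VmRSS for `pid` `samples` times and return the readings in kB."""
    readings = []
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            readings.append(parse_vm_rss_kb(f.read()))
        time.sleep(interval_s)
    return readings
```

Calling `sample_rss(broker_pid)` under normal load and again during a broker restart would show whether the restart peak is the one that gets close to the machine's memory, where `broker_pid` is the process ID of the broker.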
