Zeebe broker data keeps increasing

regojoyson · July 2, 2019, 8:44am

I am using Zeebe broker 0.18.0. Over the last week I deployed 20 workflows and 70 instances in Zeebe. The space used by Zeebe is now 25GB, which seems huge to me.

Any suggestions for reducing the size, please?

972K    ./system/partitions/1
972K    ./system/partitions
972K    ./system
1.4M    ./raft-atomix/partitions/1
1.4M    ./raft-atomix/partitions
1.4M    ./raft-atomix
23G     ./partition-1/segments
276K    ./partition-1/state/1_zb-stream-processor/snapshots/635655438512
176K    ./partition-1/state/1_zb-stream-processor/snapshots/3702262016456
136K    ./partition-1/state/1_zb-stream-processor/snapshots/4226248426360
308K    ./partition-1/state/1_zb-stream-processor/snapshots/7426000341760
288K    ./partition-1/state/1_zb-stream-processor/snapshots/10900627979608
280K    ./partition-1/state/1_zb-stream-processor/snapshots/14177687737568
252K    ./partition-1/state/1_zb-stream-processor/snapshots/17454748594608
248K    ./partition-1/state/1_zb-stream-processor/snapshots/20718924346280
252K    ./partition-1/state/1_zb-stream-processor/snapshots/24008867982944
244K    ./partition-1/state/1_zb-stream-processor/snapshots/27303107643368
264K    ./partition-1/state/1_zb-stream-processor/snapshots/30593052336800
264K    ./partition-1/state/1_zb-stream-processor/snapshots/33878702646760
284K    ./partition-1/state/1_zb-stream-processor/snapshots/37168648634400
3.2M    ./partition-1/state/1_zb-stream-processor/snapshots
259M    ./partition-1/state/1_zb-stream-processor/runtime
262M    ./partition-1/state/1_zb-stream-processor
262M    ./partition-1/state
26M     ./partition-1/index/snapshots/38079182387584
26M     ./partition-1/index/snapshots
2.1G    ./partition-1/index/runtime
2.1G    ./partition-1/index
25G     ./partition-1

jwulf · July 2, 2019, 1:57pm

Interesting problem. It’s hard to say what is happening with just this to go on. I would not expect this small number of workflows/instances to use this much space, unless they are accumulating a lot of state.

How are you starting the broker?

Do you have any exporters loaded?

What is the broker config?

What messages / errors are there in the log? At startup? Ongoing?

regojoyson · July 3, 2019, 4:48am

Hey @jwulf,

I am starting the broker using Docker Compose.
Yes, I am using the Elasticsearch exporter.
I have shared the config file below.
There are no errors at startup, but once the disk is full it reports an out-of-disk-space error.

zeebe-config :

  # Zeebe broker configuration file

# Overview -------------------------------------------

# This file contains a complete list of available configuration options.

# Default values:
#
# When the default value is used for a configuration option, the option is
# commented out. You can learn the default value from this file

# Conventions:
#
# Byte sizes
# For buffers and others must be specified as strings and follow the following
# format: "10U" where U (unit) must be replaced with K = Kilobytes, M = Megabytes or G = Gigabytes.
# If unit is omitted then the default unit is simply bytes.
# Example:
# sendBufferSize = "16M" (creates a buffer of 16 Megabytes)
#
# Time units
# Timeouts, intervals, and the likes, must be specified as strings and follow the following
# format: "VU", where:
#   - V is a numerical value (e.g. 1, 1.2, 3.56, etc.)
#   - U is the unit, one of: ms = Millis, s = Seconds, m = Minutes, or h = Hours
#
# Paths:
# Relative paths are resolved relative to the installation directory of the
# broker.

# ----------------------------------------------------


[gateway]
# Enable the embedded gateway to start on broker startup.
# This setting can also be overridden using the environment variable ZEEBE_EMBED_GATEWAY.
# enable = true

[gateway.network]
# Sets the host the embedded gateway binds to.
# This setting can be specified using the following precedence:
# 1. setting the environment variable ZEEBE_GATEWAY_HOST
# 2. setting gateway.network.host property in this file
# 3. setting the environment variable ZEEBE_HOST
# 4. setting network.host property in this file
# host = "0.0.0.0"

# Sets the port the embedded gateway binds to.
# This setting can also be overridden using the environment variable ZEEBE_GATEWAY_PORT.
# port = 26500

[gateway.cluster]
# Sets the broker the gateway should initial contact.
# This setting can also be overridden using the environment variable ZEEBE_GATEWAY_CONTACT_POINT.
# contactPoint = "127.0.0.1:26501"

# Sets size of the transport buffer to send and received messages between gateway and broker cluster.
# This setting can also be overridden using the environment variable ZEEBE_GATEWAY_TRANSPORT_BUFFER.
# transportBuffer = "128M"

# Sets the timeout of requests send to the broker cluster
# This setting can also be overridden using the environment variable ZEEBE_GATEWAY_REQUEST_TIMEOUT.
# requestTimeout = "15s"

[gateway.threads]
# Sets the number of threads the gateway will use to communicate with the broker cluster
# This setting can also be overridden using the environment variable ZEEBE_GATEWAY_MANAGEMENT_THREADS.
# managementThreads = 1

[network]

# This section contains the network configuration. Particularly, it allows to
# configure the hosts and ports the broker should bind to. The broker exposes two sockets:
# 1. command: the socket which is used for gateway-to-broker communication
# 2. internal: the socket which is used for broker-to-broker communication

# Controls the default host the broker should bind to. Can be overwritten on a
# per binding basis for client, management and replication
#
# This setting can also be overridden using the environment variable ZEEBE_HOST.
# host = "0.0.0.0"

# If a port offset is set it will be added to all ports specified in the config
# or the default values. This is a shortcut to not always specifying every port.
#
# The offset will be added to the second last position of the port, as Zeebe
# requires multiple ports. As example a portOffset of 5 will increment all ports
# by 50, i.e. 26500 will become 26550 and so on.
#
# This setting can also be overridden using the environment variable ZEEBE_PORT_OFFSET.
# portOffset = 0

[network.commandApi]
# Overrides the host used for gateway-to-broker communication
# host = "localhost"

# Sets the port used for gateway-to-broker communication
# port = 26501

# Sets the size of the buffer used for buffering outgoing messages
# sendBufferSize = "16M"

[network.internalApi]
# Overrides the host used for internal broker-to-broker communication
# host = "localhost"

# Sets the port used for internal broker-to-broker communication
# port = 26502


[data]

# This section allows to configure Zeebe's data storage. Data is stored in
# "partition folders". A partition folder has the following structure:
#
# partition-0                       (root partition folder)
# ├── partition.json                (metadata about the partition)
# ├── segments                      (the actual data as segment files)
# │   ├── 00.data
# │   └── 01.data
# ├── index                             (log block index state and snapshots)
# │   ├── runtime
# │   └── snapshots
# └── state                             (stream processor state and snapshots)
#     └── stream-processor
#                 ├── runtime
#                 └── snapshots

# Specify a list of directories in which data is stored. Using multiple
# directories makes sense in case the machine which is running Zeebe has
# multiple disks which are used in a JBOD (just a bunch of disks) manner. This
# allows to get greater throughput in combination with a higher io thread count
# since writes to different disks can potentially be done in parallel.
#
# This setting can also be overridden using the environment variable ZEEBE_DIRECTORIES.
# directories = [ "data" ]

# The size of data log segment files.
# logSegmentSize = "512M"

# The size of block index segments.
# indexBlockSize = "4M"

# How often we take snapshots of streams (time unit)
snapshotPeriod = "12h"

# The maximum number of snapshots kept (must be a positive integer). When this
# limit is passed the oldest snapshot is deleted.
maxSnapshots = "14"

# How often follower partitions will check for new snapshots to replicate from
# the leader partitions. Snapshot replication enables faster failover by
# reducing how many log entries must be reprocessed in case of leader change.
# snapshotReplicationPeriod = "5m"


[cluster]

# This section contains all cluster related configurations, to setup an zeebe cluster

# Specifies the unique id of this broker node in a cluster.
# The id should be between 0 and number of nodes in the cluster (exclusive).
#
# This setting can also be overridden using the environment variable ZEEBE_NODE_ID.
# nodeId = 0

# Controls the number of partitions, which should exist in the cluster.
#
# This can also be overridden using the environment variable ZEEBE_PARTITIONS_COUNT.
# partitionsCount = 1

# Controls the replication factor, which defines the count of replicas per partition.
# The replication factor cannot be greater than the number of nodes in the cluster.
#
# This can also be overridden using the environment variable ZEEBE_REPLICATION_FACTOR.
# replicationFactor = 1

# Specifies the zeebe cluster size. This value is used to determine which broker
# is responsible for which partition.
#
# This can also be overridden using the environment variable ZEEBE_CLUSTER_SIZE.
# clusterSize = 1

# Allows to specify a list of known other nodes to connect to on startup
# The contact points of the internal network configuration must be specified.
# The format is [HOST:PORT]
# Example:
# initialContactPoints = [ "192.168.1.22:26502", "192.168.1.32:26502" ]
#
# This setting can also be overridden using the environment variable ZEEBE_CONTACT_POINTS
# specifying a comma-separated list of contact points.
#
# Default is empty list:
# initialContactPoints = []

# Allows to specify a name for the cluster
# This setting can also be overridden using the environment variable ZEEBE_CLUSTER_NAME
# Example:
# clusterName = "zeebe-cluster"

[threads]

# Controls the number of non-blocking CPU threads to be used. WARNING: You
# should never specify a value that is larger than the number of physical cores
# available. Good practice is to leave 1-2 cores for ioThreads and the operating
# system (it has to run somewhere). For example, when running Zeebe on a machine
# which has 4 cores, a good value would be 2.
#
# The default value is 2.
#cpuThreadCount = 2

# Controls the number of io threads to be used. These threads are used for
# workloads that write data to disk. While writing, these threads are blocked
# which means that they yield the CPU.
#
# The default value is 2.
#ioThreadCount = 2

[metrics]

# Path to the file to which metrics are written. Metrics are written in a
# text-based format understood by prometheus.io
# metricsFile = "metrics/zeebe.prom"

# Controls the interval at which the metrics are written to the metrics file
# reportingInterval = "5s"

# Controls if the prometheus metrics should be exporter over HTTP
# This setting can also be overridden using the environment variable ZEEBE_METRICS_HTTP_SERVER.
# enableHttpServer = false

# Host to export metrics on, defaults to network.host
# host = "0.0.0.0"

# Port to export metrics on
# port = 9600

# Configure exporters below; note that configuration parsing conventions do not apply to exporter
# arguments, which will be parsed as normal TOML.
#
# Each exporter should be configured following this template:
#
# id:
#   property should be unique in this configuration file, as it will server as the exporter
#   ID for loading/unloading.
# jarPath:
#   path to the JAR file containing the exporter class. JARs are only loaded once, so you can define
#   two exporters that point to the same JAR, with the same class or a different one, and use args
#   to parametrize its instantiation.
# className:
#   entry point of the exporter, a class which *must* extend the io.zeebe.exporter.Exporter
#   interface.
#
# A nested table as [exporters.args] will allow you to inject arbitrary arguments into your
# class through the use of annotations.
#
# Enable the following debug exporter to log the exported records to console
# This exporter can also be enabled using the environment variable ZEEBE_DEBUG, the pretty print
# option will be enabled if the variable is set to "pretty".
#
# [[exporters]]
# id = "debug-log"
# className = "io.zeebe.broker.exporter.debug.DebugLogExporter"
# [exporters.args]
#   logLevel = "debug"
#   prettyPrint = false
#
# Enable the following debug exporter to start a http server to inspect the exported records
#
# [[exporters]]
# id = "debug-http"
# className = "io.zeebe.broker.exporter.debug.DebugHttpExporter"
# [exporters.args]
#   port = 8000
#   limit = 1024
#
#
# An example configuration for the elasticsearch exporter:
#
#[[exporters]]
#id = "elasticsearch"
#className = "io.zeebe.exporter.ElasticsearchExporter"
#
#  [exporters.args]
#  url = "http://localhost:9200"
#
#  [exporters.args.bulk]
#  delay = 5
#  size = 1_000
#
#  [exporters.args.authentication]
#  username = elastic
#  password = changeme
#
#  [exporters.args.index]
#  prefix = "zeebe-record"
#  createTemplate = true
#
#  command = false
#  event = true
#  rejection = false
#
#  deployment = true
#  incident = true
#  job = true
#  message = false
#  messageSubscription = false
#  raft = false
#  workflowInstance = true
#  workflowInstanceSubscription = false

[[exporters]]
id = "elasticsearch"
className = "io.zeebe.exporter.ElasticsearchExporter"

[exporters.args]
url = "http://zeebe-elastic:9200"

[exporters.args.bulk]
delay = 5
size = 1_000

[exporters.args.index]
prefix = "zeebe-record"
createTemplate = true

command = false
event = true
rejection = false

deployment = true
incident = true
job = true
message = false
messageSubscription = false
raft = false
workflowInstance = true
workflowInstanceSubscription = false

docker-compose :

version: "2"

networks:
  zeebe_network:

volumes:
  zeebe_elasticsearch_data:

services:
  zeebe:
    image: camunda/zeebe:0.18.0
    ports:
      - "26500:26500"
    volumes:
      - /opt/data:/usr/local/zeebe/data
      - ./zeebe.cfg.toml:/usr/local/zeebe/conf/zeebe.cfg.toml
    depends_on:
      - elasticsearch
    networks:
      - zeebe_network
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.7.1
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
      - cluster.name=elasticsearch
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - zeebe_elasticsearch_data:/usr/share/elasticsearch/data
    networks:
      - zeebe_network
  kibana:
    image: docker.elastic.co/kibana/kibana-oss:6.7.1
    ports:
      - "5601:5601"
    networks:
      - zeebe_network

starting logs :

 7/3/2019 10:00:20 AM+ configFile=/usr/local/zeebe/conf/zeebe.cfg.toml
7/3/2019 10:00:20 AM++ hostname -i
7/3/2019 10:00:20 AM+ export ZEEBE_HOST=10.42.103.202
7/3/2019 10:00:20 AM+ ZEEBE_HOST=10.42.103.202
7/3/2019 10:00:20 AM+ export ZEEBE_GATEWAY_CLUSTER_HOST=10.42.103.202
7/3/2019 10:00:20 AM+ ZEEBE_GATEWAY_CLUSTER_HOST=10.42.103.202
7/3/2019 10:00:20 AM+ '[' false = true ']'
7/3/2019 10:00:20 AM+ exec /usr/local/zeebe/bin/broker
7/3/2019 10:00:21 AM04:30:21.827 [] [main] INFO  io.zeebe.util.config - Reading configuration for class class io.zeebe.broker.system.configuration.BrokerCfg from file /usr/local/zeebe/conf/zeebe.cfg.toml
7/3/2019 10:00:22 AM04:30:22.023 [] [main] INFO  io.zeebe.broker.system - Scheduler configuration: Threads{cpu-bound: 2, io-bound: 2}.
7/3/2019 10:00:22 AM04:30:22.070 [] [main] INFO  io.zeebe.broker.system - Version: 0.18.0
7/3/2019 10:00:22 AM04:30:22.082 [] [main] INFO  io.zeebe.broker.system - Starting broker with configuration {
7/3/2019 10:00:22 AM  "network": {
7/3/2019 10:00:22 AM    "host": "10.42.103.202",
7/3/2019 10:00:22 AM    "portOffset": 0,
7/3/2019 10:00:22 AM    "commandApi": {
7/3/2019 10:00:22 AM      "host": "10.42.103.202",
7/3/2019 10:00:22 AM      "port": 26501,
7/3/2019 10:00:22 AM      "sendBufferSize": "16M"
7/3/2019 10:00:22 AM    },
7/3/2019 10:00:22 AM    "internalApi": {
7/3/2019 10:00:22 AM      "host": "10.42.103.202",
7/3/2019 10:00:22 AM      "port": 26502,
7/3/2019 10:00:22 AM      "sendBufferSize": "16M"
7/3/2019 10:00:22 AM    }
7/3/2019 10:00:22 AM  },
7/3/2019 10:00:22 AM  "cluster": {
7/3/2019 10:00:22 AM    "initialContactPoints": [],
7/3/2019 10:00:22 AM    "partitionIds": [
7/3/2019 10:00:22 AM      1
7/3/2019 10:00:22 AM    ],
7/3/2019 10:00:22 AM    "nodeId": 0,
7/3/2019 10:00:22 AM    "partitionsCount": 1,
7/3/2019 10:00:22 AM    "replicationFactor": 1,
7/3/2019 10:00:22 AM    "clusterSize": 1,
7/3/2019 10:00:22 AM    "clusterName": "zeebe-cluster"
7/3/2019 10:00:22 AM  },
7/3/2019 10:00:22 AM  "threads": {
7/3/2019 10:00:22 AM    "cpuThreadCount": 2,
7/3/2019 10:00:22 AM    "ioThreadCount": 2
7/3/2019 10:00:22 AM  },
7/3/2019 10:00:22 AM  "metrics": {
7/3/2019 10:00:22 AM    "reportingInterval": "5s",
7/3/2019 10:00:22 AM    "file": "/usr/local/zeebe/metrics/zeebe.prom",
7/3/2019 10:00:22 AM    "enableHttpServer": false,
7/3/2019 10:00:22 AM    "host": "10.42.103.202",
7/3/2019 10:00:22 AM    "port": 9600
7/3/2019 10:00:22 AM  },
7/3/2019 10:00:22 AM  "data": {
7/3/2019 10:00:22 AM    "directories": [
7/3/2019 10:00:22 AM      "/usr/local/zeebe/data"
7/3/2019 10:00:22 AM    ],
7/3/2019 10:00:22 AM    "logSegmentSize": "512M",
7/3/2019 10:00:22 AM    "indexBlockSize": "4M",
7/3/2019 10:00:22 AM    "snapshotPeriod": "12h",
7/3/2019 10:00:22 AM    "snapshotReplicationPeriod": "5m",
7/3/2019 10:00:22 AM    "maxSnapshots": 14
7/3/2019 10:00:22 AM  },
7/3/2019 10:00:22 AM  "exporters": [
7/3/2019 10:00:22 AM    {
7/3/2019 10:00:22 AM      "id": "elasticsearch",
7/3/2019 10:00:22 AM      "className": "io.zeebe.exporter.ElasticsearchExporter",
7/3/2019 10:00:22 AM      "args": {
7/3/2019 10:00:22 AM        "index": {
7/3/2019 10:00:22 AM          "workflowInstance": true,
7/3/2019 10:00:22 AM          "messageSubscription": false,
7/3/2019 10:00:22 AM          "prefix": "zeebe-record",
7/3/2019 10:00:22 AM          "raft": false,
7/3/2019 10:00:22 AM          "message": false,
7/3/2019 10:00:22 AM          "createTemplate": true,
7/3/2019 10:00:22 AM          "command": false,
7/3/2019 10:00:22 AM          "rejection": false,
7/3/2019 10:00:22 AM          "workflowInstanceSubscription": false,
7/3/2019 10:00:22 AM          "event": true,
7/3/2019 10:00:22 AM          "job": true,
7/3/2019 10:00:22 AM          "incident": true,
7/3/2019 10:00:22 AM          "deployment": true
7/3/2019 10:00:22 AM        },
7/3/2019 10:00:22 AM        "bulk": {
7/3/2019 10:00:22 AM          "delay": 5.0,
7/3/2019 10:00:22 AM          "size": 1000.0
7/3/2019 10:00:22 AM        },
7/3/2019 10:00:22 AM        "url": "http://zeebe-elastic:9200"
7/3/2019 10:00:22 AM      }
7/3/2019 10:00:22 AM    }
7/3/2019 10:00:22 AM  ],
7/3/2019 10:00:22 AM  "gateway": {
7/3/2019 10:00:22 AM    "enable": true,
7/3/2019 10:00:22 AM    "network": {
7/3/2019 10:00:22 AM      "host": "0.0.0.0",
7/3/2019 10:00:22 AM      "port": 26500
7/3/2019 10:00:22 AM    },
7/3/2019 10:00:22 AM    "cluster": {
7/3/2019 10:00:22 AM      "contactPoint": "10.42.103.202:26502",
7/3/2019 10:00:22 AM      "transportBuffer": "128M",
7/3/2019 10:00:22 AM      "requestTimeout": "15s",
7/3/2019 10:00:22 AM      "clusterName": "zeebe-cluster",
7/3/2019 10:00:22 AM      "memberId": "gateway",
7/3/2019 10:00:22 AM      "host": "10.42.103.202",
7/3/2019 10:00:22 AM      "port": 26502
7/3/2019 10:00:22 AM    },
7/3/2019 10:00:22 AM    "threads": {
7/3/2019 10:00:22 AM      "managementThreads": 1
7/3/2019 10:00:22 AM    }
7/3/2019 10:00:22 AM  }
7/3/2019 10:00:22 AM}
7/3/2019 10:00:22 AM04:30:22.223 [service-controller] [10.42.103.202:26501-zb-actors-1] INFO  io.zeebe.transport - Bound commandApi.server to /10.42.103.202:26501
7/3/2019 10:00:24 AM04:30:24.690 [] [zb-blocking-task-runner-3-10.42.103.202:26501] INFO  io.zeebe.gateway - Version: 0.18.0
7/3/2019 10:00:24 AM04:30:24.691 [] [zb-blocking-task-runner-3-10.42.103.202:26501] INFO  io.zeebe.gateway - Starting gateway with configuration {
7/3/2019 10:00:24 AM  "enable": true,
7/3/2019 10:00:24 AM  "network": {
7/3/2019 10:00:24 AM    "host": "0.0.0.0",
7/3/2019 10:00:24 AM    "port": 26500
7/3/2019 10:00:24 AM  },
7/3/2019 10:00:24 AM  "cluster": {
7/3/2019 10:00:24 AM    "contactPoint": "10.42.103.202:26502",
7/3/2019 10:00:24 AM    "transportBuffer": "128M",
7/3/2019 10:00:24 AM    "requestTimeout": "15s",
7/3/2019 10:00:24 AM    "clusterName": "zeebe-cluster",
7/3/2019 10:00:24 AM    "memberId": "gateway",
7/3/2019 10:00:24 AM    "host": "10.42.103.202",
7/3/2019 10:00:24 AM    "port": 26502
7/3/2019 10:00:24 AM  },
7/3/2019 10:00:24 AM  "threads": {
7/3/2019 10:00:24 AM    "managementThreads": 1
7/3/2019 10:00:24 AM  }
7/3/2019 10:00:24 AM}
7/3/2019 10:00:24 AM04:30:24.908 [service-controller] [10.42.103.202:26501-zb-actors-1] INFO  io.atomix.core.Atomix - 3.2.0-alpha1 (revision 44a5f9 built on 2019-06-06 16:37:07)
7/3/2019 10:00:25 AM04:30:25.311 [] [netty-messaging-event-epoll-server-0] INFO  io.atomix.cluster.messaging.impl.NettyMessagingService - TCP server listening for connections on 0.0.0.0:26502
7/3/2019 10:00:25 AM04:30:25.323 [] [netty-messaging-event-epoll-server-0] INFO  io.atomix.cluster.messaging.impl.NettyMessagingService - Started
7/3/2019 10:00:25 AM04:30:25.398 [] [netty-unicast-event-nio-client-0] INFO  io.atomix.cluster.messaging.impl.NettyUnicastService - UDP server listening for connections on 0.0.0.0:26502
7/3/2019 10:00:25 AM04:30:25.400 [] [atomix-cluster-0] INFO  io.atomix.cluster.discovery.BootstrapDiscoveryProvider - Joined
7/3/2019 10:00:25 AM04:30:25.404 [] [atomix-cluster-0] INFO  io.atomix.cluster.protocol.SwimMembershipProtocol - 0 - Member activated: Member{id=0, address=10.42.103.202:26502, properties={brokerInfo={"nodeId":0,"partitionsCount":1,"clusterSize":1,"replicationFactor":1,"addresses":{"command":"10.42.103.202:26501"},"partitionRoles":{}}}}
7/3/2019 10:00:25 AM04:30:25.414 [] [atomix-cluster-0] INFO  io.atomix.cluster.protocol.SwimMembershipProtocol - Started
7/3/2019 10:00:25 AM04:30:25.415 [] [atomix-cluster-0] INFO  io.atomix.cluster.impl.DefaultClusterMembershipService - Started
7/3/2019 10:00:25 AM04:30:25.415 [] [atomix-cluster-0] INFO  io.atomix.cluster.messaging.impl.DefaultClusterCommunicationService - Started
7/3/2019 10:00:25 AM04:30:25.432 [] [atomix-cluster-0] INFO  io.atomix.cluster.messaging.impl.DefaultClusterEventService - Started
7/3/2019 10:00:25 AM04:30:25.452 [] [atomix-0] INFO  io.atomix.primitive.partition.impl.DefaultPartitionGroupMembershipService - Started
7/3/2019 10:00:25 AM04:30:25.491 [] [atomix-0] INFO  io.atomix.primitive.partition.impl.HashBasedPrimaryElectionService - Started
7/3/2019 10:00:25 AM04:30:25.518 [io.zeebe.gateway.impl.broker.cluster.BrokerTopologyManagerImpl] [10.42.103.202:26501-zb-actors-0] INFO  io.zeebe.transport.endpoint - Registering endpoint for node '0' with address '10.42.103.202:26501' on transport 'gateway-broker-client'
7/3/2019 10:00:25 AM04:30:25.580 [] [atomix-0] INFO  io.atomix.protocols.raft.partition.impl.RaftPartitionServer - Starting server for partition PartitionId{id=1, group=system}
7/3/2019 10:00:25 AM04:30:25.783 [] [raft-server-system-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{system-partition-1} - Transitioning to FOLLOWER
7/3/2019 10:00:25 AM04:30:25.791 [] [raft-server-system-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{system-partition-1} - Transitioning to CANDIDATE
7/3/2019 10:00:25 AM04:30:25.792 [] [raft-server-system-partition-1] WARN  io.atomix.utils.event.ListenerRegistry - Listener io.atomix.protocols.raft.roles.FollowerRole$$Lambda$302/179800959@14b58141 not registered
7/3/2019 10:00:25 AM04:30:25.800 [] [raft-server-system-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{system-partition-1} - Transitioning to LEADER
7/3/2019 10:00:25 AM04:30:25.814 [] [raft-server-system-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{system-partition-1} - Found leader 0
7/3/2019 10:00:25 AM04:30:25.890 [] [raft-server-system-partition-1] INFO  io.atomix.protocols.raft.partition.RaftPartitionGroup - Started
7/3/2019 10:00:26 AM04:30:26.127 [] [raft-partition-group-system-3] INFO  io.atomix.protocols.raft.partition.impl.RaftPartitionServer - Starting server for partition PartitionId{id=1, group=raft-atomix}
7/3/2019 10:00:26 AM04:30:26.155 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Transitioning to FOLLOWER
7/3/2019 10:00:26 AM04:30:26.157 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Transitioning to CANDIDATE
7/3/2019 10:00:26 AM04:30:26.157 [] [raft-server-raft-atomix-partition-1] WARN  io.atomix.utils.event.ListenerRegistry - Listener io.atomix.protocols.raft.roles.FollowerRole$$Lambda$302/179800959@ba6c5c6 not registered
7/3/2019 10:00:26 AM04:30:26.161 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Transitioning to LEADER
7/3/2019 10:00:26 AM04:30:26.162 [] [raft-server-raft-atomix-partition-1] INFO  io.atomix.protocols.raft.impl.RaftContext - RaftServer{raft-atomix-partition-1} - Found leader 0
7/3/2019 10:00:26 AM04:30:26.174 [] [raft-partition-group-raft-atomix-0] INFO  io.atomix.protocols.raft.partition.RaftPartitionGroup - Started
7/3/2019 10:00:26 AM04:30:26.175 [] [raft-partition-group-raft-atomix-0] INFO  io.atomix.primitive.partition.impl.DefaultPartitionService - Started
7/3/2019 10:00:26 AM04:30:26.540 [] [raft-partition-group-system-2] INFO  io.atomix.core.impl.CoreTransactionService - Started
7/3/2019 10:00:26 AM04:30:26.541 [] [raft-partition-group-system-2] INFO  io.atomix.core.impl.CorePrimitivesService - Started
7/3/2019 10:00:26 AM04:30:26.567 [service-controller] [10.42.103.202:26501-zb-actors-1] INFO  io.zeebe.broker.clustering - Creating leader election for partition 1 in node 0
7/3/2019 10:00:26 AM04:30:26.613 [] [raft-server-raft-atomix-partition-1-state] INFO  io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Configuring distributed-log on node 0 with logName raft-atomix-partition-1
7/3/2019 10:00:27 AM04:30:27.014 [] [raft-server-raft-atomix-partition-1-state] INFO  io.zeebe.distributedlog.impl.DefaultDistributedLogstreamService-1 - Configured with LogStream raft-atomix-partition-1 and last appended event at position -1
7/3/2019 10:00:28 AM04:30:28.331 [zb-stream-processor] [10.42.103.202:26501-zb-actors-0] INFO  io.zeebe.logstreams - Recovering state of partition 1 from snapshot
7/3/2019 10:00:28 AM04:30:28.411 [exporter] [10.42.103.202:26501-zb-fs-workers-0] INFO  io.zeebe.broker.exporter - Recovering exporter 'exporter' from snapshot
7/3/2019 10:00:28 AM04:30:28.424 [exporter] [10.42.103.202:26501-zb-fs-workers-0] INFO  io.zeebe.broker.exporter - Recovered exporter 'exporter' from snapshot at lastExportedPosition -1
7/3/2019 10:00:28 AM04:30:28.440 [exporter] [10.42.103.202:26501-zb-fs-workers-0] INFO  io.zeebe.broker.exporter - Set event filter for exporters: ExporterEventFilter{acceptRecordTypes={NULL_VAL=false, SBE_UNKNOWN=false, COMMAND_REJECTION=false, EVENT=true, COMMAND=false}, acceptValueTypes={WORKFLOW_INSTANCE_SUBSCRIPTION=false, WORKFLOW_INSTANCE=true, NOOP=false, VARIABLE_DOCUMENT=false, MESSAGE=false, JOB=true, VARIABLE=true, TIMER=false, WORKFLOW_INSTANCE_CREATION=false, INCIDENT=true, ERROR=false, SBE_UNKNOWN=false, MESSAGE_START_EVENT_SUBSCRIPTION=false, DEPLOYMENT=true, JOB_BATCH=false, MESSAGE_SUBSCRIPTION=false, NULL_VAL=false, EXPORTER=false}}
7/3/2019 10:00:28 AM04:30:28.460 [zb-stream-processor] [10.42.103.202:26501-zb-actors-0] INFO  io.zeebe.logstreams - Recovered state of partition 1 from snapshot at position -1
7/3/2019 10:00:28 AM04:30:28.721 [zb-stream-processor] [10.42.103.202:26501-zb-actors-0] INFO  io.zeebe.processor - Start scanning the log for error events.
7/3/2019 10:00:28 AM04:30:28.721 [zb-stream-processor] [10.42.103.202:26501-zb-actors-0] INFO  io.zeebe.processor - Finished scanning the log for error events.
7/3/2019 10:00:29 AM04:30:29.293 [exporter] [10.42.103.202:26501-zb-fs-workers-0] INFO  io.zeebe.broker.exporter.elasticsearch - Exporter opened

Philipp_Ossler · July 3, 2019, 5:10am

Hi @regojoyson,

In your configuration, there are two settings that delay data deletion.

# How often we take snapshots of streams (time unit)
snapshotPeriod = "12h"

# The maximum number of snapshots kept (must be a positive integer). When this
# limit is passed the oldest snapshot is deleted.
maxSnapshots = "14"

So, the broker will take a snapshot every 12 hours and keep 14 of them. When it creates the 15th snapshot, it deletes all log data written before the oldest remaining snapshot was taken. That means it stores the data for 7 days (14 × 12 hours) before deleting it.
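As a rough sketch of that arithmetic in plain Java (no Zeebe dependency), the retention window is approximately snapshotPeriod multiplied by maxSnapshots. The class and method names are made up for illustration, and the second pair of values is a hypothetical alternative, not a recommendation.

```java
import java.time.Duration;

/** Rough retention estimate derived from snapshotPeriod and maxSnapshots. */
final class RetentionWindow {

    /** Log data stays on disk for about snapshotPeriod * maxSnapshots. */
    static Duration retention(Duration snapshotPeriod, int maxSnapshots) {
        return snapshotPeriod.multipliedBy(maxSnapshots);
    }

    public static void main(String[] args) {
        // Values from the configuration in this thread: "12h" and 14.
        Duration current = retention(Duration.ofHours(12), 14);
        // Hypothetical shorter settings, chosen only to show the effect.
        Duration shorter = retention(Duration.ofMinutes(15), 3);

        System.out.println("current settings keep ~" + current.toHours() + " hours of log");
        System.out.println("shorter settings keep ~" + shorter.toMinutes() + " minutes of log");
    }
}
```

With the thread's settings the first value comes out at 168 hours, i.e. the 7 days mentioned above.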

You should keep this in mind. However, it is not the reason why you have so much data in the first place for just 70 workflow instances. Can you share some of these workflows?

Best regards,
Philipp

Zelldon · July 3, 2019, 5:17am

It would also be interesting to know how many workers you have running and how they are configured (pollInterval etc.). Which client do you use?

Greets
Chris

regojoyson · July 3, 2019, 6:26am

@jwulf @philipp.ossler @Zelldon

I am using the Java client 0.18.0 (not the Spring Zeebe client) with the latest Spring Boot. I have 14 job workers with the default Java client config.

Here is one sample workflow:

https://drive.google.com/file/d/1Q17Sbli-8nPTu55vLkIOvqfcnTAtOGpL/view?usp=sharing

Here is how I register a worker using the Java client:

@Component("checkFulfillmentStatus-Worker")
public class CheckFulfillmentStatusWorker implements JobHandler {

	// Logger
	private static final Logger log = LoggerFactory.getLogger(CheckFulfillmentStatusWorker.class);

	// Zeebe Client
	@Autowired
	protected ZeebeClient client;

	// Worker subscription
	protected JobWorker subscription;

	@Autowired
	protected ComponentWorkflowHandlerService componentWorkflowHandlerService;

	/**
	 * Call after initiating the class
	 */
	@PostConstruct
	public void subscribe() {
		subscription = client.newWorker().jobType("checkFulfillmentStatus").handler(this).open();
	}

	/**
	 * @see JobHandler#handle(JobClient, ActivatedJob)
	 */
	@Override
	public void handle(JobClient client, ActivatedJob job) {
		componentWorkflowHandlerService.checkFulfillmentStatus(client, job);
	}

	/**
	 * Call before destroying the class
	 */
	@PreDestroy
	public void closeSubscription() {
		subscription.close();
	}

}

broker registration :

@Configuration
public class ZeebeConfig {

	@Value("${zeebe.broker.contactPoint}")
	private String brokerContactPoint;
	
	@Bean
	public ZeebeClient zeebe() {
		return ZeebeClient.newClientBuilder().brokerContactPoint(brokerContactPoint).build();
	}
}

regojoyson · July 4, 2019, 4:54am

@jwulf @philipp.ossler @Zelldon

Is this a bug, or is there something I have to change in my config?

Philipp_Ossler · July 4, 2019, 5:28am

I guess that the job workers are the reason for the huge amount of data. Currently, a job worker writes to the log whenever it polls, even if it doesn’t acquire any jobs.

By default, the Java client job worker polls the broker every 100 milliseconds. With 14 job workers, it produces a lot of data over time.

So, you should consider the following:

increase the poll interval of the job workers to produce less data
decrease the snapshot period and max count to delete data
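
To get a feel for the volume, here is a back-of-the-envelope sketch in plain Java. It assumes every worker polls at a fixed interval around the clock and that each poll leads to a log write, as described above. The class name, the method name and the 5 second alternative are hypothetical.

```java
import java.time.Duration;

/** Estimates how many poll requests a set of job workers sends per day. */
final class PollVolume {

    /** Polls per day for a given worker count and a fixed poll interval. */
    static long pollsPerDay(int workers, Duration pollInterval) {
        long pollsPerWorker = Duration.ofDays(1).toMillis() / pollInterval.toMillis();
        return workers * pollsPerWorker;
    }

    public static void main(String[] args) {
        // 14 workers and a 100 ms interval are the figures from this thread.
        System.out.println("100 ms interval: " + pollsPerDay(14, Duration.ofMillis(100)) + " polls/day");
        // A 5 s interval is a hypothetical alternative, not a recommendation.
        System.out.println("5 s interval:    " + pollsPerDay(14, Duration.ofSeconds(5)) + " polls/day");
    }
}
```

With the thread's figures this comes to about 12 million polls per day, against roughly 240 thousand at a 5 second interval. On the worker builder itself the interval would be set before open(), likely through a call such as pollInterval(Duration); treat that method name as an assumption and check it against the 0.18.0 client.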

regojoyson · July 4, 2019, 5:48am

Can I delete the data from the segments folder, or is there an auto-delete option for data that is no longer needed, such as finished instances (not workflows)?

Philipp_Ossler · July 4, 2019, 6:03am

The broker takes snapshots of its internal state from time to time, as configured by the parameter snapshotPeriod. A snapshot contains all active data, for example deployed workflows, active workflow instances, and jobs that are not yet completed.

When the broker has as many snapshots as configured by the parameter maxSnapshots, it deletes all data on the log which was written before the oldest snapshot.

Since the active data is stored in the internal state (and in snapshots for recovery), you can still continue active workflow instances or create new instances of deployed workflows afterward.
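
The interplay between snapshots and log deletion can be illustrated with a toy model in plain Java. This is not Zeebe's implementation: positions are simple counters, the class and its methods are hypothetical, and the limit of 3 snapshots is chosen only to keep the output short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of snapshot-driven log deletion; not Zeebe's actual code. */
final class LogCompactionModel {
    private final int maxSnapshots;
    private final Deque<Long> snapshotPositions = new ArrayDeque<>();
    private long firstRetainedPosition = 0; // records from here on are still on disk
    private long nextPosition = 0;          // position the next appended record gets

    LogCompactionModel(int maxSnapshots) {
        this.maxSnapshots = maxSnapshots;
    }

    /** Appends one record to the log. */
    void append() {
        nextPosition++;
    }

    /** Takes a snapshot at the current end of the log and deletes old log data if allowed. */
    void takeSnapshot() {
        snapshotPositions.addLast(nextPosition);
        if (snapshotPositions.size() > maxSnapshots) {
            snapshotPositions.removeFirst(); // drop the oldest snapshot
        }
        if (snapshotPositions.size() == maxSnapshots) {
            // Everything written before the oldest kept snapshot can be deleted.
            firstRetainedPosition = snapshotPositions.peekFirst();
        }
    }

    /** Number of log records that are still stored. */
    long retainedRecords() {
        return nextPosition - firstRetainedPosition;
    }

    public static void main(String[] args) {
        LogCompactionModel log = new LogCompactionModel(3); // hypothetical small limit
        for (int period = 1; period <= 6; period++) {
            for (int i = 0; i < 100; i++) {
                log.append();
            }
            log.takeSnapshot();
            System.out.println("after snapshot " + period + ": "
                    + log.retainedRecords() + " records on disk");
        }
    }
}
```

In this model the log grows until the snapshot limit is reached and then levels off, which is why a high maxSnapshots combined with a long snapshotPeriod keeps a lot of log data around.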

Does this make sense to you?

regojoyson · July 4, 2019, 6:24am

Yes, it makes sense to keep maxSnapshots lower. I also want to know where the active data is stored: is it inside segments, state, or index? For me only the segments folder keeps growing; it creates files 00.data, 01.data, and so on, of 512MB each. Is that active data or recovery snapshots?

Philipp_Ossler · July 4, 2019, 6:52am

segments - the data of the log, split into segments. The log is append-only until it is truncated once the max snapshot count is reached.
state - the active state: deployed workflows, active workflow instances, etc. Completed workflow instances and jobs are removed.
snapshot - the data of the state at a certain point in time
index - will be removed in future versions
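
To see which of these folders is actually growing, a small stand-alone Java sketch can sum the file sizes per folder. The default path data/partition-1 is an assumption; pass the real partition folder as the first argument. It reports apparent file sizes, so the numbers can differ slightly from du.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Prints the total file size of the segments, state and index folders of a partition. */
final class PartitionUsage {

    /** Sums the sizes of all regular files below dir; returns 0 if dir does not exist. */
    static long sizeOf(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0L;
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(PartitionUsage::fileSize)
                    .sum();
        }
    }

    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; override with the first command-line argument.
        Path partition = Paths.get(args.length > 0 ? args[0] : "data/partition-1");
        for (String name : new String[] {"segments", "state", "index"}) {
            System.out.printf("%-8s %d bytes%n", name, sizeOf(partition.resolve(name)));
        }
    }
}
```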

regojoyson · July 4, 2019, 6:57am

Thank you so much @philipp.ossler.

klaus.nji · October 3, 2019, 2:54pm

Very good information here. Would you guys be able to add Zeebe’s data retention and management policy to its documentation? From my limited knowledge so far (and please correct me where I am wrong), Zeebe stores all state in RocksDb. Clustering and periodic snapshots allow Zeebe to recover from a node disaster. Exporters provide a mechanism for Zeebe to export state data to external persistent storage for analysis and reporting purposes. However, exported data cannot be used to rehydrate or restart a failed cluster.

There are questions such as:

Can the snapshots be used for rehydration?
How often are exporters run and how does one configure this? zeebe.cfg.toml contains the exporter config, but I do not see an export interval.
When does Zeebe determine it needs to call the exporters?
How do exporters affect broker’s performance?
etc.

Thanks.

Philipp_Ossler · October 4, 2019, 6:21am

Regarding your questions:

Snapshots are used to restore the state in RocksDb. But the broker also needs the log stream because the snapshot may not contain all data. In a cluster, the data is replicated from the leader to the followers. When the leader goes down then one of the followers takes over.
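
Why the log is still needed after a snapshot can be shown with a toy recovery model in plain Java: the state is restored from the snapshot and then every log record written after the snapshot position is replayed. The LogEntry shape, the key/value state and all names are hypothetical simplifications, not Zeebe's data model.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of recovery: restore the snapshot, then replay newer log records. */
final class RecoveryModel {

    /** A simplified log record that sets one key of the state. */
    static final class LogEntry {
        final long position;
        final String key;
        final String value;

        LogEntry(long position, String key, String value) {
            this.position = position;
            this.key = key;
            this.value = value;
        }
    }

    /** Rebuilds the state from a snapshot plus all log entries after the snapshot position. */
    static Map<String, String> recover(Map<String, String> snapshot,
                                       long snapshotPosition,
                                       List<LogEntry> log) {
        Map<String, String> state = new HashMap<>(snapshot);
        for (LogEntry entry : log) {
            if (entry.position > snapshotPosition) {
                state.put(entry.key, entry.value); // replay what the snapshot has not seen
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("instance-1", "task-A"); // state as of position 1

        List<LogEntry> log = Arrays.asList(
                new LogEntry(1, "instance-1", "task-A"),
                new LogEntry(2, "instance-1", "task-B"),
                new LogEntry(3, "instance-2", "task-A"));

        System.out.println(recover(snapshot, 1, log));
    }
}
```

Without the entries at positions 2 and 3 the recovered state would be stuck at the moment the snapshot was taken.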

The exporters are called when new data is available. The concrete exporters may implement batching that can be configured. For example, the ES exporter: zeebe/exporters/elasticsearch-exporter at 26509a832be208783d79ad9aca763cd26cc20306 · camunda/zeebe · GitHub
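
As an illustration of what such batching usually looks like, here is a toy size-or-delay batcher in plain Java. It is not the Elasticsearch exporter's code. It assumes that the bulk size setting is a record count and that the bulk delay setting is a maximum wait in seconds; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy batcher that flushes when the batch is full or its oldest record has waited too long. */
final class BulkBatcher {
    private final int maxSize;
    private final long maxDelayMillis;
    private final List<String> pending = new ArrayList<>();
    private long oldestPendingAt = -1;
    private int flushCount;

    BulkBatcher(int maxSize, long maxDelayMillis) {
        this.maxSize = maxSize;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Adds a record and flushes immediately if the batch is full. */
    void add(String record, long nowMillis) {
        if (pending.isEmpty()) {
            oldestPendingAt = nowMillis;
        }
        pending.add(record);
        if (pending.size() >= maxSize) {
            flush();
        }
    }

    /** Called periodically; flushes if the oldest pending record has waited long enough. */
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestPendingAt >= maxDelayMillis) {
            flush();
        }
    }

    private void flush() {
        // A real exporter would send the batch to its target system here.
        pending.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        // Mirrors size = 1_000 and delay = 5 from the config, assuming the delay is in seconds.
        BulkBatcher batcher = new BulkBatcher(1_000, 5_000);
        for (int i = 0; i < 2_500; i++) {
            batcher.add("record-" + i, 0);
        }
        System.out.println("flushes triggered by size: " + batcher.flushCount());
        batcher.tick(5_000);
        System.out.println("flushes after the delay elapsed: " + batcher.flushCount());
    }
}
```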

The documentation on this is here → https://stage.docs.zeebe.io/operations/resource-planning.html

system · January 31, 2024, 10:10am