How do I tune Zeebe 0.25.1 to reduce memory usage?

Lars Madsen: Hi guys - I want to follow up on what was posted in the forum a while back: https://forum.zeebe.io/t/memory-profile-for-zeebe-brokers/1648/3
We are still seeing the same behaviour with version 0.25.1. So, checking the releases, I see this: https://github.com/zeebe-io/zeebe/issues/5882 - is it possible that this has caused our excessive memory usage?

Another thing I would like to check is what impact the maxMessageSize param has on memory usage - we're running the default (4 MB?). Would you expect a lower memory footprint if this was reduced to, say, 64 KB?

Lars Madsen: I see we are running with

""executionMetricsExporterEnabled" : false

Nicolas: The maxMessageSize has an impact on memory usage, but the impact shouldn't be too big - it mostly affects performance due to cache/page faults.

That said, our current hypothesis is that most of the memory ends up being used by RocksDB, which is probably misconfigured at the moment (we hope to remedy that soon). With that in mind, lowering the message size might have some effect, since it would reduce the size of the data stored in RocksDB, but I expect it wouldn't do much.
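For illustration, lowering the message size is a single broker setting. A minimal sketch in the same environment-variable style as the options below, using the 64 KB figure from the question above purely as an example rather than a recommendation:

ZEEBE_BROKER_NETWORK_MAXMESSAGESIZE="64KB"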

Your best bet is to use zeebe.broker.data.rocksdb.columnFamilyOptions to specify the following options:

ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_WRITE_BUFFER_SIZE="8MB"
ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_WRITE_BUFFER_SIZE_TO_MAINTAIN="16MB"
ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_OPEN_FILES="1024"

This should help limit the memory usage of RocksDB while we figure out the correct configuration/usage (probably using fewer column families). Keep in mind that lowering the write buffer sizes has a performance impact, so you should tune these to get the performance you want and the memory usage under control. Also note that these settings help provide a soft upper bound, not a hard cap. It's unclear whether a hard memory cap is at all possible with RocksDB.

Finally, if you're on 0.26 (RC1 for now, but coming out January 12th), or if you turned on memory-mapped storage (ZEEBE_BROKER_DATA_USEMMAP="true"), keep in mind that the RSS will be over-inflated due to shared mappings, so you should be looking more specifically at the PSS (if running on bare metal) or the WSS (if running on Kubernetes, as this is what the oom-killer monitors). Measuring the page cache is also a good idea, as once memory starts to be limited, the first thing the OS will do is drop the page cache, which will drastically slow down your application.

Lars Madsen: That’s great @Nicolas ! Thanks for your quick response!

Lars Madsen: Just a follow up question - we would ideally like to not have too much custom config compared to the «official» defaults. It sounds to me like you are planning to make these changes to the defaults as well? If so, do you know roughly when you expect to release?

Nicolas: Probably not before May/June next year. Our immediate focus for Q1 is helping get Camunda Cloud out of beta, and it runs on preemptible nodes, meaning this issue isn't affecting that environment since the nodes are restarted at least once a day, probably more.

Nicolas: Of course, this is an open source project, meaning others may contribute this in the meantime, so it could be earlier as well.

Lars Madsen: Great - thanks for your response. We’re looking forward to Camunda Cloud :slightly_smiling_face:

Lars Madsen: Additional follow-up question - changing the column family settings, can that be done in-flight on a running Zeebe cluster? Or will it require a clean install?

Nicolas: It can be done with an existing Zeebe install, but requires a restart.

Lars Madsen: :+1:

Lars Madsen: So, after applying the settings in our dev environment, it takes a little longer than normal to start up, but the brokers do come online after four to five minutes - however, the following error shows up in the logs:

Caused by: java.lang.IllegalStateException: Expected to create column family options for RocksDB, but one or many values are undefined in the context of RocksDB [Compiled ColumnFamilyOptions: {compaction_pri=kOldestSmallestSeqFirst, max_open_files="1024", max_write_buffer_size_to_maintain="16MB", write_buffer_size="8MB"}; User-provided ColumnFamilyOptions: {max_open_files="1024", max_write_buffer_size_to_maintain="16MB", write_buffer_size="8MB"}]. See RocksDB's cf_options.h and options_helper.cc for available keys and values.
	at io.zeebe.db.impl.rocksdb.ZeebeRocksDbFactory.createColumnFamilyOptions(ZeebeRocksDbFactory.java:129) ~[zeebe-db-0.25.1.jar:0.25.1]
	at io.zeebe.db.impl.rocksdb.ZeebeRocksDbFactory.open(ZeebeRocksDbFactory.java:73) ~[zeebe-db-0.25.1.jar:0.25.1]
	at io.zeebe.db.impl.rocksdb.ZeebeRocksDbFactory.createDb(ZeebeRocksDbFactory.java:58) ~[zeebe-db-0.25.1.jar:0.25.1]
	at io.zeebe.db.impl.rocksdb.ZeebeRocksDbFactory.createDb(ZeebeRocksDbFactory.java:25) ~[zeebe-db-0.25.1.jar:0.25.1]
	at io.zeebe.broker.system.partitions.impl.StateControllerImpl.openDb(StateControllerImpl.java:142) ~[zeebe-broker-0.25.1.jar:0.25.1]
	at io.zeebe.broker.system.partitions.impl.StateControllerImpl.recover(StateControllerImpl.java:125) ~[zeebe-broker-0.25.1.jar:0.25.1]
	at io.zeebe.broker.system.partitions.impl.steps.ZeebeDbPartitionStep.open(ZeebeDbPartitionStep.java:28) ~[zeebe-broker-0.25.1.jar:0.25.1]
	..

Can this be ignored? Will monitor system for failures…

Lars Madsen: Also…

2020-12-19 13:15:31.760 [Broker-2-ZeebePartition-3] [Broker-2-zb-actors-0] ERROR io.zeebe.broker.system - Failed to install leader partition 3
java.lang.IllegalStateException: Unexpected error occurred while recovering snapshot controller during leader partition install for partition 3
	at io.zeebe.broker.system.partitions.impl.steps.ZeebeDbPartitionStep.open(ZeebeDbPartitionStep.java:35) ~[zeebe-broker-0.25.1.jar:0.25.1]
	at io.zeebe.broker.system.partitions.impl.PartitionTransitionImpl.installPartition(PartitionTransitionImpl.java:97) ~[zeebe-broker-0.25.1.jar:0.25.1]
	at io.zeebe.broker.system.partitions.impl.PartitionTransitionImpl.lambda$installPartition$2(PartitionTransitionImpl.java:105) ~[zeebe-broker-0.25.1.jar:0.25.1]
	at io.zeebe.util.sched.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:28) [zeebe-util-0.25.1.jar:0.25.1]
	at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:76) [zeebe-util-0.25.1.jar:0.25.1]
	at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:39) [zeebe-util-0.25.1.jar:0.25.1]
	at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:122) [zeebe-util-0.25.1.jar:0.25.1]
	at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:107) [zeebe-util-0.25.1.jar:0.25.1]
	at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:91) [zeebe-util-0.25.1.jar:0.25.1]
	at io.zeebe.util.sched.ActorThread.run(ActorThread.java:204) [zeebe-util-0.25.1.jar:0.25.1]
Caused by: java.lang.IllegalStateException: Failed to recover from snapshots
	at io.zeebe.broker.system.partitions.impl.StateControllerImpl.recover(StateControllerImpl.java:134) ~[zeebe-broker-0.25.1.jar:0.25.1]
	at io.zeebe.broker.system.partitions.impl.steps.ZeebeDbPartitionStep.open(ZeebeDbPartitionStep.java:28) ~[zeebe-broker-0.25.1.jar:0.25.1]
	... 9 more

Lars Madsen: filling up logs…

Nicolas: No, it can't be ignored (though it's retried - it probably shouldn't be). Essentially, one of the configuration values is not supported by RocksDB.

Nicolas: It seems max_write_buffer_size_to_maintain and write_buffer_size are supposed to be in bytes, so just plain integers, no strings. I understand the confusion here - we use Spring Boot for configuration, but the options you pass to RocksDB are passed as-is, without any transformation.

Lars Madsen: yes, I added them as ENV variables

Nicolas: Ah, sorry, what I mean is you cannot use 16MB or 8MB, but rather put the value in bytes - 8000000 for 8 MB, etc.

Lars Madsen: yep

Nicolas: RocksDB expects a value that can be converted to an int64 - plain bytes, not a string representation.
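Putting that together with the earlier suggestion, the same three options with the sizes expressed as plain byte counts would look roughly like this (the 8000000/16000000 figures are simply the byte equivalents Nicolas gives above):

ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_WRITE_BUFFER_SIZE="8000000"
ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_WRITE_BUFFER_SIZE_TO_MAINTAIN="16000000"
ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_OPEN_FILES="1024"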

Lars Madsen: Thanks for quick response - giving it a try now :slightly_smiling_face:

Nicolas: Let me know if that helps :+1:

Lars Madsen: well

Lars Madsen: I have run into a different problem.

…datedReplicas\":3,\"currentRevision\":\"zeebe-cluster-zeebe-7db67cfccc\",\"updateRevision\":\"zeebe-cluster-zeebe-7db67cfccc\",\"collisionCount\":0}}":
    v1.StatefulSet.Spec: v1.StatefulSetSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers:
    []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.Value: ReadString: expects
    " or n, but found 8, error found in #10 byte of ...|,"value":8000000},{"|...,
    bigger context ...|B_COLUMNFAMILYOPTIONS_WRITE_BUFFER_SIZE","value":8000000},{"name":"ZEEBE_BROKER_DATA_ROCKSDB_COLUMNF|...'

looks like leaving the env vars as integers doesn't play nicely with the Helm release

Lars Madsen: I have defined them as integers; if I set them as strings they end up being passed in as-is and fail inside RocksDB like before:

    env:
    - name: ZEEBE_BROKER_NETWORK_MAXMESSAGESIZE
      value: "128KB"
    - name: ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_WRITE_BUFFER_SIZE
      value: 8000000
    - name: ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_WRITE_BUFFER_SIZE_TO_MAINTAIN
      value: 16000000
    - name: ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_OPEN_FILES
      value: 1024

Nicolas: Ah, sorry - the env vars must be strings, but your strings should be parsable as ints (so not "8mb", just "8000000")
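For completeness, a sketch of the pod env block this leads to - the variable names are the ones from Lars's snippet, every value is quoted so the Kubernetes manifest is valid, and the RocksDB sizes are the illustrative byte counts from earlier in the thread:

    env:
    - name: ZEEBE_BROKER_NETWORK_MAXMESSAGESIZE
      value: "128KB"
    - name: ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_WRITE_BUFFER_SIZE
      value: "8000000"
    - name: ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_WRITE_BUFFER_SIZE_TO_MAINTAIN
      value: "16000000"
    - name: ZEEBE_BROKER_DATA_ROCKSDB_COLUMNFAMILYOPTIONS_MAX_OPEN_FILES
      value: "1024"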

Lars Madsen: so having them like this would probably work, but can't be released…