Any suggestion for setting high throughput and high availiablity Zeebe cluster?

As the article (https://zeebe.io/blog/2019/12/zeebe-performance-profiling/) shows, increasing partition count help increase Zeebe throughput. We did pressure test with a 1-gateway-4-broker cluster. Each broker was configured with 16 core CPU, 16GB memory and 500GB SSD. When we increase partion count from 24 to 48, the cluster got tps(instance created per second) increment from 2000 to 3000. It’s acceptable. But when configured with 48 partitions, the cluster fault tolerance became worse. We close a broker and start a new broker after 30mins, but the new broker can not rejoin the cluster after nearly 8 hours.

In our case, we want a cluster that can hanlde 30000 instance creation per second. So I wonder any suggestions for setting high throughput and high availiablity Zeebe cluster? And what’s the maximum scale having been used in production environment?

Added a comment here https://github.com/zeebe-io/zeebe/issues/4988#issuecomment-660306301

1 Like