Performance profiling tool

I’ve updated my ghetto TPS tool to enable profiling Zeebe releases. You can check it out here: https://github.com/jwulf/zeebe-ghetto-tps.

It uses Docker to start brokers of different Zeebe versions and creates workflows as fast as possible. You can run it with backpressure enabled or disabled, and specify the number of partitions.

These numbers are only an example. Don’t rely on them; run the test yourself, over a long period of time.

Here is an example of it running on my machine with backpressure disabled, with 3 partitions:

➜ t test.ts -z 0.22.5,0.23.5,0.24.1,0.25.0-alpha2 -t 30 -d -p 3

Version: 0.22.5 | Time: 30s
Starting Zeebe 0.22.5 with 3 partitions | Backpressure disabled...
Started Zeebe broker
Workflow deployed: 2251799813685250
Time :   Total   | wf/s  | running average
5s :     145     | 29    | 29/sec
10s :    345     | 40    | 35/sec
15s :    609     | 53    | 41/sec
20s :    915     | 61    | 46/sec
25s :    1264    | 70    | 51/sec
Average TPS: 51/sec.

Version: 0.23.5 | Time: 30s
Starting Zeebe 0.23.5 with 3 partitions | Backpressure disabled...
Started Zeebe broker
Workflow deployed: 2251799813685250
Time :   Total   | wf/s  | running average
5s :     110     | 22    | 22/sec
10s :    227     | 23    | 23/sec
15s :    350     | 25    | 23/sec
20s :    475     | 25    | 24/sec
25s :    613     | 28    | 25/sec
Average TPS: 25/sec.

Version: 0.24.1 | Time: 30s
Starting Zeebe 0.24.1 with 3 partitions | Backpressure disabled...
Started Zeebe broker
Workflow deployed: 2251799813685250
Time :   Total   | wf/s  | running average
5s :     47      | 9     | 9/sec
10s :    136     | 18    | 14/sec
15s :    252     | 23    | 17/sec
Average TPS: 17/sec.

Version: 0.25.0-alpha2 | Time: 30s
Starting Zeebe 0.25.0-alpha2 with 3 partitions | Backpressure disabled...
Started Zeebe broker
Workflow deployed: 2251799813685250
Time :   Total   | wf/s  | running average
5s :     184     | 37    | 37/sec
10s :    384     | 40    | 38/sec
15s :    599     | 43    | 40/sec
20s :    821     | 44    | 41/sec
25s :    1045    | 45    | 42/sec
Average TPS: 42/sec.
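
For anyone reading the columns: “wf/s” appears to be the number of workflows started during the last 5-second interval divided by 5, and “running average” appears to be the cumulative total divided by the elapsed time, both rounded. That is inferred from the numbers, not taken from the tool's source. A minimal sketch of the arithmetic (the names `Sample`, `ReportRow` and `toReportRows` are made up for illustration):

```typescript
// One sampling point: cumulative number of workflows started so far.
interface Sample {
  elapsedSeconds: number;
  total: number;
}

interface ReportRow extends Sample {
  intervalRate: number;   // "wf/s": rate over the most recent interval only
  runningAverage: number; // cumulative total divided by elapsed time
}

// Derive both rate columns from the cumulative totals.
// Assumption: values are rounded to the nearest integer, which matches the output above.
function toReportRows(samples: Sample[]): ReportRow[] {
  let previous: Sample = { elapsedSeconds: 0, total: 0 };
  return samples.map((sample) => {
    const intervalRate = Math.round(
      (sample.total - previous.total) /
        (sample.elapsedSeconds - previous.elapsedSeconds)
    );
    const runningAverage = Math.round(sample.total / sample.elapsedSeconds);
    previous = sample;
    return { ...sample, intervalRate, runningAverage };
  });
}

// Cumulative totals copied from the 0.22.5 run with 3 partitions.
const rows = toReportRows([
  { elapsedSeconds: 5, total: 145 },
  { elapsedSeconds: 10, total: 345 },
  { elapsedSeconds: 15, total: 609 },
  { elapsedSeconds: 20, total: 915 },
  { elapsedSeconds: 25, total: 1264 },
]);

for (const row of rows) {
  console.log(
    `${row.elapsedSeconds}s : ${row.total} | ${row.intervalRate} | ${row.runningAverage}/sec`
  );
}
```

The interval rate reacts quickly to warm-up and stalls, while the running average smooths them out, which is why the two columns drift apart in the faster runs.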

Same test with 2 partitions:

Version: 0.22.5 | Time: 30s
Starting Zeebe 0.22.5 with 2 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     288     | 58    | 58/sec
10s :    640     | 70    | 64/sec
15s :    1007    | 73    | 67/sec
20s :    1385    | 76    | 69/sec
25s :    1731    | 69    | 69/sec
Average TPS: 69/sec.

Version: 0.23.5 | Time: 30s
Starting Zeebe 0.23.5 with 2 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     76      | 15    | 15/sec
10s :    146     | 14    | 15/sec
15s :    216     | 14    | 14/sec
20s :    289     | 15    | 14/sec
25s :    359     | 14    | 14/sec
Average TPS: 14/sec.

Version: 0.24.1 | Time: 30s
Starting Zeebe 0.24.1 with 2 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     55      | 11    | 11/sec
10s :    144     | 18    | 14/sec
15s :    255     | 22    | 17/sec
20s :    381     | 25    | 19/sec
25s :    530     | 30    | 21/sec
Average TPS: 21/sec.

Version: 0.25.0-alpha2 | Time: 30s
Starting Zeebe 0.25.0-alpha2 with 2 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     145     | 29    | 29/sec
10s :    311     | 33    | 31/sec
15s :    484     | 35    | 32/sec
20s :    658     | 35    | 33/sec
25s :    837     | 36    | 33/sec
Average TPS: 33/sec.

Here it is running the same test, with one partition:

➜ t test.ts -z 0.22.5,0.23.5,0.24.1,0.25.0-alpha2 -t 30 -d -p 1

Version: 0.22.5 | Time: 30s
Starting Zeebe 0.22.5 with 1 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     168     | 34    | 34/sec
10s :    400     | 46    | 40/sec
15s :    666     | 53    | 44/sec
20s :    963     | 59    | 48/sec
25s :    1307    | 69    | 52/sec
Average TPS: 52/sec.

Version: 0.23.5 | Time: 30s
Starting Zeebe 0.23.5 with 1 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     149     | 30    | 30/sec
10s :    309     | 32    | 31/sec
15s :    492     | 37    | 33/sec
20s :    663     | 34    | 33/sec
25s :    835     | 34    | 33/sec
Average TPS: 33/sec.

Version: 0.24.1 | Time: 30s
Starting Zeebe 0.24.1 with 1 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     51      | 10    | 10/sec
10s :    108     | 11    | 11/sec
15s :    173     | 13    | 12/sec
20s :    252     | 16    | 13/sec
25s :    326     | 15    | 13/sec
Average TPS: 13/sec.

Version: 0.25.0-alpha2 | Time: 30s
Starting Zeebe 0.25.0-alpha2 with 1 partitions | Backpressure disabled...
Started Zeebe broker
Time :   Total   | wf/s  | running average
5s :     151     | 30    | 30/sec
10s :    367     | 43    | 37/sec
15s :    627     | 52    | 42/sec
20s :    931     | 61    | 47/sec
25s :    1268    | 67    | 51/sec
Average TPS: 51/sec.

Here are the aggregated results. I’ve put an asterisk next to the fastest partition configuration for each version.

Again, don’t treat these results as official performance figures; run the test yourself:

Version          Partitions     Average TPS
0.22.5           1              52
                 2*             69
                 3              51
0.23.5           1*             33
                 2              14
                 3              25
0.24.1           1              13
                 2*             21
                 3              17
0.25.0-alpha2    1*             51
                 2              33
                 3              42
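
If you collect your own numbers, picking the fastest partition count per version is easy to automate. The sketch below is not part of the tool; the `Result` shape and the `fastestByVersion` helper are hypothetical, and the data is just the table above typed in by hand:

```typescript
interface Result {
  version: string;
  partitions: number;
  averageTps: number;
}

// Average TPS figures copied from the table above.
const results: Result[] = [
  { version: "0.22.5", partitions: 1, averageTps: 52 },
  { version: "0.22.5", partitions: 2, averageTps: 69 },
  { version: "0.22.5", partitions: 3, averageTps: 51 },
  { version: "0.23.5", partitions: 1, averageTps: 33 },
  { version: "0.23.5", partitions: 2, averageTps: 14 },
  { version: "0.23.5", partitions: 3, averageTps: 25 },
  { version: "0.24.1", partitions: 1, averageTps: 13 },
  { version: "0.24.1", partitions: 2, averageTps: 21 },
  { version: "0.24.1", partitions: 3, averageTps: 17 },
  { version: "0.25.0-alpha2", partitions: 1, averageTps: 51 },
  { version: "0.25.0-alpha2", partitions: 2, averageTps: 33 },
  { version: "0.25.0-alpha2", partitions: 3, averageTps: 42 },
];

// Keep, for each version, the result with the highest average TPS.
function fastestByVersion(all: Result[]): Map<string, Result> {
  const best = new Map<string, Result>();
  for (const result of all) {
    const current = best.get(result.version);
    if (current === undefined || result.averageTps > current.averageTps) {
      best.set(result.version, result);
    }
  }
  return best;
}

const fastest = fastestByVersion(results);
for (const [version, result] of fastest) {
  console.log(`${version}: ${result.partitions} partition(s), ${result.averageTps}/sec`);
}
```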

You really need to run the tests for longer to see how each version performs over time, once garbage collection and similar effects come into play.

But this is a good tool for examining the relative performance of releases, and the impact of partitioning.

Great work, @jwulf. Very useful.

Some ideas for your tests:

  1. Count not only how many workflows you are able to start per second, but also how many are completed per second.
  2. Measure how long it takes to complete a single “nothing” workflow.
  3. Measure the time between the creation of a workflow instance and the moment the “nothing” task starts executing.
  4. Run several Zeebe brokers, not only several partitions, e.g. 3 brokers, 3 partitions, and a replication factor of 3.

These metrics may show surprising results: how badly Zeebe performs relative to the available hardware.
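
To make the first three ideas concrete, here is a rough sketch of how they could be computed once the test records three timestamps per workflow instance. Everything in it is an assumption for illustration: the `InstanceTiming` shape, the `summarize` helper, and the sample values, which are invented and are not measurements of Zeebe.

```typescript
// Hypothetical per-instance timestamps, in milliseconds since the start of the test.
interface InstanceTiming {
  createdAt: number;   // workflow instance created
  startedAt: number;   // "nothing" task started executing
  completedAt: number; // workflow instance completed
}

interface Summary {
  completedPerSecond: number; // idea 1: completion throughput
  meanCompletionMs: number;   // idea 2: time to complete one workflow
  meanStartDelayMs: number;   // idea 3: creation-to-task-start delay
}

// Arithmetic mean; returns 0 for an empty list to avoid NaN.
function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Summarize the instances that completed within a window of the given length.
function summarize(timings: InstanceTiming[], windowSeconds: number): Summary {
  return {
    completedPerSecond: timings.length / windowSeconds,
    meanCompletionMs: mean(timings.map((t) => t.completedAt - t.createdAt)),
    meanStartDelayMs: mean(timings.map((t) => t.startedAt - t.createdAt)),
  };
}

// Invented sample values, only to show the calculation.
const summary = summarize(
  [
    { createdAt: 0, startedAt: 10, completedAt: 30 },
    { createdAt: 100, startedAt: 130, completedAt: 160 },
    { createdAt: 200, startedAt: 220, completedAt: 290 },
  ],
  1
);
console.log(summary);
```

Means hide outliers, so for long runs percentiles would probably be more telling than a single average.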

You are right about running these tests for long periods, like months. What I learned from similar tests is that even when everything is running perfectly, Zeebe becomes preoccupied with itself several times a day, which produces significant delays in processing workflows. And that perfect state is only temporary. The normal state is that one broker or one partition is lost, with significant delays in execution all the time.

Thanks @SergeyL!

Have a look at the TODO list in the README file. These are on the road map. Great minds think alike. :smile: