Messages missed when bringing down the worker

I am currently load testing Zeebe with a view to adopting it in our codebase. I was running some test cases based on the quick start guide and found that some of the messages are missed by one of the workers.
Scenario 1:

  1. Start the broker by running broker.bat
  2. Push messages into the workflow using a bat file
    zbctl create instance order-process --variables "{\"orderId\": %VAR%}"
  3. Start the workers (payment-service, inventory-service, shipment-service)
    zbctl create worker payment-service --handler cat &
    zbctl create worker inventory-service --handler cat &
    zbctl create worker shipment-service --handler cat &

Now I bring down the broker instance. Message processing stops and the workers start throwing connection errors (as I assumed they would).
I bring the broker up again. The workers reconnect and message processing resumes from where it left off.

This behavior was expected, and I did not see any loss of messages.

Scenario 2:

  1. Start the broker by running broker.bat
  2. Push messages into the workflow using a bat file
    zbctl create instance order-process --variables "{\"orderId\": %VAR%}"
  3. Start the workers (payment-service, inventory-service, shipment-service)
    zbctl create worker payment-service --handler cat &
    zbctl create worker inventory-service --handler cat &
    zbctl create worker shipment-service --handler cat &

Now I bring down the worker instances at random. Message processing stops. As soon as I start the workers again, it seems that some of the messages are lost in processing.
I brought down shipment-service and then restarted the worker. Around 18 messages were missed, and I did not see any log entries showing that they were processed later.

I brought down inventory-service and then restarted it. 3 messages were missed at first, but I did see that they were processed in the end.

After some time I started a new shipment-service worker instance, and it processed all the missing messages.

Why didn't the first instance process the messages?

Hi @ameetkun,

Thank you for raising this. Can you please double-check whether the jobs are really never processed by a worker again?

When a worker is started, it activates jobs and tries to process them. By default, it activates the jobs for 5 minutes. During this time, the jobs can't be activated again (to ensure that they are processed exclusively). After 5 minutes, the jobs can be activated again. See also: https://docs.zeebe.io/basics/job-workers.html
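To make the effect of the activation timeout concrete, here is a rough sketch in Python. It is only a toy model of the behavior described above, not Zeebe's actual implementation: the `ToyBroker` class, the worker names, and the job counts are all made up for illustration, and only the 5 minute default comes from the explanation above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default activation timeout mentioned above (5 minutes), in seconds.
ACTIVATION_TIMEOUT_S = 5 * 60


@dataclass
class Job:
    key: int
    locked_by: Optional[str] = None  # hypothetical worker name holding the job
    deadline: float = 0.0            # time until which the job stays locked
    completed: bool = False


class ToyBroker:
    """Toy model of exclusive job activation (an assumption, not Zeebe code)."""

    def __init__(self, job_count: int) -> None:
        self.jobs = [Job(key=i) for i in range(job_count)]

    def activate(self, worker: str, now: float, max_jobs: int) -> List[Job]:
        """Hand out up to max_jobs jobs that are not completed and not locked."""
        activated: List[Job] = []
        for job in self.jobs:
            if len(activated) == max_jobs:
                break
            lock_expired = job.locked_by is None or now >= job.deadline
            if not job.completed and lock_expired:
                job.locked_by = worker
                job.deadline = now + ACTIVATION_TIMEOUT_S
                activated.append(job)
        return activated


broker = ToyBroker(job_count=5)

# Worker A activates three jobs at t=0 and is killed before completing them.
keys_a = [j.key for j in broker.activate("worker-a", now=0, max_jobs=3)]

# A restarted worker at t=60 only gets the two jobs that were never activated.
keys_at_60 = [j.key for j in broker.activate("worker-b", now=60, max_jobs=10)]

# Once the timeout has passed, the three orphaned jobs can be activated again.
keys_at_301 = [j.key for j in broker.activate("worker-b", now=301, max_jobs=10)]

print(keys_a)       # [0, 1, 2]
print(keys_at_60)   # [3, 4]
print(keys_at_301)  # [0, 1, 2]
```

In this model, the jobs held by the killed worker look "lost" to a restarted worker until their timeout expires, which may be what you are observing.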

Best regards,
Philipp

If you look at screenshot_3 and screenshot_4, you will see that the new instance ran close to an hour after the original one and processed the messages.

Can you please share the broker logs? Maybe it has logged a failure.

Did you try to use a Zeebe client (e.g. Java, Go, …) or only zbctl?
Can the failure be reproduced reliably?

Best regards,
Philipp

@ameetkun - can you put a minimal reproducer in GitHub? I’ll run it and see if I get the same result.