Failed workflow instance with message start prevents others from starting

I have a simple workflow which is triggered based on a message event as follows:

If an instance of that workflow fails, say at the exclusive gateway, new instances are not created even though there are messages available to trigger their creation.

I have to terminate the failed workflow instance or correct the situation for other new workflow instances to be created based on the message start events.

Is this expected behavior?

Hi @klaus.nji,

Yes, this behavior is expected because the workflow instance is created by a message start event: no new instance is created by a message with the same correlation key until the current workflow instance has ended.
Please have a look at the docs: https://docs.zeebe.io/bpmn-workflows/message-events/message-events.html#message-start-events

Does this help you?

Best regards,
Philipp


@philipp.ossler,

Thanks for the response. I question the logic though that a new instance cannot be created until an existing instance has finished. Is this a threading issue?

Let me give a scenario using the classical retail process:

  1. Main order-processing workflow runs and creates say 1000 orders.
  2. For each order being processed, we fire a message (via Kafka) for example indicating order completed.
  3. A new instance of say a machine learning workflow which subscribes to the order completed event as indicated above is created for each order event.
  4. If one of the orders had a bad parameter in its variable stack the machine-learning workflow will fail.

At this point, I would have expected: new order events coming in should create new machine-learning workflow instances even though there is a current inactive or failed instance.

So in this example, if 200 of the orders had bad variables, we should expect 800 machine-learning workflows completed with 200 active (or not completing).

Don’t know if this makes any sense. If not I’ll put together a diagram.


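The issue is the correlation key. If you start the workflow based on the message name alone (an empty correlation key), then the start is “anonymous” and you can start as many instances as you want - but a correlation key gives the instance an identity, so only one instance per key can be active at a time.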
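
To make the rule concrete, here is a minimal Python sketch of the behavior described in this thread. It is a toy model and not the Zeebe client API: the class `MessageStartModel` and its methods `publish` and `finish` are made-up names for illustration, and the sketch ignores message buffering and time-to-live.

```python
from dataclasses import dataclass, field


@dataclass
class MessageStartModel:
    """Toy model of message start events; hypothetical, not the Zeebe API."""

    active_keys: set = field(default_factory=set)  # keys with an unfinished instance
    started: int = 0                               # total instances created

    def publish(self, correlation_key: str) -> bool:
        """Return True if this message starts a new workflow instance."""
        if correlation_key == "":
            # "Anonymous" start: an empty key never blocks a new instance.
            self.started += 1
            return True
        if correlation_key in self.active_keys:
            # An instance with this key is still active (or failed and unresolved).
            return False
        self.active_keys.add(correlation_key)
        self.started += 1
        return True

    def finish(self, correlation_key: str) -> None:
        """Mark the instance for this key as completed or terminated."""
        self.active_keys.discard(correlation_key)


# Reusing one key: only the first message starts an instance.
same_key = MessageStartModel()
same_key_results = [same_key.publish("order-1") for _ in range(3)]

# One key per order: every message starts its own instance.
per_order = MessageStartModel()
per_order_results = [per_order.publish(f"order-{i}") for i in range(1000)]
```

In this sketch, `same_key_results` is `[True, False, False]`, which matches what happens when testing with a single correlation key, while the per-order run starts all 1000 instances regardless of how many of them later fail.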

That was the problem. I was testing with the same correlation key. Thanks guys.