Thanks for the clarification; that helps me understand. Let’s focus on example 1 for now, where we have a status that is some kind of aggregated state of a workflow instance. Contrary to my previous post, I now believe that managing such data in an external system is better than doing it in Zeebe.
That is because Zeebe will probably never have data manipulation and querying capabilities as powerful as those of Camunda BPM. Camunda BPM is based on a relational database (good for querying and manipulating arbitrary data), whereas Zeebe is built on the log stream concept (good at sequential processing).
The way you would build this with Zeebe is via a component that connects Zeebe to the secondary system that stores the status data (e.g. Elasticsearch). This connecting component embeds the Zeebe client and opens a subscription to all of Zeebe’s workflow-related events. On every event that is of interest to you, the component updates the status in the secondary system. When a workflow instance completes or is canceled, it removes that status (if required), and so on.
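The connecting component described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the Zeebe client API: `WorkflowEvent`, its intent strings (`ACTIVATED`, `INSTANCE_COMPLETED`, `INSTANCE_CANCELED`) and `StatusStore` are invented stand-ins for the subscription's event records and for the secondary system. It only shows the shape of the logic: filter events, map relevant ones to a status, and remove the status when the instance ends.

```python
from dataclasses import dataclass


@dataclass
class WorkflowEvent:
    # Hypothetical, simplified event shape; real Zeebe records carry many more fields.
    position: int               # position of the event in Zeebe's log
    workflow_instance_key: int
    element_id: str             # BPMN element id, e.g. "check_credit"
    intent: str                 # hypothetical intents: "ACTIVATED", "INSTANCE_COMPLETED", "INSTANCE_CANCELED"


class StatusStore:
    """In-memory stand-in for the secondary system (e.g. an Elasticsearch index)."""

    def __init__(self):
        self.status = {}  # workflow_instance_key -> current status label

    def put(self, key, label):
        self.status[key] = label

    def delete(self, key):
        self.status.pop(key, None)


class StatusConnector:
    """Connecting component: called for every event delivered by the subscription."""

    def __init__(self, store, status_by_element):
        self.store = store
        # Which BPMN elements are relevant, and the status label each one maps to.
        self.status_by_element = status_by_element

    def on_event(self, event):
        if event.intent in ("INSTANCE_COMPLETED", "INSTANCE_CANCELED"):
            # Instance finished: remove its status (skip this if you want to keep history).
            self.store.delete(event.workflow_instance_key)
        elif event.intent == "ACTIVATED" and event.element_id in self.status_by_element:
            self.store.put(event.workflow_instance_key, self.status_by_element[event.element_id])
        # All other events are not of interest and are ignored.


store = StatusStore()
connector = StatusConnector(store, {"check_credit": "Checking credit", "ship": "Shipping"})
for e in [
    WorkflowEvent(1, 42, "check_credit", "ACTIVATED"),
    WorkflowEvent(2, 7, "check_credit", "ACTIVATED"),
    WorkflowEvent(3, 42, "ship", "ACTIVATED"),
    WorkflowEvent(4, 7, "order_process", "INSTANCE_CANCELED"),
]:
    connector.on_event(e)
print(store.status)  # {42: 'Shipping'}
```

In a real setup, `on_event` would be the handler registered with the subscription, and `StatusStore` would write to the actual secondary system.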
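The Zeebe client provides at-least-once invocation semantics for the handling of every subscribed event, so you are guaranteed not to miss a single one. The flip side is that the same event may be delivered more than once, so the handler should be idempotent.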
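One way to make the status updates tolerate redelivered events is to remember the log position of the last applied event and skip anything at or below it. This is a minimal sketch under the assumption that events arrive in increasing log position (for example, from a single partition); `IdempotentStatusStore` and `apply` are hypothetical names, not part of any Zeebe API.

```python
class IdempotentStatusStore:
    """Status store that tolerates redelivered events (at-least-once semantics)."""

    def __init__(self):
        self.status = {}          # workflow_instance_key -> status label
        self.last_position = -1   # highest log position applied so far

    def apply(self, position, instance_key, label):
        # Assumes events arrive in increasing log position (e.g. from one partition),
        # so any position at or below the last applied one is a redelivery.
        if position <= self.last_position:
            return False  # duplicate: already applied, ignore
        self.status[instance_key] = label
        self.last_position = position
        return True


store = IdempotentStatusStore()
store.apply(1, 42, "Checking credit")
store.apply(2, 42, "Shipping")
redelivered = store.apply(1, 42, "Checking credit")  # same event delivered again
print(redelivered, store.status)  # False {42: 'Shipping'}
```

Without such a check, a redelivered older event could overwrite a newer status.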
Of course, the logic that decides which events are of interest must be encoded in this connecting component. Still, the BPMN XML could be used to hold that configuration per process (e.g. you could put an extension property on every element that is relevant to the status).
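Reading such a per-process configuration out of the BPMN XML could look like the sketch below. The extension element `<st:status label="..."/>` and its namespace `http://example.com/status` are hypothetical placeholders chosen for illustration; only the BPMN 2.0 model namespace and `bpmn:extensionElements` come from the BPMN standard. The result is a mapping from element id to status label that the connecting component can use to filter events.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
STATUS_NS = "http://example.com/status"  # hypothetical extension namespace


def status_config(bpmn_xml: str) -> dict:
    """Return {element_id: status label} for every element carrying the extension."""
    root = ET.fromstring(bpmn_xml)
    config = {}
    for element in root.iter():
        # Only direct <bpmn:extensionElements> children of each element are checked.
        ext = element.find(f"{{{BPMN_NS}}}extensionElements")
        if ext is None:
            continue
        prop = ext.find(f"{{{STATUS_NS}}}status")
        if prop is not None and element.get("id"):
            config[element.get("id")] = prop.get("label")
    return config


SAMPLE = """<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
    xmlns:st="http://example.com/status">
  <bpmn:process id="order">
    <bpmn:serviceTask id="check_credit">
      <bpmn:extensionElements><st:status label="Checking credit"/></bpmn:extensionElements>
    </bpmn:serviceTask>
    <bpmn:serviceTask id="log_audit"/>
  </bpmn:process>
</bpmn:definitions>"""

print(status_config(SAMPLE))  # {'check_credit': 'Checking credit'}
```

Elements without the extension (here `log_audit`) are simply not relevant to the status.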
It is true that you then have two systems that manage related data. I see two cases in which they can become inconsistent:
- Zeebe is ahead of the secondary system. For example, a task has already been completed but the secondary system has not recognized this, because it was not notified via the subscription yet.
- There is data loss in the secondary system.
For case 1, I don’t think this is a problem in our example: eventually the secondary system will be up to date. In practice, the lag should be short enough that humans won’t notice it. Also, the secondary system can record when the data for a given workflow instance was last updated, to make the data’s age visible to the user.
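Recording a timestamp alongside each status could be as simple as the following sketch. `TimestampedStatusStore` is a hypothetical name, and "age" here is interpreted as the time since the secondary system last updated that instance; an alternative would be to store the timestamp carried by the processed event itself. The clock is injectable so the example is deterministic.

```python
import time


class TimestampedStatusStore:
    def __init__(self, clock=time.time):
        self.clock = clock
        self.records = {}  # workflow_instance_key -> (status label, updated_at)

    def put(self, instance_key, label):
        # Remember when this instance's data was last updated.
        self.records[instance_key] = (label, self.clock())

    def view(self, instance_key):
        # What a UI could show: the status plus how old that information is.
        label, updated_at = self.records[instance_key]
        return {"status": label, "age_seconds": self.clock() - updated_at}


now = [1000.0]  # fake clock for a deterministic example
store = TimestampedStatusStore(clock=lambda: now[0])
store.put(42, "Shipping")
now[0] += 5
print(store.view(42))  # {'status': 'Shipping', 'age_seconds': 5.0}
```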
For case 2, you need backups of the secondary system that contain the status data as well as the position of the last event that was received and processed via the Zeebe subscription. Via the Zeebe client, it is possible to “rewind” a subscription to any previous event. On recovery, you can rewind the subscription to the position stored in the last backup and reprocess all later events to restore the latest state. If all backups were lost, you can rewind to the beginning. In the future, we will have a feature that cleans up events in Zeebe after a certain time, once they are no longer required, in order to free disk space. The rewind approach will then be limited to that retention window.
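The backup-and-rewind recovery described above can be sketched like this. The key point is that the backup stores the data together with the log position it corresponds to, and recovery replays only the events after that position. `RecoverableStatusStore` is hypothetical, and the `log` list is an invented stand-in for Zeebe's event log; in reality the replay would come from rewinding the subscription via the client.

```python
import copy


class RecoverableStatusStore:
    def __init__(self):
        self.status = {}          # workflow_instance_key -> status label
        self.last_position = -1   # position of the last processed event

    def apply(self, position, instance_key, label):
        self.status[instance_key] = label
        self.last_position = position

    def backup(self):
        # Back up the data *and* the log position it corresponds to.
        return copy.deepcopy({"status": self.status, "last_position": self.last_position})

    @classmethod
    def recover(cls, snapshot, event_log):
        store = cls()
        store.status = copy.deepcopy(snapshot["status"])
        store.last_position = snapshot["last_position"]
        # "Rewind" to the backed-up position and reprocess everything after it.
        for position, instance_key, label in event_log:
            if position > store.last_position:
                store.apply(position, instance_key, label)
        return store


# Hypothetical stand-in for Zeebe's event log: (position, instance key, status label).
log = [(1, 42, "Checking credit"), (2, 7, "Checking credit"), (3, 42, "Shipping")]

live = RecoverableStatusStore()
for event in log[:2]:
    live.apply(*event)
snapshot = live.backup()
live.apply(*log[2])  # processed after the backup; then the secondary system loses its data

recovered = RecoverableStatusStore.recover(snapshot, log)
print(recovered.status == live.status)  # True

# With no backup at all, rewind to the beginning of the log.
from_scratch = RecoverableStatusStore.recover({"status": {}, "last_position": -1}, log)
print(from_scratch.status == live.status)  # True
```

Once Zeebe cleans up old events, replaying from scratch only works if the log still reaches back to the position you need.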
For the second example you gave, having the data in a secondary system lets you define how long you keep which data, so keeping data beyond the lifetime of a workflow instance should not be an issue.
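A retention policy in the secondary system might look like the sketch below: instead of deleting a status when the instance ends, the store records the end time and a periodic purge removes only records older than the retention period. `RetainingStatusStore`, `mark_ended` and `purge` are hypothetical names, and the 30-day retention is an arbitrary example value.

```python
from datetime import datetime, timedelta


class RetainingStatusStore:
    def __init__(self, retention):
        self.retention = retention  # how long to keep data after an instance ends
        self.records = {}           # workflow_instance_key -> {"status", "ended_at"}

    def put(self, key, label):
        self.records[key] = {"status": label, "ended_at": None}

    def mark_ended(self, key, when):
        # Instead of deleting on completion, remember when the instance ended.
        self.records[key]["ended_at"] = when

    def purge(self, now):
        # Drop only records whose instance ended longer ago than the retention period.
        expired = [key for key, rec in self.records.items()
                   if rec["ended_at"] is not None and now - rec["ended_at"] > self.retention]
        for key in expired:
            del self.records[key]
        return expired


now = datetime(2024, 1, 31)
store = RetainingStatusStore(retention=timedelta(days=30))
store.put(42, "Completed")
store.mark_ended(42, now - timedelta(days=40))
store.put(8, "Completed")
store.mark_ended(8, now - timedelta(days=10))
store.put(7, "Shipping")  # still running
print(store.purge(now), sorted(store.records))  # [42] [7, 8]
```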
Does that make sense?