Failed workflow instances stuck after workflow change

I have about 8 failed workflow instances which are stuck at a certain step. I was going through the shipping tutorial and probably ran some instances before completely defining the workflow, or I may have been missing some variables or service type definitions.

One of the failed workflow instances has all the required variables but is still stuck before the final step. Also, now that the workflow definition is complete, new instances complete successfully. What do I need to do to complete these failed workflows?

I am using the Java client.

The flow is defined as follows:

final int version = deploymentEvent.getWorkflows().get(0).getVersion();
System.out.println("Workflow deployed. Version : " + version);


String orderItemsStr = "orderItems";

System.out.println("Creating workflow variables for deployed workflow");
final Map<String, Object> data = new HashMap<>();
data.put("orderId", 31243);
data.put(orderItemsStr, JsonUtil.toJson(Arrays.asList(435, 182, 376)));

System.out.println("Zeebe: creating an instance of deployed workflow");
final WorkflowInstanceEvent workflowInstanceEvent = client.newCreateInstanceCommand()
        .bpmnProcessId("order-process")
        .latestVersion()
        .variables(data)
        .send()
        .join();

final long workflowInstanceKey = workflowInstanceEvent.getWorkflowInstanceKey();
System.out.println("Workflow instance created.  Key :" + workflowInstanceKey);

System.out.println("Adding a job worker for the Payment Service");
final JobWorker paymentServiceTask = client.newWorker()
        .jobType("payment-service")
        .handler(new PaymentService())
        .fetchVariables("orderId")
        .open();

//paymentServiceTask.close();

System.out.println("Adding a job worker for the Fetching Items Service");
final JobWorker fetchingItemsTask = client.newWorker()
        .jobType("fetching-items-service")
        .handler(new FetchingService())
        .fetchVariables(Arrays.asList("orderId", orderItemsStr, "totalPrice"))
        .open();

//fetchingItemsTask.close();

System.out.println("Adding a job worker for the Shipping Service Items Service");
final JobWorker shippingServiceTask = client.newWorker()
        .jobType("shipping-service")
        .handler(new ShippingService())
        .fetchVariables(Arrays.asList("orderId", "totalPrice", "itemsRetrieved", orderItemsStr))
        .open();

The services are straightforward.

Hey @klaus.nji,

thanks for trying out Zeebe.

What I understand from your post is that you deployed a workflow which was incomplete, and then you deployed a fixed version. With the fixed version everything works fine, but instances of the incomplete version are stuck in some task. Is this correct?

Might it be that you changed the job type? Is there no worker for this job type?
If you just want to cancel them, you can do so directly with a cancel command.
If you want to complete them, you need to send an activate command followed by a complete command, or create a worker for this job type.

Greets
Chris

Somewhat correct @Zelldon. I have some active (not completed) workitem instances using version 4 of the workflow definition which are stuck.

I can understand it if the workflow definition was changed, since these instances would no longer match the task definition. So yes, it may be that the job type changed while I was modifying the workflow. What I do not understand is why failed instances are stuck even though the workflow definition is not changing.

How do I go about sending an activation or sending a complete command to these stuck jobs?

@klaus.nji as @Zelldon mentioned, you need to make sure that you still have workers running for that version of the process. If you changed the job types, you will still need a worker for the old job types. If the jobs are not complete, they will be picked up as soon as a matching worker connects, as far as I understand.
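
To make the job-type point concrete, here is a minimal toy model in plain Java. It is not the Zeebe client API: the class `JobTypeMatchingSketch`, its methods, and the job type `payment-svc` are all made up for illustration. It only assumes what is described above, namely that a job waits under the type it was created with and is picked up once a worker for exactly that type connects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of job-type matching. This is NOT the Zeebe client API. */
public class JobTypeMatchingSketch {

    /** Pending job keys, grouped by the job type recorded when the job was created. */
    private final Map<String, Queue<Long>> pendingByType = new HashMap<>();

    /** An instance reaches a service task: the job waits under that task's type. */
    public void createJob(String jobType, long jobKey) {
        pendingByType.computeIfAbsent(jobType, t -> new ArrayDeque<>()).add(jobKey);
    }

    /** A worker subscribing to a type takes every job waiting under exactly that type. */
    public List<Long> openWorker(String jobType) {
        Queue<Long> queue = pendingByType.getOrDefault(jobType, new ArrayDeque<>());
        List<Long> activated = new ArrayList<>(queue);
        queue.clear();
        return activated;
    }

    /** Number of jobs still waiting for a worker of the given type. */
    public int pendingCount(String jobType) {
        Queue<Long> queue = pendingByType.get(jobType);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        JobTypeMatchingSketch broker = new JobTypeMatchingSketch();

        // Job created by an older workflow version, under a type that was
        // later renamed in the model ("payment-svc" is a hypothetical name).
        broker.createJob("payment-svc", 1L);
        // Job created by the current version, under the current type.
        broker.createJob("payment-service", 2L);

        // Only a worker for the current type is running.
        System.out.println("payment-service worker got: " + broker.openWorker("payment-service"));
        System.out.println("still waiting under payment-svc: " + broker.pendingCount("payment-svc"));

        // Once a worker for the old type connects, the waiting job is picked up.
        System.out.println("payment-svc worker got: " + broker.openWorker("payment-svc"));
    }
}
```

If the job types never changed, as in this thread, this model predicts that the jobs would be picked up, so the cause has to be somewhere else, for example in what the handler does with the job.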

Hope this helps

@salaboy, thanks for the response. I get the part about the job types. However, the job types are not changing. The internal job mechanism is changing, but its model (the variables) is not. Most of my errors appear to be related to deserialization. I am surprised that not many people are reporting this problem, as it is fundamental to the internal workings of Zeebe. Hoping I am missing something…

@klaus.nji I see that you have created another thread … can you please help us to reproduce the problem that you are experiencing?

Where do you see them “stuck”? That’s not a thing, btw. Do you mean that they have raised an incident in Operate?

If it’s in Operate, then click the retry button.

It sounds like you are trying to do a bunch of things at once. Please write down the minimum number of steps to reproduce your problem from a clean start.

This will help you straighten it out (it might even clear it up), give us insight into what you are doing or not doing so that we can reproduce it over here, and isolate any actual bug so that we can address it.

@jwulf, different threads but I am ending up with the same problem and I hope you guys can give me some insights here.

When I say stuck, I mean that the workflow stops executing at a point where I do not expect it to. Yes, Operate shows incidents. I can certainly retry, and in some cases the workflow continues execution, but what I am trying to figure out is why my workflows are failing to complete. I have posted a complete sample of my project in the other thread for some insights.

Would you guys be able to tell me what version of Jackson is used by Zeebe 0.20.0?

You can find this kind of information on Maven Central:
https://mvnrepository.com/artifact/io.zeebe/zeebe-client-java/0.20.0

It seems your workers have some problems with serialization, and that is the reason why the incidents are created (exceptions are thrown in your worker code).
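
As a rough illustration of how an exception in worker code ends up as an incident, here is a toy model in plain Java. It is not the Zeebe API: `IncidentSketch`, `runJob` and `fetchItems` are hypothetical names, and the retry count of 3 is just an example value. The mismatch it shows (a variable stored as JSON text but read as a list) is an assumed example of a deserialization problem, not a confirmed diagnosis of the code in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of "handler throws -> retries run out -> incident". NOT the Zeebe API. */
public class IncidentSketch {

    /**
     * Runs the handler until it succeeds or the retries are used up.
     * Returns true if the job would end in an incident.
     */
    public static boolean runJob(Map<String, Object> variables,
                                 Consumer<Map<String, Object>> handler,
                                 int retries) {
        while (retries > 0) {
            try {
                handler.accept(variables);
                return false; // job completed, the instance moves on
            } catch (RuntimeException e) {
                retries--; // each failed attempt uses up one retry
                System.out.println("attempt failed: " + e + " (retries left: " + retries + ")");
            }
        }
        return true; // no retries left: the instance stops here with an incident
    }

    /** Hypothetical handler that expects "orderItems" to be a list. */
    @SuppressWarnings("unchecked")
    public static void fetchItems(Map<String, Object> variables) {
        // Throws ClassCastException if the variable is a String rather than a List.
        List<Integer> items = (List<Integer>) variables.get("orderItems");
        System.out.println("fetching " + items.size() + " items");
    }

    public static void main(String[] args) {
        // Variable stored as JSON text, i.e. a plain string.
        Map<String, Object> asText = new HashMap<>();
        asText.put("orderItems", "[435,182,376]");

        // Variable stored as an actual list.
        Map<String, Object> asList = new HashMap<>();
        asList.put("orderItems", Arrays.asList(435, 182, 376));

        System.out.println("incident (text): " + runJob(asText, IncidentSketch::fetchItems, 3));
        System.out.println("incident (list): " + runJob(asList, IncidentSketch::fetchItems, 3));
    }
}
```

In this model, retrying the job changes nothing as long as the variable and the handler still disagree about the type, so the fix has to happen in the variable or in the worker code before a retry can succeed.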

Greets
Chris
