Cancel a WorkflowInstance in Operate

Hi,

I'm having trouble canceling and deleting workflow instances in Operate.

For example, I have an “Echo” workflow instance with ID 225. When I try to cancel it, Operate gets a timeout and shows “Canceling Instance 225 failed”.

In the Zeebe Debug Log:

12:36:53.890 [io.zeebe.gateway.impl.broker.BrokerRequestManager] [gateway-zb-actors-0] ERROR io.zeebe.gateway - Error handling gRPC request
io.grpc.StatusRuntimeException: NOT_FOUND: Command rejected with code 'CANCEL': Expected to cancel a workflow instance with key '225', but no such workflow was found
    at io.grpc.Status.asRuntimeException(Status.java:523) ~[grpc-core-1.19.0.jar:1.19.0]
    at io.zeebe.gateway.EndpointManager.convertThrowable(EndpointManager.java:257) ~[zeebe-gateway-0.17.0.jar:0.17.0]
    at io.zeebe.gateway.EndpointManager.lambda$sendRequest$2(EndpointManager.java:235) ~[zeebe-gateway-0.17.0.jar:0.17.0]
    at io.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequest$1(BrokerRequestManager.java:90) ~[zeebe-gateway-0.17.0.jar:0.17.0]
    at io.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequest$3(BrokerRequestManager.java:109) ~[zeebe-gateway-0.17.0.jar:0.17.0]
    at io.zeebe.gateway.impl.broker.BrokerRequestManager.lambda$sendRequestInternal$6(BrokerRequestManager.java:191) ~[zeebe-gateway-0.17.0.jar:0.17.0]
    at io.zeebe.util.sched.future.FutureContinuationRunnable.run(FutureContinuationRunnable.java:35) [zeebe-util-0.17.0.jar:0.17.0]
    at io.zeebe.util.sched.ActorJob.invoke(ActorJob.java:90) [zeebe-util-0.17.0.jar:0.17.0]
    at io.zeebe.util.sched.ActorJob.execute(ActorJob.java:53) [zeebe-util-0.17.0.jar:0.17.0]
    at io.zeebe.util.sched.ActorTask.execute(ActorTask.java:189) [zeebe-util-0.17.0.jar:0.17.0]
    at io.zeebe.util.sched.ActorThread.executeCurrentTask(ActorThread.java:154) [zeebe-util-0.17.0.jar:0.17.0]
    at io.zeebe.util.sched.ActorThread.doWork(ActorThread.java:135) [zeebe-util-0.17.0.jar:0.17.0]
    at io.zeebe.util.sched.ActorThread.run(ActorThread.java:112) [zeebe-util-0.17.0.jar:0.17.0]
Caused by: io.zeebe.gateway.cmd.BrokerRejectionException: Command (CANCEL) rejected (NOT_FOUND): Expected to cancel a workflow instance with key '225', but no such workflow was found

I think the Elasticsearch and Zeebe data are out of sync, but we should still be able to “force delete” a workflow instance from Elasticsearch.

Hi @gizmo84, thanks for the report, and we’ll look into this. Just to clarify–you can see workflow instance ID 225 in Operate, but it seems that Zeebe is in some way out of sync?

A possibility that I’ll throw out there: could it be that you submitted the cancellation request multiple times, but there was a lag in the UI, and this NOT_FOUND error was the result of one of the additional cancellation requests? Or did the workflow instance never cancel at all? Let me know if that makes sense.

Best,
Mike
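
The double-submission scenario described above can be pictured with a toy model. This is only a sketch and not Zeebe code: `ToyBroker`, `CancelResult`, `start` and `cancel` are hypothetical names, and the real broker keeps far more state. It assumes only what the log shows, namely that a cancel for a key the broker no longer knows is rejected with NOT_FOUND.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for broker state. Hypothetical names, NOT the Zeebe API. */
final class ToyBroker {
    enum CancelResult { CANCELED, NOT_FOUND }

    // Keys of the instances the toy broker still considers active.
    private final Set<Long> activeInstanceKeys = new HashSet<>();

    void start(long key) {
        activeInstanceKeys.add(key);
    }

    /** The first cancel removes the key; any later cancel for the same key is rejected. */
    CancelResult cancel(long key) {
        return activeInstanceKeys.remove(key) ? CancelResult.CANCELED : CancelResult.NOT_FOUND;
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();
        broker.start(225L);
        System.out.println(broker.cancel(225L)); // first click:  CANCELED
        System.out.println(broker.cancel(225L)); // second click: NOT_FOUND
    }
}
```

If the error comes from a repeated click, the instance should eventually show up as canceled in Operate; if it never does, the two systems really are out of sync.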

Hi @wints,

“Just to clarify–you can see workflow instance ID 225 in Operate, but it seems that Zeebe is in some way out of sync?”

Yes, that’s exactly what happened, but I don’t know why. See the screenshot:

I’m unable to delete this instance from Operate.

Hi @gizmo84, thanks for the screenshot. That helps. To recap:

  • Operate is still showing the instance as “running” (a green circle to the left of the workflow name as in your screenshot)
  • But you can’t cancel the instance in Operate even though Operate says it’s running
  • And when you try to cancel the instance in Operate, you see the error in the Zeebe logs that you included in your first post

Did I get all of that right? If so, then I’ll take this to the Zeebe and Operate teams because it sounds like something unexpected is happening.

And to give some quick background on expected behavior:

  • Workflow instances currently cannot be deleted, only canceled, and after cancellation, they’ll be visible in Operate if “Canceled” is selected in the Filters menu (screenshot)
  • A canceled workflow instance will eventually be cleaned up from Zeebe state but will still be available in Operate

Based on this post, are these the missing steps needed to reproduce this error?

  • Run this in Docker, using the zeebe-io/zeebe-docker-compose profile for Operate.
  • Stop the containers with Ctrl-C.
  • Recreate them.
  • Now, attempt to stop a running workflow in Operate.

Something like this?

I will run some new tests on a clean setup and report back if I still see this kind of error.

I have the same problem canceling a workflow.
Operate 1.0.0
zeebe-broker-0.20.0

Hey @i.m.superman

could you please elaborate on that?

Greets
Chris

I created an instance of my test workflow and ran a few workers for the tasks in the workflow.
The workflow had not finished yet when I clicked the cancel button on the Operate page.
The broker log said “io.grpc.StatusRuntimeException: NOT_FOUND: Command rejected with code 'CANCEL': Expected to cancel a workflow instance with key '4503599627374654', but no such workflow was found”.
The instance was left hanging there; it could not be deleted or removed from the page.

Hi @i.m.superman, can you please post a minimal reproducer: that’s the minimum number of exact steps to reproduce the problem (including a Git Repo with the worker code). To be able to debug this for you, we’d need to see it happening, and see exactly what you are doing that has it happen.

thanks,
Josh

Step 1: I downloaded the zip packages (camunda-operate-1.0.0.zip and zeebe-distribution-0.20.0.zip) and deployed them on my servers.

Step 2: I deployed a BPMN file to the Zeebe cluster with the Java code below:

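// Import package names assume the 0.20 Java client; adjust them for other versions.
import io.zeebe.client.ZeebeClient;
import io.zeebe.client.ZeebeClientBuilder;
import io.zeebe.client.api.response.DeploymentEvent;

public class TestZB {
    public static void main(String[] args) throws Exception {
        final String broker = "192.168.70.167:26500";
        final ZeebeClientBuilder clientBuilder =
                ZeebeClient.newClientBuilder().brokerContactPoint(broker);
        // try-with-resources closes the client once the deployment has been sent
        try (ZeebeClient client = clientBuilder.build()) {
            final DeploymentEvent deploymentEvent =
                    client.newDeployCommand().addResourceFromClasspath("dailyUpdate.bpmn").send().join();
            System.out.println("Deployment created with key: " + deploymentEvent.getKey());
        }
    }
}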

Step 3: I created an instance of the workflow with the Java code below:

// Import package names assume the 0.20 Java client; adjust them for other versions.
import io.zeebe.client.ZeebeClient;
import io.zeebe.client.ZeebeClientBuilder;
import io.zeebe.client.api.response.WorkflowInstanceEvent;

public class TestZB2 {
    public static void main(String[] args) {
        final String broker = "192.168.70.169:26500";

        final String bpmnProcessId = "dailyUpdate";

        final ZeebeClientBuilder builder = ZeebeClient.newClientBuilder().brokerContactPoint(broker);

        try (ZeebeClient client = builder.build()) {

            System.out.println("Creating workflow instance");

            // Start an instance of the latest deployed version of the process
            final WorkflowInstanceEvent workflowInstanceEvent =
                    client
                            .newCreateInstanceCommand()
                            .bpmnProcessId(bpmnProcessId)
                            .latestVersion()
                            .send()
                            .join();

            System.out.println(
                    "Workflow instance created with key: " + workflowInstanceEvent.getWorkflowInstanceKey());
        }
    }
}

Step 4: I executed some tasks and clicked the cancel button on the Operate page. The error occurred.

Thanks for the breakdown @i.m.superman.

A couple of questions:

  1. How do I reproduce step 4 “Execute some tasks”? Do you mean “start some workflow instances”, or “service some tasks with (a) worker(s)”?
  2. Are you able to share a screenshot of your workflow?

Josh

1. Run the code as shown in the picture:

2. The workflow:

And the ExampleJobHandler code?

OK, do you see the job being output to the console in your worker when you start it?

One thing to note is that the representation in Operate lags behind the state of the system, because it comes from the exporter. So it does not update in real-time.
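
To make the lag concrete, here is a toy model of a view that is fed by exported records. It is a sketch under stated assumptions, not the actual exporter or Operate implementation: `ExporterLagModel`, `engineEvent`, `exportAll` and `stateInOperate` are hypothetical names, and the states are plain strings.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of a read-side view fed by exported records. Hypothetical, NOT Zeebe/Operate code. */
final class ExporterLagModel {
    private static final class Record {
        final long key;
        final String state;

        Record(long key, String state) {
            this.key = key;
            this.state = state;
        }
    }

    // Records the engine has produced but the exporter has not yet delivered.
    private final Queue<Record> notYetExported = new ArrayDeque<>();
    // What the UI would display, built only from delivered records.
    private final Map<Long, String> operateView = new HashMap<>();

    void engineEvent(long key, String state) {
        notYetExported.add(new Record(key, state));
    }

    /** Delivers all pending records to the view, in order. */
    void exportAll() {
        Record r;
        while ((r = notYetExported.poll()) != null) {
            operateView.put(r.key, r.state);
        }
    }

    String stateInOperate(long key) {
        return operateView.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) {
        ExporterLagModel model = new ExporterLagModel();
        model.engineEvent(225L, "ACTIVE");
        model.exportAll();
        model.engineEvent(225L, "CANCELED");            // engine has already canceled
        System.out.println(model.stateInOperate(225L)); // view still shows ACTIVE
        model.exportAll();
        System.out.println(model.stateInOperate(225L)); // now CANCELED
    }
}
```

In this picture, a cancel clicked while the view is stale targets an instance the engine has already finished with, which would match the NOT_FOUND rejection in the logs.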

Got it.
The output information is shown in the console.
So far, the exception has not been reproduced. I will keep watching.
Thanks.