Job timeout sometimes skips multiple timeout intervals when re-issuing a job

Update: I thought the issue described below might already have been fixed in the latest Zeebe version, 0.23.4 (https://github.com/zeebe-io/zeebe/issues/4396), but after upgrading to 0.23.4 I still see the same behavior outlined below.

I am not sure whether this is a broker or a client issue (I am using the latest C# client with Zeebe 0.23.1 on a local machine with a single broker). See the log below: the job timeout is set to 10 seconds, and polling for this job type is set to 250 ms. What I am observing is that Zeebe sometimes re-issues the job right when it is supposed to time out, but other times it skips one or more timeout intervals. Compare the Deadline timestamp (when the job is supposed to time out) with the timestamp of the next job activation, e.g. [16:55:19 INF].

First instance is correct: Deadline: 7/18/2020 4:55:18 PM, next instance re-issued at [16:55:19 INF]
Second instance is wrong: Deadline: 7/18/2020 4:55:29 PM, next instance re-issued at [16:55:49 INF]
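The gaps can be checked directly from the timestamps quoted above. This is plain arithmetic on the log values, not Zeebe code; with a 10-second job timeout, a gap much larger than ~10 s means at least one timeout interval was skipped:

```python
from datetime import datetime

# (deadline, next re-activation) pairs copied from the log above
pairs = [
    ("4:55:18 PM", "16:55:19"),  # first instance
    ("4:55:29 PM", "16:55:49"),  # second instance
    ("4:55:58 PM", "16:56:19"),  # third instance
]

for deadline_s, reissue_s in pairs:
    deadline = datetime.strptime(deadline_s, "%I:%M:%S %p")
    reissue = datetime.strptime(reissue_s, "%H:%M:%S")
    gap = (reissue - deadline).total_seconds()
    # number of full 10-second timeout intervals that elapsed past the deadline
    intervals = int(gap // 10)
    print(f"deadline {deadline_s}: re-issued {gap:.0f}s later "
          f"({intervals} extra timeout interval(s))")
```

The first instance is re-activated 1 s after its deadline; the second and third only after 20–21 s, i.e. roughly two timeout intervals late.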

[16:55:08 INF] Zeebe.GetTopology()::OK GatewayVersion: 0.23.1
Brokers:

 Address: localhost:26501,
 Host: localhost,
 NodeId: 0,
 Port: 26501,
 Partitions:
  PartitionId: 1,
  IsLeader: True,
  Role: LEADER

[16:55:08 INF] ZeebeService initialized successfully.
[16:55:08 INF] Zeebe.RegisterJobHandlers(Worker:jobA)::OK
[16:55:08 INF]
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  **Deadline: 7/18/2020 4:55:18 PM**
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

**[16:55:19 INF]**
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  **Deadline: 7/18/2020 4:55:29 PM**
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

**[16:55:49 INF]**
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  **Deadline: 7/18/2020 4:55:58 PM**
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

**[16:56:19 INF]**
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  Deadline: 7/18/2020 4:56:29 PM
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

[16:56:49 INF]
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  Deadline: 7/18/2020 4:56:59 PM
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

[16:57:19 INF]
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  Deadline: 7/18/2020 4:57:29 PM
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

[16:57:49 INF]
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  Deadline: 7/18/2020 4:57:59 PM
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

[16:58:19 INF]
Zeebe.HandleJob(Worker:XPS_jobA)::OK
  WorkflowInstanceKey: 2251799813721948
  Deadline: 7/18/2020 4:58:29 PM
  ElementInstanceKey: 2251799813721973
  Key: 2251799813721976

There is nothing else running other than that single workflow instance; the machine has plenty of free disk space and other resources. The workflow is just start - service task (jobA type) - end. One instance of it is started, with the job being picked up by the client but never completed (auto-complete is off for this job type). The maximum number of active jobs of this type is set to 500.
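For context, with a 10-second timeout and a 250 ms poll interval, a timed-out job should be re-activated at most roughly 10.25 s after activation (timeout plus one poll). The log above instead settles into a 30-second cadence. A quick sanity check on those numbers (plain arithmetic, not Zeebe behavior):

```python
timeout_s = 10.0  # job timeout configured on the worker
poll_s = 0.25     # worker poll interval (250 ms)

# Rough expected worst case: job times out, next poll picks it up.
expected_max_gap = timeout_s + poll_s

# Observed re-activation times from the log, as seconds since 16:55:00
# (16:55:19, 16:55:49, 16:56:19, 16:56:49, 16:57:19, 16:57:49, 16:58:19)
reissues = [19, 49, 79, 109, 139, 169, 199]
observed_gaps = [b - a for a, b in zip(reissues, reissues[1:])]

print(f"expected max gap: {expected_max_gap}s")
print(f"observed gaps: {observed_gaps}")  # 30s apart, ~3x the timeout
```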

Hey @mostwired

thanks for reporting that.

I think it might be a race condition in the C# client. I remember I recently fixed something there, but it has not been released yet. Could you please open an issue there? Thanks.

Other than that, could you please share your setup (e.g. standalone gateway, how many partitions, etc.)? Could you also share your workflow, please? This helps to reproduce it.

Greets
Chris

Hi Chris,

The workflow I used to test this is very straightforward:

(image: BPMN diagram of the workflow - start event, a single service task of type jobA, end event)

The setup is a single-node local Zeebe install:

Address: localhost:26501,
 Host: localhost,
 NodeId: 0,
 Port: 26501,
 Partitions:
  PartitionId: 1,
  IsLeader: True,
  Role: LEADER

I have opened an issue here:

https://github.com/zeebe-io/zeebe-client-csharp/issues/148