Client requests ActivateJobs in Zeebe but gets an empty response

zeebe-grpc/0.23.1.0
zeebe 0.23.1

I have been using the Python client from PyPI successfully.

My brand new integration tests uncovered something weird: sometimes a job is not detected by the worker, so the test fails with a timeout.

Checking the exported Elasticsearch data, I can see that the job has been created, then activated!

So, if my Python ActivateJobs request was successful, why is the response empty?

    while True:
        logging.debug("ActivateJobs of type %s request...", agent_job_type)

        activate_jobs_response = stub.ActivateJobs(
            gateway_pb2.ActivateJobsRequest(
                # the job type, as defined in the BPMN process
                type=agent_job_type,
                # the name of the agent activating the jobs, mostly used for logging purposes
                worker=agent_job_type,
                # a job returned after this call will not be activated by another call until the
                # timeout (in ms) has been reached
                timeout=43200000, # 12h
                # the maximum jobs to activate by this request
                maxJobsToActivate=5,
                # a list of variables to fetch as the job variables; if empty, all visible variables at the time
                # of activation for the scope of the job will be returned
                # fetchVariable=[]
                # The request will be completed when at least one job is activated or after the
                # requestTimeout (in ms). if the requestTimeout = 0, a default timeout is used. if the
                # requestTimeout < 0, long polling is disabled and the request is completed immediately,
                # even when no job is activated.
                requestTimeout=5
            )
        )

        for response in activate_jobs_response:
            logging.debug("%s job get ActivateJobs response...", agent_job_type)
            for job in response.jobs:
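                # Sketch of the rest of the loop: my real handler is longer and
                # omitted from this post. CompleteJob / CompleteJobRequest are the
                # gateway RPCs defined in gateway.proto.
                logging.debug("handling job %s (key=%s)", job.type, job.key)
                stub.CompleteJob(gateway_pb2.CompleteJobRequest(jobKey=job.key))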

The script sends the request and Zeebe activates the job, but the loop over the response object is never entered…

Any idea?

@gizmo84 is the author of that client. If you don’t get any traction in here, you could try opening an issue at https://gitlab.com/stephane.ludwig/zeebe_python_grpc/-/issues.

1 Like

Hi @vtexier,

thanks for using the zeebe-grpc client :slight_smile:

One question before I further investigate your code:

You are using Zeebe 0.23.1. Could you please try to update to the latest 0.23 version? I think it’s currently 0.23.7. You should also upgrade zeebe-grpc; I just published a 0.23.7.0 version.

2 Likes

Very good news: the bug was on my side! :flushed:

While upgrading the Zeebe server and the gRPC client in the worker, I saw that in my Docker container the gRPC client version was still 0.25.1!

I fixed the bug in the dependency handling and the integration tests were successful! :partying_face:

Of course, the 0.25.1 gRPC client is not compatible with the 0.23.1 server.

To everybody: be sure to use the same version for the client and the server!
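
A quick sanity check that could be added at worker startup, just as a sketch (it assumes the zeebe_grpc module layout and that the gateway's Topology RPC reports a gatewayVersion, which I believe it does):

    import logging

    import grpc
    import pkg_resources
    from zeebe_grpc import gateway_pb2, gateway_pb2_grpc

    channel = grpc.insecure_channel("localhost:26500")
    stub = gateway_pb2_grpc.GatewayStub(channel)

    # Compare the installed client package version with the gateway version.
    topology = stub.Topology(gateway_pb2.TopologyRequest())
    client_version = pkg_resources.get_distribution("zeebe-grpc").version
    if not client_version.startswith(topology.gatewayVersion):
        logging.warning("version mismatch: gateway %s vs client %s",
                        topology.gatewayVersion, client_version)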

1 Like

Sadly, I ran into the problem again… :cry:

  • zeebe-grpc 0.23.7.0
  • grpcio 1.33.2
  • protobuf 3.13.0
  • Zeebe server 0.23.7
  • Zeebe docker image: zeebe:0.23.7@sha256:8aa7418bf02e22fc52f897e03d5d8ee04b32a57e1b4e0961684d5ace8a2de9dc

Here is the output of my tool that analyzes the Zeebe data exported to Elasticsearch:

JOBS --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Datetime                      bpmnProcessId                           version   type                workflowInstanceKey elementId           intent              key                 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2020-11-19T15:14:56.314000    test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI      1         madam_scanfolder    2251799813691084    scanfolder_ba       CREATED             2251799813691089    
2020-11-19T15:14:56.373000    test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI      1         madam_scanfolder    2251799813691084    scanfolder_ba       ACTIVATED           2251799813691089    
2020-11-19T15:14:56.471000    test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI      1         madam_scanfolder    2251799813691084    scanfolder_ba       COMPLETED           2251799813691089    
2020-11-19T15:15:08.140000    test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI      2         madam_scanfolder    2251799813691098    scanfolder_ba       CREATED             2251799813691104    
2020-11-19T15:15:09.561000    test_SdzgjFj3FqSJmbOb8lAQ72uyQCTjWPsI      2         madam_scanfolder    2251799813691098    scanfolder_ba       ACTIVATED           2251799813691104    

Here, the integration tests stopped at the second test, after a 20-second timeout waiting for the COMPLETED state (the task normally takes less than 5-10 s).

A second run stopped later, with another workflow BPMN file and another worker:

JOBS --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Datetime                      bpmnProcessId                           version   type                workflowInstanceKey elementId           intent              key                 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2020-11-19T15:41:56.533000    test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT   1         madam_scanfolder    2251799813691272    scanfolder_ba       CREATED             2251799813691277    
2020-11-19T15:41:56.667000    test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT   1         madam_scanfolder    2251799813691272    scanfolder_ba       ACTIVATED           2251799813691277    
2020-11-19T15:41:57.320000    test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT   1         madam_scanfolder    2251799813691272    scanfolder_ba       COMPLETED           2251799813691277    
2020-11-19T15:42:10.716000    test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT   2         madam_scanfolder    2251799813691285    scanfolder_ba       CREATED             2251799813691293    
2020-11-19T15:42:10.776000    test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT   2         madam_scanfolder    2251799813691285    scanfolder_ba       ACTIVATED           2251799813691293    
2020-11-19T15:42:18.052000    test_6UmGJbyCTeggMBBAIuPKD2Y8wMqwsnWT   2         madam_scanfolder    2251799813691285    scanfolder_ba       COMPLETED           2251799813691293    
2020-11-19T15:42:29.997000    test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6   1         madam_scanfolder    2251799813691304    scanfolder_ba       CREATED             2251799813691309    
2020-11-19T15:42:30.036000    test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6   1         madam_scanfolder    2251799813691304    scanfolder_ba       ACTIVATED           2251799813691309    
2020-11-19T15:42:30.406000    test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6   1         madam_scanfolder    2251799813691304    scanfolder_ba       COMPLETED           2251799813691309    
2020-11-19T15:42:54.085000    test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6   2         madam_scanfolder    2251799813691318    scanfolder_ba       CREATED             2251799813691325    
2020-11-19T15:42:54.113000    test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6   2         madam_scanfolder    2251799813691318    scanfolder_ba       ACTIVATED           2251799813691325    
2020-11-19T15:42:56.484000    test_iDLl7S92QqQDgvKpaJdgsfyQGq9vznb6   2         madam_scanfolder    2251799813691318    scanfolder_ba       COMPLETED           2251799813691325    
2020-11-19T15:43:05.357000    test_wJkELaJoZIlqGh2gS2fhAO0S2TMR4jqk   1         madam_ffmpeg        2251799813691334    ffmpeg_create_video CREATED             2251799813691342    
2020-11-19T15:43:05.853000    test_wJkELaJoZIlqGh2gS2fhAO0S2TMR4jqk   1         madam_ffmpeg        2251799813691334    ffmpeg_create_video ACTIVATED           2251799813691342

After the job is activated, the worker continues to loop waiting for activated jobs, but the jobs that got no response are lost in oblivion…

[EDIT]
Even when I run the Python worker client on my laptop (so not inside a Docker service on a swarm network), I get the problem…

I start a workflow instance for process_id test_FPzcry4ZP1YzPfOfE2qNvWchFnSmKKZy.

2020-11-20T00:02:04.357000    test_FPzcry4ZP1YzPfOfE2qNvWchFnSmKKZy   1         madam_scanfolder    2251799813695059    scanfolder_ba       CREATED             2251799813695065    
2020-11-20T00:02:06.824000    test_FPzcry4ZP1YzPfOfE2qNvWchFnSmKKZy   1         madam_scanfolder    2251799813695059    scanfolder_ba       ACTIVATED           2251799813695065 

No response is seen in the Python gRPC client, so the worker does nothing.

But a tcpdump shows me that a response for this particular process_id is received:

00:01:59.781542 IP 192.168.1.105.26500 > K72Jr.45696: Flags [P.], seq 50:129, ack 461, win 501, options [nop,nop,TS val 1325758811 ecr 1486111281], length 79
        0x0000:  4500 0083 2f1c 4000 3e06 8932 c0a8 0169  E.../.@.>..2...i
        0x0010:  c0a8 016d 6784 b280 8108 2943 52e2 2512  ...mg.....)CR.%.
        0x0020:  8018 01f5 a213 0000 0101 080a 4f05 795b  ............O.y[
        0x0030:  5894 4231 0000 4601 0400 0000 0188 5f10  X.B1..F......._.
        0x0040:  6170 706c 6963 6174 696f 6e2f 6772 7063  application/grpc
        0x0050:  400d 6772 7063 2d65 6e63 6f64 696e 6708  @.grpc-encoding.
        0x0060:  6964 656e 7469 7479 4014 6772 7063 2d61  identity@.grpc-a
        0x0070:  6363 6570 742d 656e 636f 6469 6e67 0467  ccept-encoding.g
        0x0080:  7a69 70                                  zip
00:01:59.782102 IP 192.168.1.105.26500 > K72Jr.45696: Flags [P.], seq 129:226, ack 461, win 501, options [nop,nop,TS val 1325758812 ecr 1486111281], length 97
        0x0000:  4500 0095 2f1d 4000 3e06 891f c0a8 0169  E.../.@.>......i
        0x0010:  c0a8 016d 6784 b280 8108 2992 52e2 2512  ...mg.....).R.%.
        0x0020:  8018 01f5 d81d 0000 0101 080a 4f05 795c  ............O.y\
        0x0030:  5894 4231 0000 4000 0000 0000 0100 0000  X.B1..@.........
        0x0040:  003b 08d1 cc80 8080 8080 0412 2574 6573  .;..........%tes
        0x0050:  745f 4650 7a63 7279 345a 5031 597a 5066  t_FPzcry4ZP1YzPf
        0x0060:  4f66 4532 714e 7657 6368 466e 536d 4b4b  OfE2qNvWchFnSmKK
        0x0070:  5a79 1801 20d3 cc80 8080 8080 0400 000f  Zy..............
        0x0080:  0105 0000 0001 400b 6772 7063 2d73 7461  ......@.grpc-sta
        0x0090:  7475 7301 30                             tus.0

The problem seems to be located in the Python client…
[EDIT]
NO! This response is empty; see my next post for a dump of a full (good) response.
So the problem is on the Zeebe side…

How I would approach this: I would go into the python client code and insert logging.

It seems like the python client sends the ActivateJobs request, but does not invoke your handler.

So I would test that assumption by going into the code where the python client activates the job, and log it out.

In the Node client, you turn on various debugging levels to see this kind of thing, including super verbose logging for the underlying grpc library that it uses. I’m not sure how it works in Python, but you may be able to do the same, in addition to adding log statements.
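
Something like this, as a rough sketch (I'm just reusing the stub and the request object from your snippet above, so treat the names as placeholders):

    import logging

    import grpc


    def activate_with_logging(stub, request):
        # Stream the ActivateJobs responses and log what actually arrives,
        # to see whether the iterator ever yields anything before the stream
        # closes or fails.
        stream = stub.ActivateJobs(request)
        try:
            for response in stream:
                logging.debug("response with %d job(s)", len(response.jobs))
                for job in response.jobs:
                    logging.debug("activated job key=%s type=%s", job.key, job.type)
        except grpc.RpcError as err:
            logging.error("ActivateJobs failed: %s %s", err.code(), err.details())
        finally:
            logging.debug("ActivateJobs stream closed")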

That’s strange, I can’t seem to find the source code at https://gitlab.com/stephane.ludwig/zeebe_python_grpc/-/tree/master

Are setup.py and zeebe_grpc/__init__.py literally the entire source code, @gizmo84?

Is there a way to set an env variable to get it to emit more verbose logging from the internals?

There is also another Python client: https://github.com/JonatanMartens/pyzeebe

To configure gRPC behaviour, we can use the environment variables:

https://github.com/grpc/grpc/blob/master/doc/environment_variables.md
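
For the Python client that would be something like this (a sketch; the variables have to be in the environment before the gRPC core initializes, and the GRPC_TRACE value is just an example taken from that page):

    import os

    # Set these before importing grpc so the gRPC core sees them when it
    # initializes.
    os.environ["GRPC_VERBOSITY"] = "DEBUG"
    os.environ["GRPC_TRACE"] = "http,call_error,connectivity_state"

    import grpc  # noqa: E402  (imported after setting the environment on purpose)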

The code is automatically generated from the proto files and contains only the class descriptions and bindings.
The real logic is in the grpcio/protobuf modules.

But I was wrong; here is the tcpdump of a good response:

13:51:28.977665 IP 192.168.1.105.26500 > K72Jr.56806: Flags [P.], seq 6077:6379, ack 25449, win 501, options [nop,nop,TS val 1375528913 ecr 1989665509], length 302
        0x0000:  4500 0162 1fce 4000 3e06 97a1 c0a8 0169  E..b..@.>......i
        0x0010:  c0a8 016d 6784 dde6 516d eaa1 95e9 ec5a  ...mg...Qm.....Z
        0x0020:  8018 01f5 1407 0000 0101 080a 51fc e7d1  ............Q...
        0x0030:  7697 e2e5 0001 1b00 0000 0001 5300 0000  v...........S...
        0x0040:  0116 0a93 0208 cae3 8080 8080 8004 1210  ................
        0x0050:  6d61 6461 6d5f 7363 616e 666f 6c64 6572  madam_scanfolder
        0x0060:  18c5 e380 8080 8080 0422 2574 6573 745f  ........."%test_
        0x0070:  6a4a 6139 4b55 7367 4d62 6b4d 4558 6370  jJa9KUsgMbkMEXcp
        0x0080:  5142 4f42 6378 755a 6352 454e 6a49 7432  QBOBcxuZcRENjIt2
        0x0090:  2801 30c3 e380 8080 8080 043a 0d73 6361  (.0........:.sca
        0x00a0:  6e66 6f6c 6465 725f 6261 40c9 e380 8080  nfolder_ba@.....
        0x00b0:  8080 044a 7e7b 2273 6f75 7263 6522 3a22  ...J~{"source":"
        0x00c0:  2f6d 6e74 2f64 6973 6b73 7461 7469 6f6e  /mnt/diskstation
        0x00d0:  322f 4d45 4449 4153 2f64 6174 612f 6d61  2/MEDIAS/data/ma
        0x00e0:  6461 6d2f 7465 7374 732f 7465 7374 5f73  dam/tests/test_s
        0x00f0:  6361 6e66 6f6c 6465 7222 2c22 7265 6375  canfolder","recu
        0x0100:  7273 6976 6522 3a22 4661 6c73 6522 2c22  rsive":"False","
        0x0110:  676c 6f62 5f31 223a 222a 5f30 312e 2a22  glob_1":"*_01.*"
        0x0120:  2c22 676c 6f62 5f32 223a 222a 5f30 332e  ,"glob_2":"*_03.
        0x0130:  2a22 7d52 166d 6164 616d 5f73 6361 6e66  *"}R.madam_scanf
        0x0140:  6f6c 6465 725f 302e 352e 3058 0360 b0ac  older_0.5.0X.`..
        0x0150:  a3c2 de2e 6a02 7b7d 0000 0101 0500 0001  ....j.{}........
        0x0160:  53be                                     S.

In the dump of a missed job, the customHeaders and the other job fields are missing:

00:01:59.781542 IP 192.168.1.105.26500 > K72Jr.45696: Flags [P.], seq 50:129, ack 461, win 501, options [nop,nop,TS val 1325758811 ecr 1486111281], length 79
        0x0000:  4500 0083 2f1c 4000 3e06 8932 c0a8 0169  E.../.@.>..2...i
        0x0010:  c0a8 016d 6784 b280 8108 2943 52e2 2512  ...mg.....)CR.%.
        0x0020:  8018 01f5 a213 0000 0101 080a 4f05 795b  ............O.y[
        0x0030:  5894 4231 0000 4601 0400 0000 0188 5f10  X.B1..F......._.
        0x0040:  6170 706c 6963 6174 696f 6e2f 6772 7063  application/grpc
        0x0050:  400d 6772 7063 2d65 6e63 6f64 696e 6708  @.grpc-encoding.
        0x0060:  6964 656e 7469 7479 4014 6772 7063 2d61  identity@.grpc-a
        0x0070:  6363 6570 742d 656e 636f 6469 6e67 0467  ccept-encoding.g
        0x0080:  7a69 70                                  zip
00:01:59.782102 IP 192.168.1.105.26500 > K72Jr.45696: Flags [P.], seq 129:226, ack 461, win 501, options [nop,nop,TS val 1325758812 ecr 1486111281], length 97
        0x0000:  4500 0095 2f1d 4000 3e06 891f c0a8 0169  E.../.@.>......i
        0x0010:  c0a8 016d 6784 b280 8108 2992 52e2 2512  ...mg.....).R.%.
        0x0020:  8018 01f5 d81d 0000 0101 080a 4f05 795c  ............O.y\
        0x0030:  5894 4231 0000 4000 0000 0000 0100 0000  X.B1..@.........
        0x0040:  003b 08d1 cc80 8080 8080 0412 2574 6573  .;..........%tes
        0x0050:  745f 4650 7a63 7279 345a 5031 597a 5066  t_FPzcry4ZP1YzPf
        0x0060:  4f66 4532 714e 7657 6368 466e 536d 4b4b  OfE2qNvWchFnSmKK
        0x0070:  5a79 1801 20d3 cc80 8080 8080 0400 000f  Zy..............
        0x0080:  0105 0000 0001 400b 6772 7063 2d73 7461  ......@.grpc-sta
        0x0090:  7475 7301 30                             tus.0

So it means that it is the Zeebe API which:

  • accepts the request and activates the job
  • closes the long-polling connection with an empty response (bug!)

It’s more serious than I thought.

How are you running Zeebe? I have never seen this with Docker, a server running in Java, a remote server, or Camunda Cloud (proxied by nginx) using the Node client, so it is unlikely to be the Zeebe gRPC Gateway itself.

The Python client is also less likely to be the culprit (although it could be), because otherwise this would be more widely experienced.

So, the most likely place to start would seem to be the Zeebe setup.

You could use another client - Java/Node or the zbctl CLI to remove the Python client from the equation.

Same problem with zbctl…

I sometimes get error messages; I don't know if they are related…

I1120 17:43:44.434601597    6163 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[]
DEBUG:root:ActivateJobs of type scanfolder request...
I1120 17:43:49.557217967    6163 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[]
DEBUG:root:ActivateJobs of type scanfolder request...
I1120 17:43:54.676472655    6163 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[]
DEBUG:root:ActivateJobs of type scanfolder request...
I1120 17:43:59.794302857    6163 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
2020/11/20 17:44:04 No jobs found to activate for type madam_scanfolder
[]
DEBUG:root:ActivateJobs of type scanfolder request...
I1120 17:44:04.456176572    6163 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
2020/11/20 17:44:09 No jobs found to activate for type madam_scanfolder
[]
DEBUG:root:ActivateJobs of type scanfolder request...
I1120 17:44:09.137085701    6163 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
2020/11/20 17:44:13 No jobs found to activate for type madam_scanfolder

My Zeebe instance is in a Docker container on a Docker swarm network. It's a standalone server (1 broker) with this config:

# ----------------------------------------------------

# Zeebe Standalone Broker configuration file (with embedded gateway)

# This file is based on broker.standalone.yaml.template but stripped down to contain only a limited
# set of configuration options. These are a good starting point to get to know Zeebe.
# For advanced configuration options, have a look at the templates in this folder.

# !!! Note that this configuration is not suitable for running a standalone gateway. !!!
# If you want to run a standalone gateway node, please have a look at gateway.yaml.template

# ----------------------------------------------------
# Byte sizes
# For buffers and others must be specified as strings and follow the following
# format: "10U" where U (unit) must be replaced with KB = Kilobytes, MB = Megabytes or GB = Gigabytes.
# If unit is omitted then the default unit is simply bytes.
# Example:
# sendBufferSize = "16MB" (creates a buffer of 16 Megabytes)
#
# Time units
# Timeouts, intervals, and the likes, must be specified either in the standard ISO-8601 format used
# by java.time.Duration, or as strings with the following format: "VU", where:
#   - V is a numerical value (e.g. 1, 5, 10, etc.)
#   - U is the unit, one of: ms = Millis, s = Seconds, m = Minutes, or h = Hours
#
# Paths:
# Relative paths are resolved relative to the installation directory of the broker.
zeebe:
  broker:
    gateway:
      # Enable the embedded gateway to start on broker startup.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_ENABLE.
      enable: true

      network:
        # Sets the port the embedded gateway binds to.
        # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_NETWORK_PORT.
        port: 26500
      security:
        # Enables TLS authentication between clients and the gateway
        # This setting can also be overridden using the environment variable ZEEBE_BROKER_GATEWAY_SECURITY_ENABLED.
        enabled: false

    network:
      # Controls the default host the broker should bind to. Can be overwritten on a
      # per binding basis for client, management and replication
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_NETWORK_HOST.
      host: 0.0.0.0

    data:
      # Specify a list of directories in which data is stored.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_DIRECTORIES.
      directories: [ "/mnt/diskstation2/MEDIAS/data/madam/zeebe" ]
      # The size of data log segment files.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_LOGSEGMENTSIZE.
      logSegmentSize: 512MB
      # How often we take snapshots of streams (time unit)
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_DATA_SNAPSHOTPERIOD.
      snapshotPeriod: 15m

    cluster:
      # Specifies the Zeebe cluster size.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_CLUSTERSIZE.
      clusterSize: 1
      # Controls the replication factor, which defines the count of replicas per partition.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_REPLICATIONFACTOR.
      replicationFactor: 1
      # Controls the number of partitions, which should exist in the cluster.
      # This can also be overridden using the environment variable ZEEBE_BROKER_CLUSTER_PARTITIONSCOUNT.
      partitionsCount: 1

    threads:
      # Controls the number of non-blocking CPU threads to be used.
      # WARNING: You should never specify a value that is larger than the number of physical cores
      # available. Good practice is to leave 1-2 cores for ioThreads and the operating
      # system (it has to run somewhere). For example, when running Zeebe on a machine
      # which has 4 cores, a good value would be 2.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_CPUTHREADCOUNT
      cpuThreadCount: 2
      # Controls the number of io threads to be used.
      # This setting can also be overridden using the environment variable ZEEBE_BROKER_THREADS_IOTHREADCOUNT
      ioThreadCount: 2
    # Elasticsearch Exporter ----------
    # An example configuration for the elasticsearch exporter:
    #
    # These setting can also be overridden using the environment variables "ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_..."
    #
    exporters:
      elasticsearch:
        className: io.zeebe.exporter.ElasticsearchExporter

        args:
          url: http://elasticsearch:9200

          bulk:
            delay: 5
            size: 1000

          # authentication:
          #   username: elastic
          #   password: changeme

          index:
            prefix: zeebe-record
            createTemplate: true

            command: false
            event: true
            rejection: false

            deployment: true
            error: true
            incident: true
            job: true
            jobBatch: false
            message: false
            messageSubscription: false
            variable: true
            variableDocument: true
            workflowInstance: true
            workflowInstanceCreation: false
            workflowInstanceSubscription: false

            ignoreVariablesAbove: 32677

The data is on a NAS, NFSv4-mounted as a Docker volume.

That was (part of) the problem! :flushed:

Data access was too slow.

I moved all the Zeebe data to a much faster local disk on the Zeebe server, and the problem occurs far less often (but it is still there…).

I don't know why Zeebe does not send the job back over the API after setting it to the ACTIVATED state on its side.

Do you know whether Zeebe activates the job after responding to the API request, as it should, or before?

@jwulf there is no “code” :). zeebe-grpc is only auto-generated from the gateway.proto file :wink:

@vtexier did you try to upgrade to the latest Zeebe version? (currently 0.25.1)

I am continuing to investigate…

I have set env variables to add debug logs to Zeebe:

    services:
      # Zeebe
      zeebe:
        image: "camunda/zeebe:{{ zeebe_version }}"
        environment:
          ZEEBE_LOG_LEVEL: "{{ zeebe_log_level }}"
          GRPC_VERBOSITY: DEBUG
        ports:

But Zeebe's debug level is very quiet, and gRPC does not seem to pick up the env var the way the Python client does…

    2020-11-24T13:59:01.772646189Z madam_zeebe.1.qog7chpaj7fn@z68-gen3    | 2020-11-24 13:59:01.772 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  org.camunda.feel.FeelEngine - Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@5373a45)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@4acc0e9, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@68ce39c8, configuration: Configuration(false)]

    2020-11-24T14:59:03.663000    test_zGVIvh9rqPfe6ijxLvNdB7MiSKwJtrAt   1         madam_ffmpeg        2251799813736912    ffmpeg_create_video CREATED             2251799813736921    
    2020-11-24T14:59:03.761000    test_zGVIvh9rqPfe6ijxLvNdB7MiSKwJtrAt   1         madam_ffmpeg        2251799813736912    ffmpeg_create_video ACTIVATED           2251799813736921    
    2020-11-24T14:59:08.133000    test_zGVIvh9rqPfe6ijxLvNdB7MiSKwJtrAt   1         madam_ffmpeg        2251799813736912    ffmpeg_create_video COMPLETED 

    2020-11-24T13:59:09.031027434Z madam_zeebe.1.qog7chpaj7fn@z68-gen3    | 2020-11-24 13:59:09.030 [GatewayLongPollingJobHandler] [Broker-0-zb-actors-0] DEBUG io.zeebe.gateway - Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"madam_ffmpeg","worker":"madam_ffmpeg_0.5.1","timeout":43200000,"maxJobsToActivate":5,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type madam_ffmpeg after timeout of PT5S
    2020-11-24T13:59:12.566696922Z madam_zeebe.1.qog7chpaj7fn@z68-gen3    | 2020-11-24 13:59:12.566 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  org.camunda.feel.FeelEngine - Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@42a0d124)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@654b3e28, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@52dda3e, configuration: Configuration(false)]

    2020-11-24T14:59:14.177000    test_zGVIvh9rqPfe6ijxLvNdB7MiSKwJtrAt   2         madam_ffmpeg        2251799813736929    ffmpeg_create_video CREATED             2251799813736936    
    2020-11-24T14:59:14.276000    test_zGVIvh9rqPfe6ijxLvNdB7MiSKwJtrAt   2         madam_ffmpeg        2251799813736929    ffmpeg_create_video ACTIVATED           2251799813736936 

    2020-11-24T13:59:14.252837112Z madam_zeebe.1.qog7chpaj7fn@z68-gen3    | 2020-11-24 13:59:14.252 [GatewayLongPollingJobHandler] [Broker-0-zb-actors-0] DEBUG io.zeebe.gateway - Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"madam_ffmpeg","worker":"madam_ffmpeg_0.5.1","timeout":43200000,"maxJobsToActivate":5,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type madam_ffmpeg after timeout of PT5S
    2020-11-24T13:59:19.451226703Z madam_zeebe.1.qog7chpaj7fn@z68-gen3    | 2020-11-24 13:59:19.450 [GatewayLongPollingJobHandler] [Broker-0-zb-actors-0] DEBUG io.zeebe.gateway - Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"madam_ffmpeg","worker":"madam_ffmpeg_0.5.1","timeout":43200000,"maxJobsToActivate":5,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type madam_ffmpeg after timeout of PT5S

You can notice that the logs from ES (with the job events) do not have the same timezone as the Zeebe logs.
Do you know how I can tell Zeebe to use my timezone?

@gizmo84, sure, I will upgrade to the latest version and continue to report here.

Upgrade done.

Zeebe server 0.25.1

Python client:

Package           Version
----------------- ---------
certifi           2020.11.8
chardet           3.0.4
docker            4.4.0
elasticsearch     6.8.1
elasticsearch-dsl 6.4.0
grpcio            1.33.2
idna              2.10
pip               20.2.4
protobuf          3.13.0
python-dateutil   2.8.1
PyYAML            5.3.1
requests          2.25.0
setuptools        50.3.2
six               1.15.0
urllib3           1.26.2
websocket-client  0.57.0
wheel             0.35.1
zeebe-grpc        0.25.1.0

Same problem, job activated in Zeebe, but response after long polling is empty:

2020-11-24T16:42:05.303655967Z madam_zeebe.1.n2pe4vkulgz4@z68-gen3    | 2020-11-24 16:42:05.303 [GatewayLongPollingJobHandler] [Broker-0-zb-actors-1] DEBUG io.zeebe.gateway - Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"madam_ffmpeg","worker":"madam_ffmpeg_0.5.2","timeout":43200000,"maxJobsToActivate":5,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type madam_ffmpeg after timeout of PT5S
2020-11-24T16:42:06.850970365Z madam_zeebe.1.n2pe4vkulgz4@z68-gen3    | 2020-11-24 16:42:06.850 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  org.camunda.feel.FeelEngine - Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@56b0f4e)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@2e76eae6, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@7df99f8d, configuration: Configuration(false)]

JOBS --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Datetime                      bpmnProcessId                           version   type                workflowInstanceKey elementId           intent              key                 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2020-11-24T17:42:08.487000    test_Ecx8UHCGDe02JPhmVrNpo3r9ex0c01KJ   1         madam_ffmpeg        2251799813685823    ffmpeg_create_video CREATED             2251799813685830    
2020-11-24T17:42:08.585000    test_Ecx8UHCGDe02JPhmVrNpo3r9ex0c01KJ   1         madam_ffmpeg        2251799813685823    ffmpeg_create_video ACTIVATED           2251799813685830    
2020-11-24T17:42:13.041000    test_Ecx8UHCGDe02JPhmVrNpo3r9ex0c01KJ   1         madam_ffmpeg        2251799813685823    ffmpeg_create_video COMPLETED           2251799813685830    
-----------------------------------------------------------------------------------------------------------------------------------
2020-11-24T16:42:13.915580501Z madam_zeebe.1.n2pe4vkulgz4@z68-gen3    | 2020-11-24 16:42:13.915 [GatewayLongPollingJobHandler] [Broker-0-zb-actors-0] DEBUG io.zeebe.gateway - Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"madam_ffmpeg","worker":"madam_ffmpeg_0.5.2","timeout":43200000,"maxJobsToActivate":5,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type madam_ffmpeg after timeout of PT5S
2020-11-24T16:42:17.375865936Z madam_zeebe.1.n2pe4vkulgz4@z68-gen3    | 2020-11-24 16:42:17.375 [Broker-0-StreamProcessor-1] [Broker-0-zb-actors-0] INFO  org.camunda.feel.FeelEngine - Engine created. [value-mapper: CompositeValueMapper(List(io.zeebe.el.impl.feel.MessagePackValueMapper@429efad7)), function-provider: io.zeebe.el.impl.feel.FeelFunctionProvider@637d86ae, clock: io.zeebe.el.impl.ZeebeFeelEngineClock@5e59cec8, configuration: Configuration(false)]

JOBS --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Datetime                      bpmnProcessId                           version   type                workflowInstanceKey elementId           intent              key                 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2020-11-24T17:42:19.061000    test_Ecx8UHCGDe02JPhmVrNpo3r9ex0c01KJ   2         madam_ffmpeg        2251799813685839    ffmpeg_create_video CREATED             2251799813685846    
2020-11-24T17:42:19.152000    test_Ecx8UHCGDe02JPhmVrNpo3r9ex0c01KJ   2         madam_ffmpeg        2251799813685839    ffmpeg_create_video ACTIVATED           2251799813685846
-----------------------------------------------------------------------------------------------------------------------------------
2020-11-24T16:42:19.145573362Z madam_zeebe.1.n2pe4vkulgz4@z68-gen3    | 2020-11-24 16:42:19.145 [GatewayLongPollingJobHandler] [Broker-0-zb-actors-1] DEBUG io.zeebe.gateway - Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"madam_ffmpeg","worker":"madam_ffmpeg_0.5.2","timeout":43200000,"maxJobsToActivate":5,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type madam_ffmpeg after timeout of PT5S
2020-11-24T16:42:24.375609692Z madam_zeebe.1.n2pe4vkulgz4@z68-gen3    | 2020-11-24 16:42:24.375 [GatewayLongPollingJobHandler] [Broker-0-zb-actors-0] DEBUG io.zeebe.gateway - Remove blocking request BrokerActivateJobsRequest{requestDto={"type":"madam_ffmpeg","worker":"madam_ffmpeg_0.5.2","timeout":43200000,"maxJobsToActivate":5,"jobKeys":[],"jobs":[],"variables":[],"truncated":false}} for job type madam_ffmpeg after timeout of PT5S

@jwulf, since making sure messages are delivered is the core business of a message broker, I think this problem must be taken care of.

But as it seems I am the only one affected right now, and the team must be very busy, I am happy to remain at your disposal for more advanced tests if you want.

It would be helpful to have far more debug logs from Zeebe, and if possible from the gRPC functions used.

1 Like

Since you are getting the same issue with zbctl, we can confidently say this is not a Python client-specific issue.

You can run Zeebe with the log level set to verbose for greater log detail.

This sounds like it is worth opening an issue in GitHub, so that the engineering team can look at it as a potential bug.

1 Like

madam_zeebe.1.un64ijjvqjcu@z68-gen3    | 2020-11-26 14:14:39,586 main WARN Error while converting string [verbose] to type [class org.apache.logging.log4j.Level]. Using default value [null]. java.lang.IllegalArgumentException: Unknown level constant [VERBOSE].

Log4j does not know the verbose log level…

It would be trace then, but I'm not sure whether you really want that :sweat_smile: Maybe debug is enough.

https://logging.apache.org/log4j/2.x/manual/customloglevels.html

Greets
Chris

I have used level ALL to get the maximum amount of information to include in the issue I will open on GitHub…

Hi @vtexier,

I have a problem that looks really similar to what you are experiencing:

- I use Zeebe on my development machine and didn't encounter any specific issue there.
- I'm starting to look at our deployment environment, where we use Kubernetes, and I am seeing something really similar to what you describe here: jobs that are activated on the Zeebe side but no message received on the client side. Looking at the forum, I thought it might be related to long polling, but after deactivating it I still encounter the issue. Your “slow storage” idea is interesting: in my Kubernetes deployment, PVCs (Persistent Volume Claims) are used for the data, and by default in my case that creates remote block storage, which must be slower than a local SSD/NVMe.

I'll have a look tomorrow at disabling this remote storage and using a local one instead, to see if it solves my issue.

2 Likes