Pod stuck on ContainerCreating

Hi,

I’m having an issue with the gateway pod stuck pending in an autoscaling cluster (14 vCPUs, 28.00 GB persistent storage); it shows a warning about “Does not have minimum availability”:

PS> kubectl get pods
NAME                                                           READY   STATUS              RESTARTS   AGE
elasticsearch-master-0                                         1/1     Running             0          6h51m
elasticsearch-master-1                                         1/1     Running             0          6h51m
elasticsearch-master-2                                         1/1     Running             0          6h51m
zeebe-cluster-nginx-ingress-controller-746cd8b794-pcb89        1/1     Running             0          6h51m
zeebe-cluster-nginx-ingress-default-backend-7c987dd75d-2fp7n   1/1     Running             0          6h51m
zeebe-cluster-operate-6cbff9f89d-b6n52                         1/1     Running             13         6h51m
zeebe-cluster-zeebe-0                                          1/1     Running             0          6h51m
zeebe-cluster-zeebe-1                                          1/1     Running             0          6h51m
zeebe-cluster-zeebe-2                                          1/1     Running             0          6h51m
zeebe-cluster-zeebe-gateway-d59f5845-46429                     0/1     ContainerCreating   0          6h51m

PS> kubectl describe pod zeebe-cluster-zeebe-gateway-d59f5845-46429
Name:           zeebe-cluster-zeebe-gateway-d59f5845-46429
Namespace:      default
Priority:       0
Node:           gke-cluster-1-default-pool-d2ecf2b1-ggnh/XX.X.X.XX
Start Time:     Wed, 29 Jul 2020 15:28:29 +0200
Labels:         app.kubernetes.io/component=gateway
                app.kubernetes.io/instance=zeebe-cluster
                app.kubernetes.io/managed-by=Helm
                app.kubernetes.io/name=zeebe
                app.kubernetes.io/version=0.23.4
                helm.sh/chart=zeebe-0.0.127
                pod-template-hash=d59f5845
Annotations:    kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container zeebe
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/zeebe-cluster-zeebe-gateway-d59f5845
Containers:
  zeebe:
    Container ID:
    Image:          camunda/zeebe:0.23.4
    Image ID:
    Ports:          9600/TCP, 26500/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      100m
    Readiness:  tcp-socket :gateway delay=20s timeout=1s period=5s #success=1 #failure=3
    Environment:
      ZEEBE_STANDALONE_GATEWAY:            true
      ZEEBE_GATEWAY_CLUSTER_CLUSTERNAME:   zeebe-cluster-zeebe
      ZEEBE_GATEWAY_CLUSTER_MEMBERID:      zeebe-cluster-zeebe-gateway-d59f5845-46429 (v1:metadata.name)
      ZEEBE_LOG_LEVEL:                     info
      JAVA_TOOL_OPTIONS:                   -XX:MaxRAMPercentage=25.0 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/zeebe/data -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log -XX:+ExitOnOutOfMemoryError
      ZEEBE_GATEWAY_CLUSTER_CONTACTPOINT:  zeebe-cluster-zeebe:26502
      ZEEBE_GATEWAY_NETWORK_HOST:          0.0.0.0
      ZEEBE_GATEWAY_NETWORK_PORT:          26500
      ZEEBE_GATEWAY_CLUSTER_HOST:           (v1:status.podIP)
      ZEEBE_GATEWAY_CLUSTER_PORT:          26502
      ZEEBE_GATEWAY_MONITORING_HOST:       0.0.0.0
      ZEEBE_GATEWAY_MONITORING_PORT:       9600
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tjhxk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zeebe-cluster-zeebe
    Optional:  false
  default-token-tjhxk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tjhxk
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                     From                                               Message
  ----     ------       ----                    ----                                               -------
  Warning  FailedMount  7m4s (x179 over 6h50m)  kubelet, gke-cluster-1-default-pool-d2ecf2b1-ggnh  Unable to mount volumes for pod "zeebe-cluster-zeebe-gateway-d59f5845-46429_default(a076a9c3-080d-47ec-92b5-fb746eab8968)": timeout expired waiting for volumes to attach or mount for pod "default"/"zeebe-cluster-zeebe-gateway-d59f5845-46429". list of unmounted volumes=[config]. list of unattached volumes=[config default-token-tjhxk]
  Warning  FailedMount  69s (x210 over 6h52m)   kubelet, gke-cluster-1-default-pool-d2ecf2b1-ggnh  MountVolume.SetUp failed for volume "config" : configmap "zeebe-cluster-zeebe" not found

What might be the problem here?
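
If it helps, the ConfigMap named in the last event can also be checked for directly (name and namespace taken from the describe output above):

PS> kubectl get configmap zeebe-cluster-zeebe --namespace default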

Hey @jonas

what does your configuration look like? I assume you’re using the Helm charts? Which version? Please post your values file.

Greets
Chris

Hej @Zelldon,

After creating the GKE cluster I followed the instructions and ran this command:

helm install zeebe-cluster zeebe/zeebe-full

I’m not sure where to find what you’re asking for. The pod details in the cluster show:

helm.sh/chart: zeebe-0.0.127

Hey @jonas

thanks for reporting that. With the given information I was able to reproduce it and have opened an issue in the full chart: https://github.com/zeebe-io/zeebe-full-helm/issues/85

Normally you specify a values file to configure the chart being deployed. That looks like this:

helm install zeebe-cluster zeebe/zeebe-cluster -f zeebe-values.yaml

Here is an example values file which the Zeebe team uses for benchmarks: https://github.com/zeebe-io/zeebe/blob/develop/benchmarks/setup/default/zeebe-values.yaml
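
For illustration, a minimal values file could look something like the sketch below; the keys here are only an example, so check helm show values zeebe/zeebe-cluster for what your chart version actually supports:

# zeebe-values.yaml (illustrative only; verify the keys against your chart version)
clusterSize: 3
partitionCount: 3
replicationFactor: 3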

As a workaround you can use the zeebe/zeebe-cluster helm chart. It seems to work without a values file.
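
If the repo is not added yet, the workaround would look roughly like this (the repo URL is the one the Zeebe docs point to; adjust if yours differs):

helm repo add zeebe https://helm.zeebe.io
helm repo update
helm install zeebe-cluster zeebe/zeebe-cluster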

Greets
Chris

@Zelldon @Jonas I’ve added the details in the issue reported by @Zelldon, but I couldn’t reproduce it… I’ve deployed the charts in GKE as well… and I don’t see anything failing…
The error might be related to resources… but @Zelldon, the pod describe shows an issue with a ConfigMap… so if you have changed any parameters or done anything besides installing the chart, please report it in the issue. Also remember that you can run helm list to see exactly which version of the chart you are running… In the issue I am running version 0.0.82 (of the full chart) and it is working correctly.
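
For reference, helm list shows the installed charts and their versions, and helm get values <RELEASE NAME> prints any values you overrode at install time (the placeholder release name is just illustrative):

helm list
helm get values <RELEASE NAME>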

Hi @Zelldon, @salaboy.

Seems like I might have made a mistake. My GKE cluster was named ‘cluster-1’ and the installation seems to work after giving that name as the ‘release name’ in the command:

helm install <RELEASE NAME> zeebe/zeebe-full -f zeebe-values.yaml

Could that be it? Do they have to be the same?
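
(For what it’s worth, the default fullname helper that helm create scaffolds only appends the chart name when the release name does not already contain it, so a release name that already contains “zeebe” can end up producing different resource names than one that doesn’t. I don’t know whether this chart uses the same template; the snippet below is a simplified version of the generic scaffold, and the helper name is just a placeholder.)

{{- define "zeebe.fullname" -}}
{{- if contains .Chart.Name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}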

> .\zbctl.exe --insecure status
Cluster size: 3
Partitions count: 3
Replication factor: 3
Gateway version: 0.23.4
Brokers:
  Broker 0 - cluster-1-zeebe-0.cluster-1-zeebe.default.svc.cluster.local:26501
    Version: 0.23.4
    Partition 1 : Leader
    Partition 2 : Leader
    Partition 3 : Follower
  Broker 1 - cluster-1-zeebe-1.cluster-1-zeebe.default.svc.cluster.local:26501
    Version: 0.23.4
    Partition 1 : Follower
    Partition 2 : Follower
    Partition 3 : Follower
  Broker 2 - cluster-1-zeebe-2.cluster-1-zeebe.default.svc.cluster.local:26501
    Version: 0.23.4
    Partition 1 : Follower
    Partition 2 : Follower
    Partition 3 : Leader

> helm list
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS      CHART               APP VERSION
cluster-1       default         1               2020-08-03 17:12:02.5920467 +0200 CEST  deployed    zeebe-full-0.0.85   1.0

Awesome!