Docker Desktop Kubernetes

Hi
I’m trying to get Zeebe to run on Docker Desktop for Windows Kubernetes and seem to be having trouble getting the Zeebe pods and ElasticSearch pods to start running successfully.

I have a yaml file to override some of the parameters as I was running into “Insufficient CPU” and “Insufficient Memory” errors, but they seem to be resolved now. Below are the commands I have run with the respective output.

The Linux VM used by Docker has been allocated 6 GB RAM and 4 cores.

I have also reduced the number of replicas to 1, just to keep things simple.

Any idea what the issue could be?

Thanks
Andy

Steps:
[Run]
helm install zeebe zeebe/zeebe-cluster -f local-override-values.yaml

[local-override-values.yaml]

resources:
  requests:
    cpu: "100m"
    memory: "512m"
  limits:
    cpu: "1000m"
    memory: "1024m"
pvcSize: "100m"
JavaOpts: |
  -XX:+UseParallelGC
  -XX:MinHeapFreeRatio=5
  -XX:MaxHeapFreeRatio=10
  -XX:MaxRAMPercentage=25.0
  -XX:GCTimeRatio=4
  -XX:AdaptiveSizePolicyWeight=90
  -XX:+PrintFlagsFinal
  -Xmx256m
  -Xms265m
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/usr/local/zeebe/data
  -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log
clusterSize: "1"
partitionCount: "1"
replicationFactor: "1"
cpuThreadCount: "1"
ioThreadCount: "1"
elasticsearch:
  imageTag: 6.8.3
  resources:
    requests:
      cpu: "100m"
      memory: "512m"
    limits:
      cpu: "1000m"
      memory: "1024m"
  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "standard"
    resources:
      requests:
        storage: "100m"
  replicas: "1"
  esJavaOpts: "-Xmx256m -Xms256m"
  clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"
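One thing worth double-checking in this values file: in Kubernetes resource quantities, a lowercase `m` suffix means "milli", so `memory: "512m"` requests roughly half a byte rather than 512 mebibytes (the `m` form is only meaningful for CPU). Assuming the intent was a 512 MiB request and a 1 GiB limit, a corrected block would look like:

```yaml
# Corrected memory units (assumed intent: 512 MiB request, 1 GiB limit).
resources:
  requests:
    cpu: "100m"      # "m" is fine here: 100 millicores = 0.1 CPU
    memory: "512Mi"  # "Mi" = mebibytes; "512m" would mean 0.512 bytes
  limits:
    cpu: "1000m"
    memory: "1024Mi"
```

The same suffix question applies to `pvcSize` and the ElasticSearch `storage` request below.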

[Run]
kubectl get pods

NAME                     READY   STATUS              RESTARTS   AGE
elasticsearch-master-0   0/1     Pending             0          23m
zeebe-zeebe-0            0/1     ContainerCreating   0          23m

[Run]
kubectl describe pod zeebe-zeebe-0

Name:           zeebe-zeebe-0
Namespace:      default
Priority:       0
Node:           docker-desktop/192.168.65.3
Start Time:     Fri, 24 Jan 2020 22:40:53 +0000
Labels:         app=zeebe-zeebe
                app.kubernetes.io/instance=zeebe
                app.kubernetes.io/name=zeebe-cluster
                controller-revision-hash=zeebe-zeebe-7f9f9b7fd8
                statefulset.kubernetes.io/pod-name=zeebe-zeebe-0
Annotations:
Status:         Pending
IP:
Controlled By:  StatefulSet/zeebe-zeebe
Containers:
  zeebe-cluster:
    Container ID:
    Image:          camunda/zeebe:0.22.1
    Image ID:
    Ports:          9600/TCP, 26500/TCP, 26501/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1024m
    Requests:
      cpu:      100m
      memory:   512m
    Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ZEEBE_LOG_LEVEL:           debug
      ZEEBE_PARTITIONS_COUNT:    1
      ZEEBE_CLUSTER_SIZE:        1
      ZEEBE_REPLICATION_FACTOR:  1
      JAVA_TOOL_OPTIONS:         -XX:+UseParallelGC
                                 -XX:MinHeapFreeRatio=5
                                 -XX:MaxHeapFreeRatio=10
                                 -XX:MaxRAMPercentage=25.0
                                 -XX:GCTimeRatio=4
                                 -XX:AdaptiveSizePolicyWeight=90
                                 -XX:+PrintFlagsFinal
                                 -Xmx256m
                                 -Xms265m
                                 -XX:+HeapDumpOnOutOfMemoryError
                                 -XX:HeapDumpPath=/usr/local/zeebe/data
                                 -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log
    Mounts:
      /usr/local/bin/startup.sh from config (rw,path="startup.sh")
      /usr/local/zeebe/conf/zeebe.cfg.toml from config (rw,path="zeebe.cfg.toml")
      /usr/local/zeebe/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8gwmn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-zeebe-zeebe-0
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zeebe-zeebe
    Optional:  false
  default-token-8gwmn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8gwmn
    Optional:    false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Warning  FailedScheduling        12m (x2 over 12m)      default-scheduler        pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled               12m                    default-scheduler        Successfully assigned default/zeebe-zeebe-0 to docker-desktop
  Warning  FailedCreatePodSandBox  12m (x13 over 12m)     kubelet, docker-desktop  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "zeebe-zeebe-0": Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"read init-p: connection reset by peer\"": unknown
  Normal   SandboxChanged          2m13s (x559 over 12m)  kubelet, docker-desktop  Pod sandbox changed, it will be killed and re-created.

[Run]
docker version

Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:22:37 2019
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:29:19 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
 Kubernetes:
  Version:          v1.15.5
  StackAPI:         v1beta2

This could be your problem. When I deploy on a cloud provider, I need to create the PV first.

@salaboy knows more about the K8s stuff.

@andyh hi there… I can see that you’re changing a lot of the provided parameters, and your CPU allowance for Zeebe may now be too low. You are currently setting 0.1 CPU for the Zeebe broker, which is very low: while the pod is starting, the scheduler will hand the CPU to any other process that needs it before giving it back to the broker. You are also pushing memory down quite low, and Zeebe will certainly need more than that to bootstrap.

From the Zeebe side, there shouldn’t be any restriction on the size of the cluster, but I am sure you will need to fine-tune ElasticSearch quite heavily to make it run on a single node (their chart targets production, where the lowest supported configuration involves 3 nodes, as far as I am aware).

When running Kubernetes in Docker for Mac you might want to look for specific values.yaml files in each project. I would appreciate it if you created an issue in http://github.com/zeebe-io/zeebe-cluster-helm requesting a specific values file for docker-for-mac, like the one provided for the ElasticSearch Helm chart: https://github.com/elastic/helm-charts/blob/master/elasticsearch/examples/docker-for-mac/values.yaml

Notice how they tune Elastic to support running all the pods on the same Kubernetes node, which by design you should never do for a production workload.
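For reference, the docker-for-mac example linked above amounts to an override along these lines (treat this as a sketch; the keys are from the Elastic Helm chart, but the numbers here are illustrative and vary between chart versions):

```yaml
# Illustrative single-node ElasticSearch override for a one-node cluster.
replicas: 1
minimumMasterNodes: 1
antiAffinity: "soft"            # allow ES pods to share a node
esJavaOpts: "-Xmx128m -Xms128m" # small heap for a local cluster
resources:
  requests:
    cpu: "100m"
    memory: "512M"
  limits:
    cpu: "1000m"
    memory: "512M"
volumeClaimTemplate:
  storageClassName: "hostpath"  # Docker Desktop's built-in provisioner
  resources:
    requests:
      storage: "100M"
```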

HTH


Hi @salaboy
The reason for changing the parameters (local-values) was that I was getting “Insufficient CPU” and “Insufficient Memory” warnings when deploying the original config. After a bit of digging around the ElasticSearch Helm repo, I discovered they had “values.yaml” files for local clusters like KIND and Minikube, so I decided to create a values file to override the Zeebe and ElasticSearch parameters. Once I had changed the values, the CPU and memory warnings disappeared.

Are you saying that the values I have provided are too low to run Zeebe and ElasticSearch?

I have since updated my local values file:

    resources:
      requests:
        cpu: "1000m"
        memory: "512m"
      limits:
        cpu: "2000m"
        memory: "1024m"
    pvcSize: "4096m"
    pvcStorageClassName: "zeebe-storage"
    JavaOpts: |
      -XX:+UseParallelGC
      -XX:MinHeapFreeRatio=5
      -XX:MaxHeapFreeRatio=10
      -XX:MaxRAMPercentage=25.0
      -XX:GCTimeRatio=4
      -XX:AdaptiveSizePolicyWeight=90
      -XX:+PrintFlagsFinal
      -Xmx256m
      -Xms265m
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:HeapDumpPath=/usr/local/zeebe/data
      -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log
    clusterSize: "1"
    partitionCount: "1"
    replicationFactor: "1"
    cpuThreadCount: "1"
    ioThreadCount: "1"
    elasticsearch:
      antiAffinity: "soft"
      imageTag: 6.8.3
      resources:
        requests:
          cpu: "1000m"
          memory: "512m"
        limits:
          cpu: "2000m"
          memory: "1024m"
      volumeClaimTemplate:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "es-storage"
        resources:
          requests:
            storage: "4096m"
      replicas: "1"
      esJavaOpts: "-Xmx256m -Xms256m"
      clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"

I have also provided a PV (and PVC) which points to a directory on the node (happy to share the files if required). That seems to have fixed the Persistent Volume error I was seeing, but I’m now seeing a different error:

[RUN]
kubectl describe pod zeebe-zeebe-0

[OUTPUT]

Name:           zeebe-zeebe-0
Namespace:      default
Priority:       0
Node:           docker-desktop/192.168.65.3
Start Time:     Tue, 28 Jan 2020 12:34:26 +0000
Labels:         app=zeebe-zeebe
                app.kubernetes.io/instance=zeebe
                app.kubernetes.io/name=zeebe-cluster
                controller-revision-hash=zeebe-zeebe-7b7fb49598
                statefulset.kubernetes.io/pod-name=zeebe-zeebe-0
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  StatefulSet/zeebe-zeebe
Containers:
  zeebe-cluster:
    Container ID:
    Image:          camunda/zeebe:0.22.1
    Image ID:
    Ports:          9600/TCP, 26500/TCP, 26501/TCP, 26502/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1024m
    Requests:
      cpu:      1
      memory:   512m
    Readiness:  http-get http://:9600/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ZEEBE_LOG_LEVEL:           debug
      ZEEBE_PARTITIONS_COUNT:    1
      ZEEBE_CLUSTER_SIZE:        1
      ZEEBE_REPLICATION_FACTOR:  1
      JAVA_TOOL_OPTIONS:         -XX:+UseParallelGC
                                 -XX:MinHeapFreeRatio=5
                                 -XX:MaxHeapFreeRatio=10
                                 -XX:MaxRAMPercentage=25.0
                                 -XX:GCTimeRatio=4
                                 -XX:AdaptiveSizePolicyWeight=90
                                 -XX:+PrintFlagsFinal
                                 -Xmx256m
                                 -Xms265m
                                 -XX:+HeapDumpOnOutOfMemoryError
                                 -XX:HeapDumpPath=/usr/local/zeebe/data
                                 -XX:ErrorFile=/usr/local/zeebe/data/zeebe_error%p.log

    Mounts:
      /usr/local/bin/startup.sh from config (rw,path="startup.sh")
      /usr/local/zeebe/conf/zeebe.cfg.toml from config (rw,path="zeebe.cfg.toml")
      /usr/local/zeebe/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8gwmn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-zeebe-zeebe-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zeebe-zeebe
    Optional:  false
  default-token-8gwmn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8gwmn
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               34s                 default-scheduler        Successfully assigned default/zeebe-zeebe-0 to docker-desktop
  Normal   SandboxChanged          21s (x12 over 32s)  kubelet, docker-desktop  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  20s (x13 over 33s)  kubelet, docker-desktop  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "zeebe-zeebe-0": Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:319: getting the final child's pid from pipe caused \"read init-p: connection reset by peer\"": unknown
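The hostPath PV mentioned above (pointing the `zeebe-storage` class at a directory on the node) might look something like this sketch; the name, path, and size are assumptions, not the actual file:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: zeebe-data-pv               # hypothetical name
spec:
  storageClassName: zeebe-storage   # must match pvcStorageClassName
  capacity:
    storage: 5Gi                    # illustrative size
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/zeebe-data           # directory on the docker-desktop node
```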

Also, I’m more than happy to create an issue requesting a specific values file for Docker-for-Mac, but I’m using Docker-for-Windows. That said, is there likely to be a difference?

Thanks for your help with this,
Andy


@andyh Thanks a lot for sharing that… and yes, you are doing the right thing: customising the ElasticSearch deployment the way they suggest for Docker for Mac. I expect Docker for Mac and Docker for Windows to have the same architecture (a single-node cluster), so the same file should work. If you are already running it successfully on Docker for Windows, sharing that file with your customisations would be highly appreciated… for that you will need an issue. If you can create one and share your values file, we can make sure that our docs point to those files for different installations, and other community members can benefit from your experience.

Cheers

Hi @salaboy
I have raised the following issue: https://github.com/zeebe-io/zeebe-cluster-helm/issues/22

Thanks
Andy


@andyh that is awesome, and yes… I’ve started that initiative; have a look here: https://github.com/zeebe-io/zeebe-helm-profiles … if you are OK with that, I will add your yaml with a brief description to that repository and then link it from the http://helm.zeebe.io website… is that OK with you?

Hi @salaboy
That’s fine with me, although as I said, I did have an issue with the ES pod not always starting. I think I have worked out what the problem was: it seems to be related to Linux user permissions, something I am not that familiar with. I’m not entirely sure the fix I have is the right thing to do, so hopefully someone else will see this and let us know whether it is correct. This is the fix:

The default ES podSecurityContext values are:

podSecurityContext:
    fsGroup: 1000
    runAsUser: 1000

and if I change them to:

podSecurityContext:
    fsGroup: 1000
    runAsUser: 0

then my ES pod starts running!

Here is my updated values file:

pvcSize: "4096m"
pvcStorageClassName: "zeebe-storage"
clusterSize: "1"
partitionCount: "1"
replicationFactor: "1"
cpuThreadCount: "1"
ioThreadCount: "1"
elasticsearch:
  imageTag: "7.5.2"
  antiAffinity: "soft"
  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "es-storage"
    resources:
      requests:
        storage: "1Gi"
  replicas: "1"
  podSecurityContext:
    fsGroup: 1000
    runAsUser: 0
  extraInitContainers: |
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: elasticsearch-master
          mountPath: /usr/share/elasticsearch/data
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  clusterHealthCheckParams: "wait_for_status=yellow&timeout=1s"

@andyh yeah… that makes a lot of sense… I will refactor that repository to have different options for different sub-charts. We definitely need pipelines to check that those values actually work and don’t break when we update the charts.