Issue Description: A Kubernetes pod deployment failed with a status of Pending
due to insufficient CPU resources available on the node. The deployment was attempted on a Minikube cluster, specifically for a ClickHouse instance managed by the Altinity Kubernetes Operator.
Symptoms:
- The pod chi-demo-01-demo-01-0-0-0 remained in the Pending state.
- The Kubernetes scheduler emitted warnings indicating “Insufficient cpu”.
- The node had limited CPU resources (only 2 CPU cores available).
Environment:
- Minikube single-node Kubernetes cluster
- ClickHouse Altinity Kubernetes Operator
- Ubuntu operating system on the host machine
Troubleshooting Steps:
Identify the Problematic Pod: List all resources in the problematic namespace (here, test):

ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ k get all -n test
NAME                            READY   STATUS    RESTARTS   AGE
pod/chi-demo-01-demo-01-0-0-0   0/1     Pending   0          55s

NAME                                       READY   AGE
statefulset.apps/chi-demo-01-demo-01-0-0   0/1     55s
This shows the ClickHouse instance pod chi-demo-01-demo-01-0-0-0 with a status of Pending.
Inspect the Pod Details: Detailed inspection of the pod:
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ k describe pod chi-demo-01-demo-01-0-0-0 -n test
Name:             chi-demo-01-demo-01-0-0-0
Namespace:        test
Priority:         0
Service Account:  default
Node:             <none>
Labels:           apps.kubernetes.io/pod-index=0
                  clickhouse.altinity.com/app=chop
                  clickhouse.altinity.com/chi=demo-01
                  clickhouse.altinity.com/cluster=demo-01
                  clickhouse.altinity.com/namespace=test
                  clickhouse.altinity.com/ready=no
                  clickhouse.altinity.com/replica=0
                  clickhouse.altinity.com/shard=0
                  controller-revision-hash=chi-demo-01-demo-01-0-0-67b94b6546
                  statefulset.kubernetes.io/pod-name=chi-demo-01-demo-01-0-0-0
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/chi-demo-01-demo-01-0-0
Containers:
  clickhouse:
    Image:       altinity/clickhouse-server:21.8.10.1.altinitystable
    Ports:       9000/TCP, 8123/TCP, 9009/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Limits:
      cpu:     700m
      memory:  712Mi
    Requests:
      cpu:        500m
      memory:     512Mi
    Liveness:     http-get http://:http/ping delay=60s timeout=1s period=3s #success=1 #failure=10
    Readiness:    http-get http://:http/ping delay=10s timeout=1s period=3s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/clickhouse-server/conf.d/ from chi-demo-01-deploy-confd-demo-01-0-0 (rw)
      /etc/clickhouse-server/config.d/ from chi-demo-01-common-configd (rw)
      /etc/clickhouse-server/users.d/ from chi-demo-01-common-usersd (rw)
      /var/lib/clickhouse from storage-vc-template (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dvnvn (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  storage-vc-template:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storage-vc-template-chi-demo-01-demo-01-0-0-0
    ReadOnly:   false
  chi-demo-01-common-configd:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chi-demo-01-common-configd
    Optional:  false
  chi-demo-01-common-usersd:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chi-demo-01-common-usersd
    Optional:  false
  chi-demo-01-deploy-confd-demo-01-0-0:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chi-demo-01-deploy-confd-demo-01-0-0
    Optional:  false
  kube-api-access-dvnvn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  62s (x2 over 63s)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$
This confirmed the pod was pending due to insufficient CPU resources.
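When the full describe output is too long to scan, the scheduler's reason can also be pulled directly from events. A minimal sketch (the namespace test matches this setup; adjust for yours):

```shell
# List only FailedScheduling events in the affected namespace,
# sorted so the most recent message appears last.
kubectl get events -n test \
  --field-selector reason=FailedScheduling \
  --sort-by=.lastTimestamp
```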
Check Node Resources: Examine the available resources on the node:
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ minikube ssh
docker@minikube:~$ lscpu | grep '^CPU(s):'
CPU(s):             2
docker@minikube:~$
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ kubectl describe nodes
Name:               minikube
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=minikube
                    kubernetes.io/os=linux
                    minikube.k8s.io/commit=8220a6eb95f0a4d75f7f2d7b14cef975f050512d
                    minikube.k8s.io/name=minikube
                    minikube.k8s.io/primary=true
                    minikube.k8s.io/updated_at=2024_03_13T04_56_43_0700
                    minikube.k8s.io/version=v1.32.0
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.49.2/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.244.120.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 13 Mar 2024 04:56:37 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  minikube
  AcquireTime:     <unset>
  RenewTime:       Fri, 15 Mar 2024 05:39:04 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Wed, 13 Mar 2024 04:57:19 +0000   Wed, 13 Mar 2024 04:57:19 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Fri, 15 Mar 2024 05:37:18 +0000   Wed, 13 Mar 2024 04:56:33 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 15 Mar 2024 05:37:18 +0000   Wed, 13 Mar 2024 04:56:33 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 15 Mar 2024 05:37:18 +0000   Wed, 13 Mar 2024 04:56:33 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 15 Mar 2024 05:37:18 +0000   Wed, 13 Mar 2024 04:56:41 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.49.2
  Hostname:    minikube
Capacity:
  cpu:                2
  ephemeral-storage:  304681132Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16110852Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  304681132Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16110852Ki
  pods:               110
System Info:
  Machine ID:                 26f167db931a415897be391a3fe91314
  System UUID:                97f143b5-e088-47e7-8783-1b9926b26d70
  Boot ID:                    4e803b4a-98ad-46ed-ac50-7dfcf8c4b58b
  Kernel Version:             6.5.0-1014-aws
  OS Image:                   Ubuntu 22.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://24.0.7
  Kubelet Version:            v1.28.3
  Kube-Proxy Version:         v1.28.3
PodCIDR:                      10.244.0.0/24
PodCIDRs:                     10.244.0.0/24
Non-terminated Pods:          (12 in total)
  Namespace          Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------          ----                                      ------------  ----------  ---------------  -------------  ---
  chista-zookeeper   zookeeper-0                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         47h
  kube-system        calico-kube-controllers-558d465845-nl69h  0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        calico-node-87x6w                         250m (12%)    0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        clickhouse-operator-747546f854-brj2d      0 (0%)        0 (0%)      0 (0%)           0 (0%)         26h
  kube-system        coredns-5dd5756b68-7h6ns                  100m (5%)     0 (0%)      70Mi (0%)        170Mi (1%)     2d
  kube-system        etcd-minikube                             100m (5%)     0 (0%)      100Mi (0%)       0 (0%)         2d
  kube-system        kube-apiserver-minikube                   250m (12%)    0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        kube-controller-manager-minikube          200m (10%)    0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        kube-proxy-gql2q                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        kube-scheduler-minikube                   100m (5%)     0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        storage-provisioner                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d
  zoo1ns             zookeeper-0                               1 (50%)       2 (100%)    512M (3%)        4Gi (26%)      23h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                2 (100%)       2 (100%)
  memory             674080Ki (4%)  4266Mi (27%)
  ephemeral-storage  0 (0%)         0 (0%)
  hugepages-1Gi      0 (0%)         0 (0%)
  hugepages-2Mi      0 (0%)         0 (0%)
Events:              <none>
The node had only 2 allocatable CPU cores, and the CPU requests of the pods already running totaled 2 cores (100% of allocatable), so the ClickHouse pod's 500m request could not be satisfied.
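To see at a glance which pods hold CPU reservations, per-container requests can be listed across all namespaces. A sketch using kubectl's custom-columns output (the column names are arbitrary):

```shell
# Show the CPU request of every container, grouped by pod.
# Pods with no explicit request print <none> and reserve nothing.
kubectl get pods --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu'
```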
Identify Non-Critical Workloads: Investigate other workloads on the node:
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ k get all -n zoo1ns
NAME              READY   STATUS    RESTARTS   AGE
pod/zookeeper-0   1/1     Running   0          23h

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/zookeeper    ClusterIP   10.108.123.163   <none>        2181/TCP,7000/TCP   23h
service/zookeepers   ClusterIP   None             <none>        2888/TCP,3888/TCP   23h

NAME                         READY   AGE
statefulset.apps/zookeeper   1/1     23h
A ZooKeeper instance, not used by any current service, was identified as a potentially reclaimable resource.
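Before scaling anything down, it is worth gathering supporting evidence that nothing depends on the service. A hedged check (endpoints show only the backing pods, not the consumers, so also review application configs before treating a service as unused):

```shell
# See which pods back the ZooKeeper services and how they are selected.
kubectl get endpoints -n zoo1ns
kubectl describe service zookeeper -n zoo1ns
```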
Scale Down Non-Essential Services: Temporarily scale down the unused ZooKeeper StatefulSet to free up resources:
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ kubectl scale statefulset zookeeper --replicas=0 -n zoo1ns
statefulset.apps/zookeeper scaled
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ k get all -n zoo1ns
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/zookeeper    ClusterIP   10.108.123.163   <none>        2181/TCP,7000/TCP   23h
service/zookeepers   ClusterIP   None             <none>        2888/TCP,3888/TCP   23h

NAME                         READY   AGE
statefulset.apps/zookeeper   0/0     23h
This action reduced the CPU and memory resources reserved on the node.
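Because the StatefulSet object itself is left in place, the instance can be restored later by scaling back up (assuming one replica was the original count, as shown above):

```shell
# Bring the ZooKeeper instance back once capacity allows.
kubectl scale statefulset zookeeper --replicas=1 -n zoo1ns
# Wait until the pod reports Ready before pointing anything at it.
kubectl rollout status statefulset/zookeeper -n zoo1ns
```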
Verify Resource Availability and Pod Status: After scaling down the ZooKeeper instance, check the resource allocation and pod status again:
Node status:
Non-terminated Pods:
  Namespace          Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------          ----                                      ------------  ----------  ---------------  -------------  ---
  chista-zookeeper   zookeeper-0                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         47h
  kube-system        calico-kube-controllers-558d465845-nl69h  0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        calico-node-87x6w                         250m (12%)    0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        clickhouse-operator-747546f854-brj2d      0 (0%)        0 (0%)      0 (0%)           0 (0%)         26h
  kube-system        coredns-5dd5756b68-7h6ns                  100m (5%)     0 (0%)      70Mi (0%)        170Mi (1%)     2d
  kube-system        etcd-minikube                             100m (5%)     0 (0%)      100Mi (0%)       0 (0%)         2d
  kube-system        kube-apiserver-minikube                   250m (12%)    0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        kube-controller-manager-minikube          200m (10%)    0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        kube-proxy-gql2q                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        kube-scheduler-minikube                   100m (5%)     0 (0%)      0 (0%)           0 (0%)         2d
  kube-system        storage-provisioner                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d
  test               chi-demo-01-demo-01-0-0-0                 500m (25%)    700m (35%)  512Mi (3%)       712Mi (4%)     7m50s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1500m (75%)  700m (35%)
  memory             682Mi (4%)   882Mi (5%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:              <none>

ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ k get all -n test
NAME                            READY   STATUS    RESTARTS   AGE
pod/chi-demo-01-demo-01-0-0-0   1/1     Running   0          9m28s

NAME                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
service/chi-demo-01-demo-01-0-0   ClusterIP      None            <none>        9000/TCP,8123/TCP,9009/TCP      4m22s
service/clickhouse-demo-01        LoadBalancer   10.96.136.220   <pending>     8123:31020/TCP,9000:31155/TCP   111s

NAME                                       READY   AGE
statefulset.apps/chi-demo-01-demo-01-0-0   1/1     9m28s

ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ k describe pod chi-demo-01-demo-01-0-0-0 -n test
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  3m42s (x3 over 9m8s)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
  Normal   Scheduled         104s                  default-scheduler  Successfully assigned test/chi-demo-01-demo-01-0-0-0 to minikube
  Normal   Pulled            103s                  kubelet            Container image "altinity/clickhouse-server:21.8.10.1.altinitystable" already present on machine
  Normal   Created           103s                  kubelet            Created container clickhouse
  Normal   Started           103s                  kubelet            Started container clickhouse
The ClickHouse instance pod chi-demo-01-demo-01-0-0-0 transitioned to the Running state, indicating successful scheduling and deployment.
Enable Pod-Level Resource Monitoring (Optional): To avoid similar issues in the future, enable metrics collection for better resource monitoring:
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ kubectl top pods
error: Metrics API not available
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ minikube addons list
|-----------------------------|----------|--------------|--------------------------------|
|         ADDON NAME          | PROFILE  |    STATUS    |           MAINTAINER           |
|-----------------------------|----------|--------------|--------------------------------|
| ambassador                  | minikube | disabled     | 3rd party (Ambassador)         |
| auto-pause                  | minikube | disabled     | minikube                       |
| cloud-spanner               | minikube | disabled     | Google                         |
| csi-hostpath-driver         | minikube | disabled     | Kubernetes                     |
| dashboard                   | minikube | disabled     | Kubernetes                     |
| default-storageclass        | minikube | enabled ✅   | Kubernetes                     |
| efk                         | minikube | disabled     | 3rd party (Elastic)            |
| freshpod                    | minikube | disabled     | Google                         |
| gcp-auth                    | minikube | disabled     | Google                         |
| gvisor                      | minikube | disabled     | minikube                       |
| headlamp                    | minikube | disabled     | 3rd party (kinvolk.io)         |
| helm-tiller                 | minikube | disabled     | 3rd party (Helm)               |
| inaccel                     | minikube | disabled     | 3rd party (InAccel             |
|                             |          |              | [info@inaccel.com])            |
| ingress                     | minikube | disabled     | Kubernetes                     |
| ingress-dns                 | minikube | disabled     | minikube                       |
| inspektor-gadget            | minikube | disabled     | 3rd party                      |
|                             |          |              | (inspektor-gadget.io)          |
| istio                       | minikube | disabled     | 3rd party (Istio)              |
| istio-provisioner           | minikube | disabled     | 3rd party (Istio)              |
| kong                        | minikube | disabled     | 3rd party (Kong HQ)            |
| kubeflow                    | minikube | disabled     | 3rd party                      |
| kubevirt                    | minikube | disabled     | 3rd party (KubeVirt)           |
| logviewer                   | minikube | disabled     | 3rd party (unknown)            |
| metallb                     | minikube | disabled     | 3rd party (MetalLB)            |
| metrics-server              | minikube | disabled     | Kubernetes                     |
| nvidia-device-plugin        | minikube | disabled     | 3rd party (NVIDIA)             |
| nvidia-driver-installer     | minikube | disabled     | 3rd party (Nvidia)             |
| nvidia-gpu-device-plugin    | minikube | disabled     | 3rd party (Nvidia)             |
| olm                         | minikube | disabled     | 3rd party (Operator Framework) |
| pod-security-policy         | minikube | disabled     | 3rd party (unknown)            |
| portainer                   | minikube | disabled     | 3rd party (Portainer.io)       |
| registry                    | minikube | disabled     | minikube                       |
| registry-aliases            | minikube | disabled     | 3rd party (unknown)            |
| registry-creds              | minikube | disabled     | 3rd party (UPMC Enterprises)   |
| storage-provisioner         | minikube | enabled ✅   | minikube                       |
| storage-provisioner-gluster | minikube | disabled     | 3rd party (Gluster)            |
| storage-provisioner-rancher | minikube | disabled     | 3rd party (Rancher)            |
| volumesnapshots             | minikube | disabled     | Kubernetes                     |
|-----------------------------|----------|--------------|--------------------------------|
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ minikube addons enable metrics-server
💡  metrics-server is an addon maintained by Kubernetes. For any concerns contact minikube on GitHub.
You can view the list of minikube maintainers at: https://github.com/kubernetes/minikube/blob/master/OWNERS
    ▪ Using image registry.k8s.io/metrics-server/metrics-server:v0.6.4
🌟  The 'metrics-server' addon is enabled
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ minikube addons list
|-----------------------------|----------|--------------|--------------------------------|
|         ADDON NAME          | PROFILE  |    STATUS    |           MAINTAINER           |
|-----------------------------|----------|--------------|--------------------------------|
| ambassador                  | minikube | disabled     | 3rd party (Ambassador)         |
| auto-pause                  | minikube | disabled     | minikube                       |
| cloud-spanner               | minikube | disabled     | Google                         |
| csi-hostpath-driver         | minikube | disabled     | Kubernetes                     |
| dashboard                   | minikube | disabled     | Kubernetes                     |
| default-storageclass        | minikube | enabled ✅   | Kubernetes                     |
| efk                         | minikube | disabled     | 3rd party (Elastic)            |
| metrics-server              | minikube | enabled ✅   | Kubernetes                     |
| storage-provisioner         | minikube | enabled ✅   | minikube                       |
| storage-provisioner-gluster | minikube | disabled     | 3rd party (Gluster)            |
| storage-provisioner-rancher | minikube | disabled     | 3rd party (Rancher)            |
| volumesnapshots             | minikube | disabled     | Kubernetes                     |
|-----------------------------|----------|--------------|--------------------------------|
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ kubectl top pods -n test
NAME                        CPU(cores)   MEMORY(bytes)
chi-demo-01-demo-01-0-0-0   15m          81Mi
ubuntu@inception:~/minikube-clickhouse-altinity-k8-operator$ kubectl top nodes
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
minikube   327m         16%    2180Mi          13%
This allows for real-time monitoring of pod resource usage.
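With metrics-server enabled, usage can also be polled continuously rather than sampled by hand. A simple sketch using watch (the 30-second interval is an arbitrary choice):

```shell
# Refresh node and pod usage every 30 seconds.
watch -n 30 'kubectl top nodes; echo; kubectl top pods --all-namespaces'
```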
Resolution: The issue was resolved by identifying and scaling down a non-critical ZooKeeper service, thereby freeing up sufficient CPU resources on the node for the ClickHouse pod. The deployment then proceeded successfully.
Preventive Measures:
- Regularly monitor resource usage and allocations on nodes.
- Prioritize critical services and scale down non-essential workloads if necessary.
- Consider adding more nodes or resources to the cluster if frequent resource shortages are encountered.
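On Minikube specifically, the cleanest long-term fix is to give the node more CPUs at creation time. A sketch (note this recreates the cluster, so existing state is lost; the memory value is an illustrative choice):

```shell
# Recreate the Minikube node with 4 CPUs instead of the default 2.
minikube delete
minikube start --cpus=4 --memory=8g
```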
Note: The steps and commands may vary based on the specific Kubernetes setup and the nature of deployed services. Always ensure critical data and services are not impacted when modifying or scaling down deployments.