karpenter-provider-aws: How to debug empty nodes that don't get terminated
Version
Karpenter: 0.16.1
EKS: 1.23.9
Expected Behavior
If only daemonset pods are running on a node that was provisioned by Karpenter, the node should be terminated after ttlSecondsAfterEmpty has elapsed.
Actual Behavior
We regularly have zombie nodes running that Karpenter does not seem to terminate.
I am looking for advice on how to debug this:
- Is there a way to show the current value of the relevant ttlSecondsAfterEmpty counter, or whether its conditions are fulfilled? The logs do not show any relevant info in kubectl -n cicd-infra-karpenter logs -l app.kubernetes.io/instance=karpenter --all-containers | grep -i ttl
- Any other debug logs I am missing?
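For reference, this is roughly how I have been inspecting the provisioner and node state so far; the karpenter.sh/emptiness-timestamp annotation and karpenter.sh/initialized label come up again in the comments below, so treat these commands as a sketch rather than an authoritative procedure:

# What the provisioner has configured as its empty-node TTL:
kubectl get provisioner cicd-prod-karpenter-provisioner-linux-x86-gpu-medium \
  -o jsonpath='{.spec.ttlSecondsAfterEmpty}'

# Whether Karpenter has started the emptiness clock on the node
# (no output means the annotation was never set):
kubectl get node ip-10-3-11-122.us-west-2.compute.internal \
  -o jsonpath='{.metadata.annotations.karpenter\.sh/emptiness-timestamp}'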
Steps to Reproduce the Problem
- Have a node that won’t shut down
- Analyze it
Resource Specs and Logs
Daemonsets:
(Not sure why, but could it be that a daemonset with a nodeSelector targeting a Karpenter class affects the termination behavior? See cicd-infra-dcgm-exporter. A way to check which pods remain on the node is sketched after the listing below.)
fberchtold@W10-RIG:~/luminar/gitops-cicd$ kubectl get daemonsets.apps -A
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
cicd-infra-apm apm-k8s-infra-otel-agent 6 6 6 6 6 <none> 20d
cicd-infra-aws-efs-csi-driver efs-csi-node 10 10 10 10 10 beta.kubernetes.io/os=linux 36d
cicd-infra-dcgm-exporter dcgm-exporter 4 4 4 4 4 class=cicd-prod-karpenter-provisioner-linux-x86-gpu-medium 11h
cicd-infra-loki loki-promtail 6 6 6 6 6 <none> 7d9h
cicd-infra-monitoring monitoring-prometheus-node-exporter 10 10 10 10 10 <none> 36d
cicd-infra-smarter-device-manager cicd-infra-smarter-device-manager 6 6 6 6 6 <none> 5d8h
kube-system aws-node 10 10 10 10 10 <none> 22d
kube-system ebs-csi-node 10 10 10 10 10 kubernetes.io/os=linux 42d
kube-system ebs-csi-node-windows 0 0 0 0 0 kubernetes.io/os=windows 42d
kube-system kube-proxy 10 10 10 10 10 <none> 45d
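As a sanity check that the node really is “empty” in Karpenter’s sense, here is a rough way to confirm that only DaemonSet-owned pods remain on it (node name taken from the describe output below); treat it as a sketch:

kubectl get pods --all-namespaces \
  --field-selector spec.nodeName=ip-10-3-11-122.us-west-2.compute.internal \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'
# If the node is truly empty, every pod listed should show OWNER as DaemonSet.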
The node in question:
fberchtold@W10-RIG:~/luminar/gitops-cicd$ kubectl describe node ip-10-3-11-122.us-west-2.compute.internal
Name: ip-10-3-11-122.us-west-2.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=g4dn.4xlarge
beta.kubernetes.io/os=linux
class=cicd-prod-karpenter-provisioner-linux-x86-gpu-medium
failure-domain.beta.kubernetes.io/region=us-west-2
failure-domain.beta.kubernetes.io/zone=us-west-2b
k8s.io/cloud-provider-aws=ae50b0c1761b634585af5353701af259
karpenter.k8s.aws/instance-category=g
karpenter.k8s.aws/instance-cpu=16
karpenter.k8s.aws/instance-family=g4dn
karpenter.k8s.aws/instance-generation=4
karpenter.k8s.aws/instance-gpu-count=1
karpenter.k8s.aws/instance-gpu-manufacturer=nvidia
karpenter.k8s.aws/instance-gpu-memory=16384
karpenter.k8s.aws/instance-gpu-name=t4
karpenter.k8s.aws/instance-hypervisor=nitro
karpenter.k8s.aws/instance-local-nvme=225
karpenter.k8s.aws/instance-memory=65536
karpenter.k8s.aws/instance-pods=29
karpenter.k8s.aws/instance-size=4xlarge
karpenter.sh/capacity-type=on-demand
karpenter.sh/provisioner-name=cicd-prod-karpenter-provisioner-linux-x86-gpu-medium
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-3-11-122.us-west-2.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=g4dn.4xlarge
topology.ebs.csi.aws.com/zone=us-west-2b
topology.kubernetes.io/region=us-west-2
topology.kubernetes.io/zone=us-west-2b
Annotations: alpha.kubernetes.io/provided-node-ip: 10.3.11.122
csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-01e8f3a596e08564a","efs.csi.aws.com":"i-01e8f3a596e08564a"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 26 Oct 2022 11:03:11 -0700
Taints: environment=cicd-prod:NoSchedule
type=linux-x86-gpu-medium:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: ip-10-3-11-122.us-west-2.compute.internal
AcquireTime: <unset>
RenewTime: Wed, 26 Oct 2022 17:48:04 -0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
Ready True Wed, 26 Oct 2022 17:43:36 -0700 Wed, 26 Oct 2022 11:05:06 -0700 KubeletReady kubelet is posting ready status
MemoryPressure False Wed, 26 Oct 2022 17:43:36 -0700 Wed, 26 Oct 2022 11:04:36 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 26 Oct 2022 17:43:36 -0700 Wed, 26 Oct 2022 16:11:47 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 26 Oct 2022 17:43:36 -0700 Wed, 26 Oct 2022 11:04:36 -0700 KubeletHasSufficientPID kubelet has sufficient PID available
Addresses:
InternalIP: 10.3.11.122
Hostname: ip-10-3-11-122.us-west-2.compute.internal
InternalDNS: ip-10-3-11-122.us-west-2.compute.internal
Capacity:
attachable-volumes-aws-ebs: 39
cpu: 16
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 65043764Ki
pods: 29
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 15890m
ephemeral-storage: 18242267924
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 64353588Ki
pods: 29
System Info:
Machine ID: ec2291b1f1a2c11a2328d9b1a8911e6f
System UUID: ec2291b1-f1a2-c11a-2328-d9b1a8911e6f
Boot ID: b9c9f69f-f135-480e-afbe-69b6c8aaa45a
Kernel Version: 5.4.209-116.367.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.6
Kubelet Version: v1.23.9-eks-ba74326
Kube-Proxy Version: v1.23.9-eks-ba74326
ProviderID: aws:///us-west-2b/i-01e8f3a596e08564a
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
cicd-infra-aws-efs-csi-driver efs-csi-node-h9wln 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h45m
cicd-infra-dcgm-exporter dcgm-exporter-vn8k6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h45m
cicd-infra-monitoring monitoring-prometheus-node-exporter-s7lc6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h45m
kube-system aws-node-hwdms 25m (0%) 0 (0%) 0 (0%) 0 (0%) 6h45m
kube-system ebs-csi-node-bgb95 30m (0%) 300m (1%) 120Mi (0%) 768Mi (1%) 6h45m
kube-system kube-proxy-5sldj 100m (0%) 0 (0%) 0 (0%) 0 (0%) 6h45m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 155m (0%) 300m (1%)
memory 120Mi (0%) 768Mi (1%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasDiskPressure 102m (x32 over 6h42m) kubelet Node ip-10-3-11-122.us-west-2.compute.internal status is now: NodeHasDiskPressure
Warning EvictionThresholdMet 101m (x60 over 6h42m) kubelet Attempting to reclaim ephemeral-storage
Normal NodeHasNoDiskPressure 96m (x35 over 6h43m) kubelet Node ip-10-3-11-122.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave “+1” or “me too” comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 38 (14 by maintainers)
Confirmed, removing AWS_ENABLE_POD_ENI solves the issue for us; the node got evicted after 30s of idle time 👍
See the other issue, but you either don't enable AWS_ENABLE_POD_ENI, or every provisioner needs vpc.amazonaws.com/has-trunk-attached: "false"
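For anyone else hitting this, a hypothetical sketch of that second workaround, assuming the label is applied via the Provisioner’s spec.labels (please verify the exact field and semantics for your Karpenter version):

# Hypothetical: add the label to an existing provisioner so it is applied to
# the nodes it launches; adjust the provisioner name to your environment.
kubectl patch provisioner cicd-prod-karpenter-provisioner-linux-x86-gpu-medium \
  --type merge \
  -p '{"spec":{"labels":{"vpc.amazonaws.com/has-trunk-attached":"false"}}}'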
Hmm, you say this is happening on v0.16.1, or is this on latest (v0.18.1)? We should not be considering emptiness until after the node is initialized (see here). If you are able to repro this consistently, it’s probably better to track this in a separate issue.
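(For what it’s worth, a quick way to check that on the node from this issue:)

# Empty output here would mean the node was never marked as initialized:
kubectl get node ip-10-3-11-122.us-west-2.compute.internal \
  -o jsonpath='{.metadata.labels.karpenter\.sh/initialized}'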
If I manually add the label:
Almost immediately in the logs I see:
Using describe node I can now see the karpenter.sh/emptiness-timestamp on that node. So I think this is a bug: GPU instances are not always getting the karpenter.sh/initialized label, causing them to never be removed.
I am having the exact same issue. We have various Karpenter scaling configurations, and interestingly it is only the GPU instances that get “stuck”. Not every GPU instance either, just the odd one now and again.
I notice that you also are using GPU instances… Perhaps this is significant to this issue?
I am also using EKS v1.23 and Karpenter v0.16.3 (via the helm chart of the same version).
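For completeness, a hypothetical reconstruction of the manual workaround described above (the exact command from the original comment was not preserved, so adjust the node name and verify the label value for your Karpenter version):

# Hypothetical: manually mark the stuck node as initialized so Karpenter
# starts evaluating it for emptiness.
kubectl label node ip-10-3-11-122.us-west-2.compute.internal \
  karpenter.sh/initialized=true

After this, the karpenter.sh/emptiness-timestamp annotation should appear on the node and the ttlSecondsAfterEmpty countdown should begin, as described above.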