karpenter-provider-aws: Endless provisioning loop of new nodes
Version
Karpenter: v0.8.2
Kubernetes: v1.21.5
Context
We are using the Calico CNI + Istio in our clusters. istio-validation is an init container injected into all our pods. When a pod is assigned to a node, the first run of istio-validation always fails with the error Init:ContainerStatusUnknown (because the init container starts executing before istio-cni-node-xxx is ready), which in turn causes the pod to be rescheduled onto another node.
I understand that the failing init container is our fault, but other circumstances could also cause a pod to fail at the init stage.
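For reference, the injected init container looks roughly like the snippet below. The image tag and exact arguments depend on the Istio version and mesh configuration, so treat it as an illustrative sketch only, not our exact spec:

# Illustrative only: istio-validation is injected by the Istio sidecar injector
# when the Istio CNI plugin is enabled; image tag and args vary by Istio version.
initContainers:
  - name: istio-validation
    image: docker.io/istio/proxyv2:1.13.2   # version is an assumption
    args:
      - istio-iptables
      - --run-validation     # verifies that the CNI-applied iptables rules are in place
      - --skip-rule-apply    # the rules themselves are applied by istio-cni-node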
Actual Behavior
Karpenter starts provisioning a new node with the same capacity and binds the recreated pod to it, even though plenty of empty nodes (with available capacity) were provisioned only a few minutes earlier and are still within ttlSecondsAfterEmpty: 360.
Expected Behavior
Karpenter does not provision a new node and bind the recreated pod to it; instead, it either lets the default scheduler move the pod onto existing nodes with available capacity, OR allows the pod to be restarted/recreated on the same node.
Steps to Reproduce the Problem
Create any Deployment/StatefulSet with an init container that completes with an exit code > 0; a minimal sketch follows.
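A minimal repro sketch, assuming nothing beyond what is described above (resource names, images, and requests are hypothetical; any init container that exits non-zero should trigger the same reschedule-and-provision behavior):

# Hypothetical repro manifest: the init container always exits 1,
# so the main container never starts and the pod fails at the init stage.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: init-failure-repro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: init-failure-repro
  template:
    metadata:
      labels:
        app: init-failure-repro
    spec:
      initContainers:
        - name: always-fails
          image: busybox:1.35
          command: ["sh", "-c", "exit 1"]   # completes with exit code > 0
      containers:
        - name: app
          image: nginx:1.21
          resources:
            requests:
              cpu: "1"
              memory: 1Gi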
Resource Specs and Logs
No noticeable messages were found in the logs. The Provisioner in use:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/resource-policy: keep
  generation: 1
  name: default
spec:
  labels:
    k8s.example.com/cluster: xyz-eks
    k8s.example.com/environment: production
  provider:
    launchTemplate: xyz-eks-karpenter20220401093308003100000003
    subnetSelector:
      Name: us-west-1-prod-vpc-subnet-k8s-us-west-1*
    tags:
      Environment: production
      Name: xyz-eks-default-karpenter-provider
      Team: DevOps
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: NotIn
      values:
        - t3.nano
        - t3.xlarge
        - t3.2xlarge
        - t3.medium
        - t3.micro
        - t3.small
        - t3.large
        - t3a.nano
        - t3a.xlarge
        - t3a.2xlarge
        - t3a.medium
        - t3a.micro
        - t3a.small
        - t3a.large
    - key: karpenter.sh/capacity-type
      operator: In
      values:
        - spot
    - key: topology.kubernetes.io/zone
      operator: In
      values:
        - us-west-1b
        - us-west-1c
    - key: kubernetes.io/arch
      operator: In
      values:
        - amd64
  ttlSecondsAfterEmpty: 360
  ttlSecondsUntilExpired: 1209600
About this issue
- State: closed
- Created 2 years ago
- Comments: 20 (9 by maintainers)
@infestonn Thanks for the info. We are actively investigating not binding pods to nodes. In this case, we would just launch the node and allow kube-scheduler to bind pods after the node has become ready. It should avoid the issue that you are seeing where the pod is bound before the node is ready and initialization fails.