karpenter-provider-aws: karpenter.sh/initialized: true does not get applied to any Node
Version
Karpenter Version: v0.18.1
Kubernetes Version: v1.23 (AWS EKS)
Expected Behavior
Karpenter should bring nodes up as requested and, once a node is initialized, label it with karpenter.sh/initialized: true. This label allows behaviours such as deletion and consolidation to work as expected.
Actual Behavior
On version v0.18.1 the karpenter.sh/initialized: true label is not set on ANY new instance. As a result, deletion and consolidation do not work as expected: no nodes are removed or consolidated, which has a significant cost impact. Note that these nodes are otherwise used correctly by pods.
Steps to Reproduce the Problem
Bring up any node with Karpenter v0.18.1 and inspect its labels: the karpenter.sh/initialized label is missing. To confirm this we ran kubectl get node -L karpenter.sh/initialized; see below for the command output (the INITIALIZED column is empty for every node).
Resource Specs and Logs
kubectl get node -L karpenter.sh/initialized
NAME STATUS ROLES AGE VERSION INITIALIZED
ip-xxx.ec2.internal Ready <none> 36m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 14m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 3h14m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 14m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 14m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 14m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 3h14m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 4m26s v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 37m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 3h18m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 14m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 19m v1.23.9-eks-ba74326
ip-xxx.ec2.internal Ready <none> 4m27s v1.23.9-eks-ba74326
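As an additional check (the jq filter below is a hypothetical helper, not part of the original report), you can list only the nodes that are missing the label. In a real cluster you would pipe kubectl get nodes -o json into the same filter; here a small inline sample stands in for the kubectl output:

```shell
# Stand-in for `kubectl get nodes -o json`: one labeled node, one unlabeled.
nodes_json='{"items":[
  {"metadata":{"name":"ip-a.ec2.internal","labels":{"karpenter.sh/initialized":"true"}}},
  {"metadata":{"name":"ip-b.ec2.internal","labels":{}}}]}'

# Print only the nodes that do NOT carry the karpenter.sh/initialized label.
echo "$nodes_json" \
  | jq -r '.items[] | select(.metadata.labels["karpenter.sh/initialized"] == null) | .metadata.name'
# prints: ip-b.ec2.internal
```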
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave “+1” or “me too” comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 7
- Comments: 24 (15 by maintainers)
This is normally caused by extended resources not registering or startup taints not being removed.
Are you using extended resources (e.g. GPUs) or startup taints, or do you have AWS_ENABLE_POD_ENI turned on?
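To check those two conditions yourself (the commands below are an illustrative sketch, not from the thread), inspect the node's taints and allocatable resources. A small inline sample stands in for live kubectl output:

```shell
# Stand-in for `kubectl get node <name> -o json`: a node that still carries a
# startup taint and has not registered the neuron extended resource.
node_json='{"spec":{"taints":[{"key":"example.com/startup","effect":"NoSchedule"}]},
            "status":{"allocatable":{"cpu":"4","memory":"16Gi"}}}'

# Startup taints that were never removed would block initialization:
echo "$node_json" | jq '.spec.taints'

# An expected extended resource (e.g. aws.amazon.com/neuron) missing from
# .status.allocatable would also block it:
echo "$node_json" | jq -r '.status.allocatable["aws.amazon.com/neuron"] // "not registered"'
# prints: not registered
```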
Closing since #3408 was merged. This should be fixed in the next minor version release (v0.28.0).
Got it. I think the only options at this point are to either exclude the inf types from your provisioner or run the DS that registers the neuron resource. As mentioned by @ellistarn, this model where we do initialization based on expected resources should change to requested resources in #3408, so when that PR is merged and released, you should be able to use inf types without the DS.
@jonathan-innis was thinking about changing initialization to only require the resources requested by pods (e.g. if pods didn't request the resources, we wouldn't include it in initialization). Reopening for his comment.
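Excluding the inf instance types from a provisioner could look like the sketch below. The karpenter.k8s.aws/instance-family requirement key is a standard Karpenter well-known label, but the provisioner name and the exact family list are assumptions for illustration:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default   # assumed name
spec:
  requirements:
    # Keep Karpenter from launching Inferentia instances, so pods never land
    # on nodes whose neuron resource would otherwise block initialization.
    - key: karpenter.k8s.aws/instance-family
      operator: NotIn
      values: ["inf1"]
```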
Is it possible for you to just scale down the resources of the daemonset to 0?