karpenter-provider-aws: Nodes never get provisioned for my workload
Hello,
I am seeing the following error in the Karpenter logs:
2023-11-28T04:21:33.262Z DEBUG controller.machine.lifecycle terminating due to registration ttl {"commit": "322822a", "machine": "first-karpenter-provisioner-cn-northwest-1-xxxx", "provisioner": "first-karpenter-provisioner-cn-northwest-1", "ttl": "15m0s"}
2023-11-28T04:21:33.664Z INFO controller.machine.termination deleted machine {"commit": "322822a", "machine": "first-karpenter-provisioner-cn-northwest-1-xxxx", "provisioner": "first-karpenter-provisioner-cn-northwest-1", "node": "", "provider-id": "aws:///cn-northwest-1a/i-0030xxxxx"}
It never launches an EC2 instance for my workloads.
If I run `kubectl describe machine first-karpenter-provisioner-cn-northwest-1-xxxx`, I see:
```
Conditions:
  Last Transition Time: 2023-11-28T04:52:02Z
  Message: Node not registered with cluster
  Reason: NodeNotFound
  Status: False
  Type: MachineInitialized

  Last Transition Time: 2023-11-28T04:52:02Z
  Status: True
  Type: MachineLaunched

  Last Transition Time: 2023-11-28T04:52:02Z
  Message: Node not registered with cluster
  Reason: NodeNotFound
  Status: False
  Type: Ready

  Last Transition Time: 2023-11-28T04:52:02Z
  Message: Node not registered with cluster
  Reason: NodeNotFound
  Status: False
  Type: MachineRegistered
```
About this issue
- State: closed
- Created 7 months ago
- Reactions: 2
- Comments: 19 (8 by maintainers)
Hi @jmdeal, I think you may be right that my issue is the one described at https://karpenter.sh/docs/troubleshooting/#node-terminates-before-ready-on-failed-encrypted-ebs-volume.
As a test, I assigned the AdminAccess policy to Karpenter’s IRSA. The nodes are now being provisioned successfully, without any issues.
We have EBS encryption enabled at the region level.
UPDATE: Please ignore the above comment. I discovered that the KMS key we use for EBS encryption is not the default KMS key but a different, customer managed KMS key, and Karpenter’s IRSA does not have access to it. Here is the error in CloudTrail (event name: GenerateDataKeyWithoutPlaintext):
UPDATE-2: Adding the below policy statement to Karpenter’s IRSA has fixed the issue for me:
Thank you.
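For readers who hit the same customer managed KMS key problem, here is a minimal sketch of the kind of IAM statement involved. It is not the statement from the comment above. The action list is an assumption based on the permissions AWS documents for using EBS encryption with a customer managed key; only `GenerateDataKeyWithoutPlaintext` is named in this thread. The function name, `Sid`, account ID, and key ARN are hypothetical placeholders.

```python
import json

# Assumption: KMS actions AWS documents for launching instances whose EBS
# volumes are encrypted with a customer managed key. Only
# GenerateDataKeyWithoutPlaintext is confirmed by the CloudTrail event
# mentioned in this thread.
EBS_KMS_ACTIONS = [
    "kms:CreateGrant",
    "kms:Decrypt",
    "kms:DescribeKey",
    "kms:GenerateDataKeyWithoutPlaintext",
    "kms:ReEncrypt*",
]


def ebs_kms_statement(key_arn: str) -> dict:
    """Build one IAM policy statement scoped to a single KMS key ARN."""
    if not key_arn.startswith("arn:"):
        raise ValueError(f"expected a KMS key ARN, got {key_arn!r}")
    return {
        "Sid": "AllowEbsEncryptionKey",  # hypothetical statement ID
        "Effect": "Allow",
        "Action": list(EBS_KMS_ACTIONS),
        "Resource": key_arn,
    }


if __name__ == "__main__":
    # Placeholder ARN; the aws-cn partition matches the cn-northwest-1 region.
    placeholder_arn = (
        "arn:aws-cn:kms:cn-northwest-1:111122223333:key/EXAMPLE-KEY-ID"
    )
    print(json.dumps(ebs_kms_statement(placeholder_arn), indent=2))
```

Scoping `Resource` to the specific key ARN, rather than `*`, keeps the grant narrower than the AdminAccess test described earlier.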
Yep, you can specify AMIs using AMI Selector Terms. Here’s an example with the last working 1.28 AMI:
I have a similar scenario: it had been working for the past two weeks and only started failing on 30 Nov.
Please note that this Karpenter configuration had been working for me for a month or two. We didn’t update anything on the AWS EKS side or on the Karpenter side. This morning, after deploying a build, we noticed that it cannot register any new nodes. The same thing happened in two different EKS clusters in two different regions.