karpenter-provider-aws: Webhook Errors on Clean Install
Version
Karpenter Version: v0.19.1
Kubernetes Version: v1.23.13
Expected Behavior
Expect Karpenter to start without error logs on a clean install.
Actual Behavior
Karpenter errors, seemingly on a race condition with the webhook controller trying to update the CA bundle.
> kubectl logs deploy/karpenter -n karpenter -f
2022-11-21T18:49:55.965Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "18a04250-5241-42dd-a30a-cfcbf244bb4a", "knative.dev/key": "validation.webhook.karpenter.sh", "duration": "19.873µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:55.965Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "a5d1b623-be09-46e3-aff5-126cdd954644", "knative.dev/key": "karpenter/karpenter-cert", "duration": "40.708µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:55.965Z ERROR webhook.ConfigMapWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "00b44b85-49cd-42c4-b279-c059caa21d1a", "knative.dev/key": "karpenter/karpenter-cert", "duration": "9.842µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:55.965Z ERROR webhook.ConfigMapWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "2bb50735-e836-4dd0-9b0f-f766c69f9bff", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "21.691µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:56.040Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "cb6357f6-ee74-4439-865a-8a37bc4b3414", "knative.dev/key": "validation.webhook.karpenter.k8s.aws", "duration": "66.640235ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.052Z INFO controller.aws.pricing updated spot pricing with instance types and offerings {"commit": "27a51c0", "instance-type-count": 561, "offering-count": 1436}
2022-11-21T18:49:56.055Z INFO controller Starting workers {"commit": "27a51c0", "controller": "provisioner-state", "controllerGroup": "karpenter.sh", "controllerKind": "Provisioner", "worker count": 10}
2022-11-21T18:49:56.056Z ERROR webhook.ConfigMapWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "664c932b-0c4e-483a-99e8-3ce1c24f6670", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "81.512896ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.config.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.060Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "b5fb8bcc-c5ea-47e2-bea2-a93ea1eb8aee", "knative.dev/key": "validation.webhook.karpenter.sh", "duration": "84.146703ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.060Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "17105fbd-855a-48ef-806b-e3157f35e09c", "knative.dev/key": "defaulting.webhook.karpenter.sh", "duration": "82.45796ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.065Z INFO controller Starting workers {"commit": "27a51c0", "controller": "node", "controllerGroup": "", "controllerKind": "Node", "worker count": 10}
2022-11-21T18:49:56.066Z INFO controller Starting workers {"commit": "27a51c0", "controller": "termination", "controllerGroup": "", "controllerKind": "Node", "worker count": 10}
2022-11-21T18:49:56.066Z INFO controller Starting workers {"commit": "27a51c0", "controller": "counter", "controllerGroup": "karpenter.sh", "controllerKind": "Provisioner", "worker count": 10}
2022-11-21T18:49:56.066Z INFO controller Starting workers {"commit": "27a51c0", "controller": "provisionermetrics", "controllerGroup": "karpenter.sh", "controllerKind": "Provisioner", "worker count": 1}
2022-11-21T18:49:56.066Z INFO controller Starting workers {"commit": "27a51c0", "controller": "inflightchecks", "controllerGroup": "", "controllerKind": "Node", "worker count": 10}
2022-11-21T18:49:56.092Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "f6435a18-8bac-4b0e-bcff-47b4f662a930", "knative.dev/key": "defaulting.webhook.karpenter.k8s.aws", "duration": "114.811691ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:57.143Z INFO controller.aws.pricing updated on-demand pricing {"commit": "27a51c0", "instance-type-count": 499}
2022-11-21T18:51:17.179Z DEBUG controller.deprovisioning discovered EC2 instance types {"commit": "27a51c0", "instance-type-count": 499}
2022-11-21T18:51:17.250Z DEBUG controller.deprovisioning discovered subnets {"commit": "27a51c0", "subnets": ["subnet-02fd4171d23ef0007 (us-east-2a)", "subnet-068de41e5a1d85cfd (us-east-2b)", "subnet-0a92ff703b80768c1 (us-east-2a)", "subnet-067ac2435f80fbe02 (us-east-2b)"]}
2022-11-21T18:51:17.369Z DEBUG controller.deprovisioning discovered EC2 instance types zonal offerings for subnets {"commit": "27a51c0", "subnet-selector": "{\"alpha.eksctl.io/cluster-name\":\"eksworkshop-eksctl\"}"}
Steps to Reproduce the Problem
...
export KARPENTER_VERSION=v0.19.1
> helm upgrade --install --namespace karpenter --create-namespace \
> karpenter oci://public.ecr.aws/karpenter/karpenter \
> --version ${KARPENTER_VERSION} \
> --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
> --set settings.aws.clusterName=${CLUSTER_NAME} \
> --set settings.aws.clusterEndpoint=${CLUSTER_ENDPOINT} \
> --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
> --set settings.aws.interruptionQueueName=${CLUSTER_NAME} \
> --set nodeSelector.intent=control-apps \
> --wait
Release "karpenter" does not exist. Installing it now.
NAME: karpenter
LAST DEPLOYED: Mon Nov 21 18:49:51 2022
NAMESPACE: karpenter
STATUS: deployed
REVISION: 1
TEST SUITE: None
Resource Specs and Logs
See above for Actual Behavior
About this issue
- State: closed
- Created 2 years ago
- Reactions: 48
- Comments: 30 (21 by maintainers)
This is due to a known bug in the knative certificate reconciliation. We’re moving towards deprecating these webhooks in a future release. If the errors are not blocking your operations, you can safely ignore them for now.
For folks concerned about this error, know that it’s just noise unless it happens continuously without going away.
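One rough way to tell startup noise from a persistent problem is to count the webhook reconcile errors in a recent log window and repeat the check a few minutes later. The sketch below assumes the deployment and namespace names from the report above; `count_reconcile_errors` is a hypothetical helper name, not part of Karpenter.

```shell
#!/usr/bin/env bash
# count_reconcile_errors (hypothetical helper): read Karpenter log lines on
# stdin and print how many are webhook "Reconcile error" entries.
count_reconcile_errors() {
  # grep -c exits non-zero when the count is 0, so keep the exit status clean.
  grep -c 'ERROR.*webhook.*Reconcile error' || true
}

# Usage against a live cluster (names taken from the report above):
#   kubectl logs deploy/karpenter -n karpenter --since=10m | count_reconcile_errors
```

If the count falls to 0 once the pods have been up for a few minutes, the errors were most likely the transient startup kind. Note that `kubectl logs deploy/...` reads from a single pod, so with two replicas it may be worth checking each pod by name.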
Ideally, we’d prevent it from happening in the first place, but this requires changes upstream to knative/pkg.
This should be closed and fixed with v0.33.0, since the webhooks will be disabled by default.
Happens on upgrading karpenter from 0.16.3 to 0.20.0 as well. Is there any fix for the issue?
2022-12-09T11:37:32.796Z ERROR webhook.ConfigMapWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "55e2e637-4c5e-4da8-87e2-3df075627951", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "55.431539ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.config.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.803Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "c0df4868-8646-46a2-9d8d-1a4e5f5c4944", "knative.dev/key": "validation.webhook.karpenter.k8s.aws", "duration": "62.174113ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.816Z INFO controller.aws.pricing updated spot pricing with instance types and offerings {"commit": "683d4b0", "instance-type-count": 562, "offering-count": 1680}
2022-12-09T11:37:32.835Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "77892349-fd32-4fb8-88f7-f37a32980014", "knative.dev/key": "validation.webhook.karpenter.sh", "duration": "94.138809ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.835Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "17b4d468-f8cf-4009-9fc0-e9ffd8bceb0a", "knative.dev/key": "karpenter/karpenter-cert", "duration": "91.610995ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.835Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "190d534f-c21c-44ac-b241-bc2250ebc841", "knative.dev/key": "defaulting.webhook.karpenter.k8s.aws", "duration": "92.665285ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.836Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "8a823689-fd28-4942-baac-432d2eab067f", "knative.dev/key": "defaulting.webhook.karpenter.sh", "duration": "92.806907ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.851Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "3cf2997a-a103-4b91-924b-36e884779475", "knative.dev/key": "karpenter/karpenter-cert", "duration": "59.566759ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:41.824Z INFO controller.aws.pricing updated on-demand pricing {"commit": "683d4b0", "instance-type-count": 595}
I1209 11:37:49.507135 1 leaderelection.go:258] successfully acquired lease karpenter/karpenter-leader-election
This is happening to me with version 0.30.0 in EKS, clean installs, multiple clusters having the same problem. I installed via the helm chart. Only thing I did that’s a little unusual is that the helm chart is installed via ArgoCD.
The problem’s been happening about a week, and after multiple restarts, so whatever’s supposed to be self-healing, isn’t in my case.
I saw this in one of my clusters too… it’s just noise at the moment.
I didn’t wait long enough. I restarted both pods and they went instantly OK
This happened to me on a clean install of Karpenter v0.20.0, deployed using v4.18.1 of https://github.com/aws-ia/terraform-aws-eks-blueprints/releases/tag/v4.18.1. This repo has `examples/karpenter`, which can be used to create a new EKS cluster with Karpenter.
These logs are from two pods. Note that the errors appear in the logs of only one pod; the other pod’s logs don’t have them. Also, these errors were only seen on 2022-12-16, when I created a fresh cluster with Karpenter. I am not seeing them now.
This comment is copied from my message at https://kubernetes.slack.com/archives/C02SFFZSA2K/p1671437472589809
Did the controller not work after these errors? They should just be transient errors that self heal, since both controllers are trying to reconcile the same webhook.
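The "object has been modified" message is the API server's optimistic concurrency check rejecting a write that was based on a stale `resourceVersion`. The toy model below is not Karpenter or knative code; it is a minimal sketch, with hypothetical names throughout, of why two writers racing on the same object produce one conflict and why a re-read followed by a retry clears it.

```shell
#!/usr/bin/env bash
# Toy model of optimistic concurrency (illustrative only).
# The "object" is just a resourceVersion counter held in a shell variable.
resource_version=1

# update_object EXPECTED: succeed only if EXPECTED matches the stored version.
update_object() {
  local expected="$1"
  if [ "$expected" -ne "$resource_version" ]; then
    echo "conflict: the object has been modified"
    return 1
  fi
  resource_version=$((resource_version + 1))
  echo "updated to resourceVersion $resource_version"
}

# Both replicas read the object at the same version.
seen_by_a="$resource_version"
seen_by_b="$resource_version"

update_object "$seen_by_a"          # replica A wins the race
update_object "$seen_by_b" || {     # replica B is rejected with a conflict
  seen_by_b="$resource_version"     # re-read the latest version
  update_object "$seen_by_b"        # the retry succeeds
}
```

In this model the losing writer logs one conflict and then succeeds, which matches the single burst of errors at startup described in the thread.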
Upgraded from v0.20 to v0.21 and enabled DriftEnabled. Got this error; restarting both pods seems to fix it.
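For reference, restarting the pods as several commenters describe can be done with a rollout restart. This is a minimal sketch assuming the deployment is named `karpenter` in the `karpenter` namespace, as in the original report; `restart_karpenter` is a hypothetical wrapper name and the 120s timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# restart_karpenter (hypothetical helper): recreate the Karpenter pods and
# wait for the new ones to become ready.
restart_karpenter() {
  kubectl rollout restart deployment/karpenter -n karpenter &&
    kubectl rollout status deployment/karpenter -n karpenter --timeout=120s
}

# Usage: restart_karpenter
```

A restart only works around the symptom; per the maintainers' comments, the underlying race is in knative's webhook reconciliation.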
Reopening this as the issue is still here and lies with knative.