karpenter-provider-aws: Webhook Errors on Clean Install

Version

Karpenter Version: v0.19.1

Kubernetes Version: v1.23.13

Expected Behavior

Expect Karpenter to start without error logs on a clean install.

Actual Behavior

Karpenter logs errors at startup, seemingly due to a race condition with the webhook controller trying to update the CA bundle.

> kubectl logs deploy/karpenter -n karpenter -f
2022-11-21T18:49:55.965Z        ERROR   webhook.ValidationWebhook       Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "18a04250-5241-42dd-a30a-cfcbf244bb4a", "knative.dev/key": "validation.webhook.karpenter.sh", "duration": "19.873µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:55.965Z        ERROR   webhook.ValidationWebhook       Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "a5d1b623-be09-46e3-aff5-126cdd954644", "knative.dev/key": "karpenter/karpenter-cert", "duration": "40.708µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:55.965Z        ERROR   webhook.ConfigMapWebhook        Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "00b44b85-49cd-42c4-b279-c059caa21d1a", "knative.dev/key": "karpenter/karpenter-cert", "duration": "9.842µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:55.965Z        ERROR   webhook.ConfigMapWebhook        Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "2bb50735-e836-4dd0-9b0f-f766c69f9bff", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "21.691µs", "error": "secret \"karpenter-cert\" is missing \"ca-cert.pem\" key"}
2022-11-21T18:49:56.040Z        ERROR   webhook.ValidationWebhook       Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "cb6357f6-ee74-4439-865a-8a37bc4b3414", "knative.dev/key": "validation.webhook.karpenter.k8s.aws", "duration": "66.640235ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.052Z        INFO    controller.aws.pricing  updated spot pricing with instance types and offerings  {"commit": "27a51c0", "instance-type-count": 561, "offering-count": 1436}
2022-11-21T18:49:56.055Z        INFO    controller      Starting workers        {"commit": "27a51c0", "controller": "provisioner-state", "controllerGroup": "karpenter.sh", "controllerKind": "Provisioner", "worker count": 10}
2022-11-21T18:49:56.056Z        ERROR   webhook.ConfigMapWebhook        Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "664c932b-0c4e-483a-99e8-3ce1c24f6670", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "81.512896ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.config.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.060Z        ERROR   webhook.ValidationWebhook       Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "b5fb8bcc-c5ea-47e2-bea2-a93ea1eb8aee", "knative.dev/key": "validation.webhook.karpenter.sh", "duration": "84.146703ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.060Z        ERROR   webhook.DefaultingWebhook       Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "17105fbd-855a-48ef-806b-e3157f35e09c", "knative.dev/key": "defaulting.webhook.karpenter.sh", "duration": "82.45796ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:56.065Z        INFO    controller      Starting workers        {"commit": "27a51c0", "controller": "node", "controllerGroup": "", "controllerKind": "Node", "worker count": 10}
2022-11-21T18:49:56.066Z        INFO    controller      Starting workers        {"commit": "27a51c0", "controller": "termination", "controllerGroup": "", "controllerKind": "Node", "worker count": 10}
2022-11-21T18:49:56.066Z        INFO    controller      Starting workers        {"commit": "27a51c0", "controller": "counter", "controllerGroup": "karpenter.sh", "controllerKind": "Provisioner", "worker count": 10}
2022-11-21T18:49:56.066Z        INFO    controller      Starting workers        {"commit": "27a51c0", "controller": "provisionermetrics", "controllerGroup": "karpenter.sh", "controllerKind": "Provisioner", "worker count": 1}
2022-11-21T18:49:56.066Z        INFO    controller      Starting workers        {"commit": "27a51c0", "controller": "inflightchecks", "controllerGroup": "", "controllerKind": "Node", "worker count": 10}
2022-11-21T18:49:56.092Z        ERROR   webhook.DefaultingWebhook       Reconcile error {"commit": "27a51c0", "knative.dev/traceid": "f6435a18-8bac-4b0e-bcff-47b4f662a930", "knative.dev/key": "defaulting.webhook.karpenter.k8s.aws", "duration": "114.811691ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-11-21T18:49:57.143Z        INFO    controller.aws.pricing  updated on-demand pricing       {"commit": "27a51c0", "instance-type-count": 499}
2022-11-21T18:51:17.179Z        DEBUG   controller.deprovisioning       discovered EC2 instance types   {"commit": "27a51c0", "instance-type-count": 499}
2022-11-21T18:51:17.250Z        DEBUG   controller.deprovisioning       discovered subnets      {"commit": "27a51c0", "subnets": ["subnet-02fd4171d23ef0007 (us-east-2a)", "subnet-068de41e5a1d85cfd (us-east-2b)", "subnet-0a92ff703b80768c1 (us-east-2a)", "subnet-067ac2435f80fbe02 (us-east-2b)"]}
2022-11-21T18:51:17.369Z        DEBUG   controller.deprovisioning       discovered EC2 instance types zonal offerings for subnets       {"commit": "27a51c0", "subnet-selector": "{\"alpha.eksctl.io/cluster-name\":\"eksworkshop-eksctl\"}"}

Steps to Reproduce the Problem

...
export KARPENTER_VERSION=v0.19.1
> helm upgrade --install --namespace karpenter --create-namespace \
>   karpenter oci://public.ecr.aws/karpenter/karpenter \
>   --version ${KARPENTER_VERSION} \
>   --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
>   --set settings.aws.clusterName=${CLUSTER_NAME} \
>   --set settings.aws.clusterEndpoint=${CLUSTER_ENDPOINT} \
>   --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
>   --set settings.aws.interruptionQueueName=${CLUSTER_NAME} \
>   --set nodeSelector.intent=control-apps \
>   --wait
Release "karpenter" does not exist. Installing it now.
NAME: karpenter
LAST DEPLOYED: Mon Nov 21 18:49:51 2022
NAMESPACE: karpenter
STATUS: deployed
REVISION: 1
TEST SUITE: None

Resource Specs and Logs

See above for Actual Behavior

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 48
  • Comments: 30 (21 by maintainers)

Most upvoted comments

This is due to a known bug in the knative certificate reconciliation. We’re moving towards deprecating these webhooks in a future release. If the errors aren’t blocking your operations, you can safely ignore them for now.

For folks concerned about this error, know that it’s just noise unless it happens continuously without going away.
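For anyone who wants to check this against their own logs, here is a rough sketch of one way to tell startup noise from a persistent problem. It assumes log lines in the format pasted in this issue (a leading ISO-8601 timestamp). The function names and the 60-second window are made up for illustration and are not a Karpenter recommendation, and the sample lines are abbreviated from the logs above.

```python
from datetime import datetime, timedelta


def parse_ts(line):
    """Parse the leading timestamp, e.g. 2022-11-21T18:49:55.965Z."""
    stamp = line.split()[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def errors_are_startup_noise(lines, window=timedelta(seconds=60)):
    """Return True if every webhook "Reconcile error" falls within
    `window` of the first log line (hypothetical threshold)."""
    lines = [line for line in lines if line.strip()]
    if not lines:
        return True
    # Assumes the first line carries an ISO timestamp (klog-style lines do not).
    start = parse_ts(lines[0])
    error_times = [
        parse_ts(line)
        for line in lines
        if "webhook." in line and "Reconcile error" in line
    ]
    return all(ts - start <= window for ts in error_times)


# Abbreviated sample lines; the JSON payloads are shortened for readability.
sample = [
    '2022-11-21T18:49:55.965Z ERROR webhook.ValidationWebhook Reconcile error {"error": "..."}',
    '2022-11-21T18:49:56.092Z ERROR webhook.DefaultingWebhook Reconcile error {"error": "..."}',
    '2022-11-21T18:51:17.179Z DEBUG controller.deprovisioning discovered EC2 instance types {}',
]
print(errors_are_startup_noise(sample))
```

If the function returns False for logs collected well after startup, the errors are recurring rather than a one-time race, which is the case worth reporting.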

Ideally, we’d prevent it from happening in the first place, but this requires changes upstream to knative/pkg.

This should be closed and fixed with v0.33.0, since the webhooks will be disabled by default.

Happens on upgrading karpenter from 0.16.3 to 0.20.0 as well. Is there any fix for the issue?

2022-12-09T11:37:32.796Z ERROR webhook.ConfigMapWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "55e2e637-4c5e-4da8-87e2-3df075627951", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "55.431539ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.config.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.803Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "c0df4868-8646-46a2-9d8d-1a4e5f5c4944", "knative.dev/key": "validation.webhook.karpenter.k8s.aws", "duration": "62.174113ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.816Z INFO controller.aws.pricing updated spot pricing with instance types and offerings {"commit": "683d4b0", "instance-type-count": 562, "offering-count": 1680}
2022-12-09T11:37:32.835Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "77892349-fd32-4fb8-88f7-f37a32980014", "knative.dev/key": "validation.webhook.karpenter.sh", "duration": "94.138809ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.835Z ERROR webhook.ValidationWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "17b4d468-f8cf-4009-9fc0-e9ffd8bceb0a", "knative.dev/key": "karpenter/karpenter-cert", "duration": "91.610995ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.835Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "190d534f-c21c-44ac-b241-bc2250ebc841", "knative.dev/key": "defaulting.webhook.karpenter.k8s.aws", "duration": "92.665285ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.836Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "8a823689-fd28-4942-baac-432d2eab067f", "knative.dev/key": "defaulting.webhook.karpenter.sh", "duration": "92.806907ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:32.851Z ERROR webhook.DefaultingWebhook Reconcile error {"commit": "683d4b0", "knative.dev/traceid": "3cf2997a-a103-4b91-924b-36e884779475", "knative.dev/key": "karpenter/karpenter-cert", "duration": "59.566759ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-09T11:37:41.824Z INFO controller.aws.pricing updated on-demand pricing {"commit": "683d4b0", "instance-type-count": 595}
I1209 11:37:49.507135 1 leaderelection.go:258] successfully acquired lease karpenter/karpenter-leader-election

This is happening to me with version 0.30.0 in EKS, clean installs, multiple clusters having the same problem. I installed via the helm chart. Only thing I did that’s a little unusual is that the helm chart is installed via ArgoCD.

The problem has persisted for about a week and through multiple restarts, so whatever is supposed to be self-healing isn’t in my case.

I saw this in one of my clusters too… it’s just noise at the moment.

2023-02-16T04:10:45.532Z	ERROR	webhook.DefaultingWebhook	Reconcile error	{"commit": "5a7faa0-dirty", "knative.dev/traceid": "681c03be-5f2c-4919-8169-82e6f0b5468d", "knative.dev/key": "defaulting.webhook.karpenter.sh", "duration": "81.929004ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2023-02-16T04:10:45.532Z	ERROR	webhook.DefaultingWebhook	Reconcile error	{"commit": "5a7faa0-dirty", "knative.dev/traceid": "ef42549f-439b-4cc4-be33-3bdb81a2ede6", "knative.dev/key": "karpenter/karpenter-cert", "duration": "81.772322ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}

I didn’t wait long enough. I restarted both pods and they were OK right away.

Happened to me on a clean install of Karpenter v0.20.0, deployed using v4.18.1 of terraform-aws-eks-blueprints (https://github.com/aws-ia/terraform-aws-eks-blueprints/releases/tag/v4.18.1). This repo has an "examples/karpenter" example that can be used to create a new EKS cluster with Karpenter.

ev/key": "karpenter/karpenter-cert", "duration": "117.122341ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-19T07:52:09.323Z	INFO	controller.provisioner	found provisionable pod(s)	{"commit": "f60dacd", "pods": 3}
2022-12-19T07:52:09.323Z	INFO	controller.provisioner	computed new node(s) to fit pod(s)	{"commit": "f60dacd", "nodes": 1, "pods": 3}
2022-12-19T07:52:09.326Z	INFO	controller.provisioner	launching node with 3 pods requesting {"cpu":"3205m","memory":"170Mi","pods":"8"} from types t3a.xlarge, m5a.xlarge, m6a.xlarge, m5ad.xlarge, t3.xlarge and 102 other(s)	{"commit": "f60dacd", "provisioner": "default-lt"}
2022-12-19T07:52:11.155Z	INFO	controller.provisioner.cloudprovider	launched new instance	{"commit": "f60dacd", "provisioner": "default-lt", "launched-instance": "i-0b42e194fd56487d3", "hostname": "ip-10-20-22-228.ap-south-1.compute.internal", "type": "t3a.xlarge", "zone": "ap-south-1b", "capacity-type": "on-demand"}
2022-12-19T07:53:34.643Z	INFO	controller.deprovisioning	deprovisioning via consolidation delete, terminating 1 nodes ip-10-20-22-228.ap-south-1.compute.internal/t3a.xlarge/on-demand	{"commit": "f60dacd"}
2022-12-19T07:53:34.693Z	INFO	controller.termination	cordoned node	{"commit": "f60dacd", "node": "ip-10-20-22-228.ap-south-1.compute.internal"}
2022-12-19T07:53:34.906Z	INFO	controller.termination	deleted node	{"commit": "f60dacd", "node": "ip-10-20-22-228.ap-south-1.compute.internal"}
2022-12-19T07:56:09.444Z	DEBUG	controller.deprovisioning	discovered EC2 instance types	{"commit": "f60dacd", "instance-type-count": 369}
2022-12-16T10:46:32.331Z	ERROR	webhook.DefaultingWebhook	Reconcile error	{"commit": "f60dacd", "knative.dev/traceid": "20595618-b5cf-44f2-ac69-7623f346f6a9", "knative.dev/key": "defaulting.webhook.karpenter.k8s.aws", "duration": "92.305087ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-16T10:46:32.397Z	ERROR	webhook.ConfigMapWebhook	Reconcile error	{"commit": "f60dacd", "knative.dev/traceid": "2c584730-2e50-4621-801d-721dbe31106b", "knative.dev/key": "validation.webhook.config.karpenter.sh", "duration": "184.799247ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.config.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-16T10:46:32.397Z	ERROR	webhook.ValidationWebhook	Reconcile error	{"commit": "f60dacd", "knative.dev/traceid": "47a61cd2-8c8b-4a20-bb02-0e0407dad5de", "knative.dev/key": "karpenter/karpenter-cert", "duration": "74.579494ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-16T10:46:32.398Z	ERROR	webhook.ValidationWebhook	Reconcile error	{"commit": "f60dacd", "knative.dev/traceid": "1c45f60b-736e-4a07-86ac-ef186a49f4de", "knative.dev/key": "karpenter/karpenter-cert", "duration": "70.071203ms", "error": "failed to update webhook: Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"validation.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-16T10:46:32.410Z	ERROR	webhook.DefaultingWebhook	Reconcile error	{"commit": "f60dacd", "knative.dev/traceid": "34bda325-4907-428e-b47f-ad6e29e07f42", "knative.dev/key": "karpenter/karpenter-cert", "duration": "82.480505ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
2022-12-16T10:46:33.006Z	INFO	controller.aws.pricing	updated on-demand pricing	{"commit": "f60dacd", "instance-type-count": 369}
2022-12-19T08:01:14.828Z	DEBUG	controller.deprovisioning	discovered EC2 instance types	{"commit": "f60dacd", "instance-type-count": 369}

These logs are from two pods. Note that the errors appear in the logs of only one pod; the other pod’s logs don’t have them. Also, the errors were only seen on 2022-12-16, when I created a fresh cluster with Karpenter. I am not seeing them now.

This comment is copied from my message at https://kubernetes.slack.com/archives/C02SFFZSA2K/p1671437472589809

Did the controller not work after these errors? They should just be transient errors that self heal, since both controllers are trying to reconcile the same webhook.
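To illustrate why this kind of conflict is expected to clear up by itself, here is a minimal sketch of optimistic concurrency, which is what the "the object has been modified" message reflects: an update based on a stale resourceVersion is rejected, and the writer succeeds after re-reading. This is a toy in-memory model, not the real API server or the knative reconciler; FakeStore, Conflict, and reconcile_ca_bundle are hypothetical names used only for this example.

```python
class Conflict(Exception):
    """Stands in for the API server's 'object has been modified' response."""


class FakeStore:
    """Toy in-memory object store with optimistic concurrency (hypothetical)."""

    def __init__(self):
        self.obj = {"resourceVersion": 1, "caBundle": ""}

    def get(self):
        # Return a copy, like a client reading its own snapshot of the object.
        return dict(self.obj)

    def update(self, new):
        # Reject writes that were computed from a stale read.
        if new["resourceVersion"] != self.obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        self.obj = dict(new, resourceVersion=new["resourceVersion"] + 1)


def reconcile_ca_bundle(store, snapshot, ca, log):
    """Write the CA bundle from a possibly stale snapshot; on conflict,
    log the error, re-read the object, and try again."""
    attempt = snapshot
    for _ in range(5):
        try:
            store.update(dict(attempt, caBundle=ca))
            return
        except Conflict as err:
            log.append(f"Reconcile error: {err}")
            attempt = store.get()
    raise RuntimeError("still conflicting after retries")


store = FakeStore()
log = []
# Both reconcilers read the object before either one writes.
read_a, read_b = store.get(), store.get()
reconcile_ca_bundle(store, read_a, "CA", log)  # first writer wins
reconcile_ca_bundle(store, read_b, "CA", log)  # stale snapshot: conflict, then success
print(log)
print(store.obj)
```

In this model the second reconciler logs one error and then converges on the same state, which matches the transient pattern described above. It does not explain cases where the errors keep recurring.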

Upgraded from v0.20 to v0.21 and enabled DriftEnabled. Got this error:

2022-12-28T13:02:14.804Z    DEBUG    controller    karpenter-global-settings config "karpenter-global-settings" config was added or updated: settings.Settings{BatchMaxDuration:v1.Duration{Duration:10000000000}, BatchIdleDuration:v1.Duration{Duration:5000000000}, DriftEnabled:true}    {"commit": "0c8536a-dirty"}
2022-12-28T13:02:14.804Z    DEBUG    controller    karpenter-global-settings config "karpenter-global-settings" config was added or updated: settings.Settings{ClusterName:"fernando", ClusterEndpoint:"https://XXXX.gr7.us-east-1.eks.amazonaws.com", DefaultInstanceProfile:"", EnablePodENI:false, EnableENILimitedPodDensity:true, IsolatedVPC:false, NodeNameConvention:"resource-name", VMMemoryOverheadPercent:0.075, InterruptionQueueName:"Karpenter-fernando", Tags:map[string]string{}}    {"commit": "0c8536a-dirty"}
2022-12-28T13:02:22.879Z    ERROR    webhook.DefaultingWebhook    Reconcile error    {"commit": "0c8536a-dirty", "knative.dev/traceid": "cf5dc332-ab4a-4da8-91c1-2dd25ac95fc3", "knative.dev/key": "defaulting.webhook.karpenter.sh", "duration": "20.341173ms", "error": "failed to update webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"defaulting.webhook.karpenter.sh\": the object has been modified; please apply your changes to the latest version and try again"}

Restarting both pods seems to fix it.

Reopening this as the issue is still here and lies with knative.