datadog-agent: Cluster Agent cannot reconcile webhook

Output of the info page (if this is a bug)

2022-01-04 15:47:36 UTC | CORE | WARN | (pkg/util/log/log.go:630 in func1) | Deactivating Autoconfig will disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
2022-01-04 15:47:36 UTC | CORE | INFO | (cmd/system-probe/config/config.go:119 in Merge) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
Getting the status from the agent.

===============
Agent (v7.32.3)
===============

  Status date: 2022-01-04 15:47:36.933 UTC (1641311256933)
  Agent start: 2022-01-04 15:46:59.953 UTC (1641311219953)
  Pid: 1
  Go Version: go1.16.7
  Python Version: 3.8.11
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -3.602ms
    System time: 2022-01-04 15:47:36.933 UTC (1641311256933)

  Host Info
  =========
    bootTime: 2022-01-03 09:07:50 UTC (1641200870000)
    kernelArch: x86_64
    kernelVersion: 5.4.0-1064-azure
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.04
    procs: 219
    uptime: 30h39m16s
    virtualizationRole: host
    virtualizationSystem: kvm

  Hostnames
  =========
    host_aliases: [6f2277ad-0ffe-4bcc-ad0a-497915c1b7ac aks-common-76155617-vmss000000-alg-m3-test-aks]
    hostname: aks-common-76155617-vmss000000-alg-m3-test-aks
    socket-fqdn: datadog-agent-5m7l5
    socket-hostname: datadog-agent-5m7l5
    host tags:
      cluster_name:alg-m3-test-aks
      kube_cluster_name:alg-m3-test-aks
    hostname provider: container
    unused hostname providers:
      aws: not retrieving hostname from AWS: the host is not an ECS instance and other providers already retrieve non-default hostnames
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname

  Metadata
  ========
    cloud_provider: Azure
    hostname_source: container

=========
Collector
=========

  Running Checks
  ==============
    
    containerd
    ----------
      Instance ID: containerd [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/containerd.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 558, Total: 1,116
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 2
      Average Execution Time : 318ms
      Last Execution Date : 2022-01-04 15:47:21 UTC (1641311241000)
      Last Successful Execution Date : 2022-01-04 15:47:21 UTC (1641311241000)
      
    
    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 9, Total: 11
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-04 15:47:28 UTC (1641311248000)
      Last Successful Execution Date : 2022-01-04 15:47:28 UTC (1641311248000)
      
    
    cri
    ---
      Instance ID: cri [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cri.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 54, Total: 108
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 13ms
      Last Execution Date : 2022-01-04 15:47:35 UTC (1641311255000)
      Last Successful Execution Date : 2022-01-04 15:47:35 UTC (1641311255000)
      
    
    disk (4.4.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 712, Total: 1,424
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 56ms
      Last Execution Date : 2022-01-04 15:47:27 UTC (1641311247000)
      Last Successful Execution Date : 2022-01-04 15:47:27 UTC (1641311247000)
      
    
    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 5, Total: 10
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-04 15:47:34 UTC (1641311254000)
      Last Successful Execution Date : 2022-01-04 15:47:34 UTC (1641311254000)
      
    
    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 78, Total: 102
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-04 15:47:26 UTC (1641311246000)
      Last Successful Execution Date : 2022-01-04 15:47:26 UTC (1641311246000)
      
    
    kubelet (7.1.0)
    ---------------
      Instance ID: kubelet:5bbc63f3938c02f4 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 1,083, Total: 2,108
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 8
      Average Execution Time : 449ms
      Last Execution Date : 2022-01-04 15:47:27 UTC (1641311247000)
      Last Successful Execution Date : 2022-01-04 15:47:27 UTC (1641311247000)
      
    
    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 6, Total: 12
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-04 15:47:33 UTC (1641311253000)
      Last Successful Execution Date : 2022-01-04 15:47:33 UTC (1641311253000)
      
    
    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 18, Total: 36
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-04 15:47:25 UTC (1641311245000)
      Last Successful Execution Date : 2022-01-04 15:47:25 UTC (1641311245000)
      
    
    network (2.4.0)
    ---------------
      Instance ID: network:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 79, Total: 158
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 13ms
      Last Execution Date : 2022-01-04 15:47:32 UTC (1641311252000)
      Last Successful Execution Date : 2022-01-04 15:47:32 UTC (1641311252000)
      
    
    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 1
      Metric Samples: Last Run: 1, Total: 1
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 1
      Average Execution Time : 29ms
      Last Execution Date : 2022-01-04 15:47:06 UTC (1641311226000)
      Last Successful Execution Date : 2022-01-04 15:47:06 UTC (1641311226000)
      
    
    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 2
      Metric Samples: Last Run: 1, Total: 2
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-01-04 15:47:24 UTC (1641311244000)
      Last Successful Execution Date : 2022-01-04 15:47:24 UTC (1641311244000)
      
========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks
    
  Failed checks
  =============
    no checks
    
=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 6
    Successes By Endpoint:
      check_run_v1: 2
      intake: 2
      series_v1: 2

  API Keys status
  ===============
    API key ending with b9643: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - b9643

==========
Logs Agent
==========

    Sending compressed logs in HTTPS to agent-http-intake.logs.datadoghq.com on port 443
    BytesSent: 1.416464e+06
    EncodedBytesSent: 43496
    LogsProcessed: 1333
    LogsSent: 1291

  datadog/datadog-agent-5m7l5/init-config
  ---------------------------------------
    - Type: file
      Identifier: d52221bc34ee20b008768595ae19895f09fe83c816fb54164db65dcb4eb616d1
      Path: /var/log/pods/datadog_datadog-agent-5m7l5_22ab947a-4f5a-45b8-a8a7-cc4b41c192cf/init-config/*.log
      Status: Pending
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  container_collect_all
  ---------------------
    - Type: docker
      Status: Pending
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  datadog/datadog-agent-5m7l5/process-agent
  -----------------------------------------
    - Type: file
      Identifier: 0dfeab4c2b292ce6d46e4c933819f7ee3b2022d6f19a74e36f9c053bed23ca15
      Path: /var/log/pods/datadog_datadog-agent-5m7l5_22ab947a-4f5a-45b8-a8a7-cc4b41c192cf/process-agent/*.log
      Status: Pending
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  datadog/datadog-agent-5m7l5/agent
  ---------------------------------
    - Type: file
      Identifier: 9edc900f82e3ecc7c3f876ee3a8f76a30e3ffc51998d4f653f9a02c9a7956c75
      Path: /var/log/pods/datadog_datadog-agent-5m7l5_22ab947a-4f5a-45b8-a8a7-cc4b41c192cf/agent/*.log
      Status: Pending
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  kube-system/local-nvme-provisioner-nmf42/provisioner
  ----------------------------------------------------
    - Type: file
      Identifier: 44a728ccbecbcf5d406f217608dafaa7e31401828302891084c7a74bcae5112d
      Path: /var/log/pods/kube-system_local-nvme-provisioner-nmf42_6b15fdc3-93f3-459c-ac35-4de83ea6439b/provisioner/*.log
      Status: OK
        1 files tailed out of 1 files matching
      Inputs:
        /var/log/pods/kube-system_local-nvme-provisioner-nmf42_6b15fdc3-93f3-459c-ac35-4de83ea6439b/provisioner/0.log
      BytesRead: 150625
      Average Latency (ms): 140
      24h Average Latency (ms): 140
      Peak Latency (ms): 527
      24h Peak Latency (ms): 527

  datadog/datadog-agent-5m7l5/init-volume
  ---------------------------------------
    - Type: file
      Identifier: 88ae3e02e78682babcf3e30529b9e02dfdd60d0e9d1614b504fb8a9c8d1b3c3a
      Path: /var/log/pods/datadog_datadog-agent-5m7l5_22ab947a-4f5a-45b8-a8a7-cc4b41c192cf/init-volume/*.log
      Status: Pending
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

  datadog/datadog-agent-5m7l5/trace-agent
  ---------------------------------------
    - Type: file
      Identifier: 24242ad0baa82636c471e84b78148d751d31a7804f6a22e2037d261d569b1e55
      Path: /var/log/pods/datadog_datadog-agent-5m7l5_22ab947a-4f5a-45b8-a8a7-cc4b41c192cf/trace-agent/*.log
      Status: Pending
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

=========
APM Agent
=========
  Status: Running
  Pid: 1
  Uptime: 36 seconds
  Mem alloc: 17,203,264 bytes
  Hostname: aks-common-76155617-vmss000000-***
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  Receiver (previous minute)
  ==========================
    From go 1.17.5 (gc-amd64-linux), client v1.34.0
      Traces received: 11 (4,852 bytes)
      Spans received: 11
      
    Default priority sampling rate: 100.0%
    Priority sampling rate for 'service:api,env:test': 100.0%
    Priority sampling rate for 'service:db,env:test': 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 5,528
  Dogstatsd Metric Sample: 438
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 2
  Series Flushed: 3,245
  Service Check: 35
  Service Checks Flushed: 32
=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 437
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 73,515
  Udp Packet Reading Errors: 0
  Udp Packets: 275
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 1
  Unterminated Metric Errors: 0

=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://172.16.60.116:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 1.16.0+commit.9961689

=============
Autodiscovery
=============
  Enabled Features
  ================
    containerd
    cri
    kubernetes

Describe what happened:

After upgrading the datadog helm chart to version 2.28.11 (datadog 7.32.3, DCA 1.16.0), we're getting the following errors from the cluster agent:

CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:170 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
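For context, this error is Kubernetes' optimistic-concurrency check: every update must be based on the object's current resourceVersion, and the API server rejects updates built from a stale version. A minimal sketch of reproducing the same class of conflict with kubectl, assuming some other writer updates the webhook between the get and the replace:

kubectl get mutatingwebhookconfiguration datadog-webhook -o yaml > webhook.yaml
# ... another controller modifies the object in the meantime ...
kubectl replace -f webhook.yaml
# Error from server (Conflict): ... "datadog-webhook": the object has been
# modified; please apply your changes to the latest version and try again

In other words, the cluster agent's reconcile loop keeps losing this race against another writer that is updating the same webhook.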

Deleting and reinstalling the datadog helm chart does not fix the issue, but downgrading to version 2.22.10 (datadog 7.31.1, DCA 1.15.1) does.

Describe what you expected: We expect the cluster agent to work nominally and not throw errors about the admission controller webhook.

Steps to reproduce the issue: Deploy datadog via the helm chart in version 2.28.11 (or any 2.28 patch) with the following values:

datadog:
  kubelet:
    tlsVerify: false

  logs:
    enabled: true
    containerCollectAll: true

  apm:
    portEnabled: true

  env:
    - name: DD_CONTAINER_EXCLUDE_LOGS
      value: "image:mcr.microsoft.com/.*" # Exclude kube-proxy (mcr.microsoft.com/oss/kubernetes/kube-proxy)

  systemProbe:
    collectDNSStats: false


clusterAgent:
  admissionController:
    enabled: true
    mutateUnlabelled: true

Additional environment details (Operating System, Cloud provider, etc): Running on AKS with kubernetes v1.21.2

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 9
  • Comments: 31 (2 by maintainers)

Most upvoted comments

This was resolved here via the addition of an environment variable specific to AKS.

There are 2 potential workarounds:

  1. If you are not using the Admission Controller functionality, you can set clusterAgent.admissionController.enabled to false: https://github.com/DataDog/helm-charts/blob/main/charts/datadog/values.yaml#L858

OR

  2. If you are using the Admission Controller functionality, set the environment variable DD_ADMISSION_CONTROLLER_ADD_AKS_SELECTORS to true in the clusterAgent section of your Helm chart:
clusterAgent:
  env:
    - name: "DD_ADMISSION_CONTROLLER_ADD_AKS_SELECTORS"
      value: "true"

Either of those two options will remove those errors.

We will get this updated in the documentation.
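For anyone applying the second workaround, a quick sketch for verifying it took effect (resource, namespace, and deployment names are assumptions based on this thread):

# the AKS-specific matchExpressions should now be present on the live object
kubectl get mutatingwebhookconfiguration datadog-webhook \
  -o jsonpath='{.webhooks[0].namespaceSelector}'
# and the reconcile errors should stop appearing in the cluster agent logs
kubectl logs -n datadog deploy/datadog-agent-cluster-agent | grep -i "reconcile webhook"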

Same for chart version v2.37.7

I am also facing the issue with Helm chart version 3.1.9 and Cluster Agent 7.39.2.

Part of the debug logs from the cluster-agent:

2022-10-13 10:05:01 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_v1.go:141 in reconcile) | The Webhook datadog-webhook was found, updating it
2022-10-13 10:05:01 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:149 in enqueue) | Adding object with key datadog-webhook to the queue
2022-10-13 10:05:01 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:177 in processNextWorkItem) | Webhook datadog-webhook reconciled successfully
2022-10-13 10:05:01 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_v1.go:141 in reconcile) | The Webhook datadog-webhook was found, updating it
2022-10-13 10:05:01 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:149 in enqueue) | Adding object with key datadog-webhook to the queue
2022-10-13 10:05:01 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
2022-10-13 10:05:01 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_v1.go:141 in reconcile) | The Webhook datadog-webhook was found, updating it
2022-10-13 10:05:02 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:177 in processNextWorkItem) | Webhook datadog-webhook reconciled successfully
2022-10-13 10:05:02 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_v1.go:141 in reconcile) | The Webhook datadog-webhook was found, updating it
2022-10-13 10:05:02 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:149 in enqueue) | Adding object with key datadog-webhook to the queue
2022-10-13 10:05:02 UTC | CLUSTER | DEBUG | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:149 in enqueue) | Adding object with key datadog-webhook to the queue
2022-10-13 10:05:02 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again

I am experiencing the same with datadog helm chart - 2.30.17

This seems to be broken again, at least on AKS 1.25.11. Assuming you've set the environment variable from the workaround above, the agent creates the webhook with:

        "namespaceSelector": {
          "matchExpressions": [
            {
              "key": "control-plane",
              "operator": "DoesNotExist"
            }
          ]
        },

while the resource, after Azure modifies it, has:

  namespaceSelector:
    matchExpressions:
    - key: control-plane
      operator: DoesNotExist
    - key: control-plane
      operator: NotIn
      values:
      - "true"
    - key: kubernetes.azure.com/managedby
      operator: NotIn
      values:
      - aks

The agent then spins forever, trying to reconcile the webhook several times a second.
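One way to observe this fight, as a sketch: watch the object's resourceVersion, which bumps on every write, so while the loop is spinning a new row appears several times a second (output values are illustrative):

kubectl get mutatingwebhookconfiguration datadog-webhook -w \
  -o custom-columns=NAME:.metadata.name,RV:.metadata.resourceVersion
# NAME              RV
# datadog-webhook   123456789
# datadog-webhook   123456801
# ...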

I noticed I had two replicasets running. When I removed the one with zero replicas, this error stopped. I am still seeing my HelmRelease failing with the same error, though.

emilyzall@Emilys-MBP datadog-agent % kg rs -n datadog
NAME                                     DESIRED   CURRENT   READY   AGE
datadog-agent-cluster-agent-6976b556d6   12        12        12      22d
datadog-agent-cluster-agent-7fb8cbc449   0         0         0       3d2h
emilyzall@Emilys-MBP datadog-agent % kg deployment -n datadog
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
datadog-agent-cluster-agent   12/12   12           12          22d

@emily-zall’s solution is how I corrected my error in GKE Autopilot. I’m unsure how I ended up with 2 replicasets, but once I removed the one that had 0 desired, the error stopped for me.

I’m putting my exact error message so it can help others find the solution:

Couldn’t reconcile Secret default/webhook-certificate: secrets is forbidden: User “system:serviceaccount:datadog:datadog-cluster-agent” cannot create resource “secrets” in API group “” in the namespace “default”

UPDATE/EDIT: The reason I was seeing the old replicaset is that we use ArgoCD to deploy the agent. With the deployment's default revisionHistoryLimit of 10, old replicasets were being left in place. I set clusterAgent.revisionHistoryLimit to 0, which keeps old replicasets from accumulating on ArgoCD changes.
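To check for the same condition, a sketch using the names from the session above (adjust the namespace and replicaset name to your release):

# replicasets with DESIRED=0 are leftovers from old rollouts
kubectl get rs -n datadog
# removing the stale one stopped the error in the reports above
kubectl delete rs -n datadog datadog-agent-cluster-agent-7fb8cbc449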

Any update on this?

Same problem:

  • AKS kubernetes version: 1.23.8
  • Cluster-Agent: 7.39.0
  • Helm Chart: datadog-3.1.3

Edit: A word of warning. We have our AKS clusters configured to send Diagnostic Settings logs to Azure Monitor Log Analytics. In particular, the kube-audit and kube-audit-admin logs will pick up these DD errors, because each reconcile attempt shows up as an Update event in the cluster against the mutatingwebhookconfiguration resource. This was costing us a lot of money in Log Analytics, because these errors are very frequent. For now, we've had to disable the Cluster Agent's Admission Controller feature. This stopped the excessive logging in the Cluster Agent pod as well as the excessive update events being sent to Log Analytics.
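Disabling the feature does not require editing the values file; a sketch using helm upgrade (release and namespace names are assumptions):

helm upgrade datadog-agent datadog/datadog -n datadog \
  --reuse-values \
  --set clusterAgent.admissionController.enabled=false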

Same problem here:

  • AKS kubernetes version: 1.23.8
  • Cluster-Agent: 7.40.1
  • Helm Chart: datadog-3.3.1

This issue popped up when running two instances of the Helm chart in one cluster (one with APM enabled, the other disabled) and inadvertently running two instances of the cluster agent (a wrong indent in the YAML used for disabling the cluster agent). Once the second cluster agent was disabled, the problem was resolved.
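For a second release of the chart in the same cluster, the cluster agent can be switched off with a single flag, which avoids the YAML-indentation mistake described above (release name is an assumption):

helm upgrade datadog-secondary datadog/datadog -n datadog \
  --reuse-values \
  --set clusterAgent.enabled=false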

I am also facing the issue with Helm chart version 3.1.10 and Cluster Agent 7.39.2, running on AKS, Kubernetes version 1.24.6.

Part of the debug logs from the cluster-agent:

2022-10-17 14:52:34 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
2022-10-17 14:52:34 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
2022-10-17 14:52:34 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
2022-10-17 14:52:35 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
2022-10-17 14:52:35 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
2022-10-17 14:52:36 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again
2022-10-17 14:52:36 UTC | CLUSTER | INFO | (pkg/clusteragent/admission/controllers/webhook/controller_base.go:171 in processNextWorkItem) | Couldn't reconcile Webhook datadog-webhook: Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io "datadog-webhook": the object has been modified; please apply your changes to the latest version and try again

Same issue in chart 3.1.8.

Hi @clamoriniere, we still have the issue with 1.17.0 (used in version 2.35.0 of the datadog helm chart).