datadog-agent: Datadog Cluster Agent errors after release 7.35.0

Hi,

After the release of 2.33.4 version of the Datadog Helm Chart for Kubernetes we started to get this error on the logs:

Error from the agent http API server: http: TLS handshake error from 10.0.XXX.XXX:XXXX: EOF

This happened around 45.000 times in the space of one week. The call itself relates to port 5005 of the cluster agent, when the agent communicates with the cluster agent over TLS.

This is printed by the datadog-cluster-agent. However, after doing some tests of changing the image tag names for both the agent and the cluster agent, it seems that the problem occurs only on 7.35.+ version of the standard agent image.

Cluster Agent is on 1.19.0 and agent on 7.35.2.

It might be some sort of regression issue? Thanks.

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 36
  • Comments: 17 (1 by maintainers)

Most upvoted comments

Seeing this as well, there is quick remediation steps?

I’m seeing this as well, is it causing your pods to crashloop?

Seeing this as well, between this and this issue https://github.com/DataDog/datadog-agent/pull/11126 (1.20 release is not out yet with the change) - the cluster agent is quite noisy at the moment šŸ˜„