datadog-agent: WorkloadMetaCollector panics in agent version 7.33.0

Agent version 7.33.0 panics sporadically with the following trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x31bc00b]

goroutine 285 [running]:
github.com/DataDog/datadog-agent/pkg/tagger/collectors.(*WorkloadMetaCollector).processEvents(0xc00025ba70, 0xc000222780, 0xd, 0xd, 0xc0036c5680)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/tagger/collectors/workloadmeta_extract.go:109 +0xcb
github.com/DataDog/datadog-agent/pkg/tagger/collectors.(*WorkloadMetaCollector).Stream(0xc00025ba70, 0xc, 0xc000d9abc8)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/tagger/collectors/workloadmeta_main.go:103 +0x199
created by github.com/DataDog/datadog-agent/pkg/tagger/local.(*Tagger).registerCollectors
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/tagger/local/tagger.go:208 +0x1ec

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 12
  • Comments: 18 (7 by maintainers)

Most upvoted comments

@deadok22 follow-up PRs changed it again; see #11200, where we actually found the root cause. However, it appears that this issue doesn’t stem from that; see below.

@therc I don’t see from the stack trace how nil subscribers could be the issue, since it points to meta := ev.Entity.GetID(), which doesn’t involve subscribers at all.

Seems to be a missing continue in the docker collector on error: https://github.com/DataDog/datadog-agent/blob/main/pkg/workloadmeta/collectors/internal/docker/docker.go#L146-L148 – I’ll open a PR with a fix.

@deadok22 we’re definitely on 7.33.1.

2022-03-01 19:41:18 UTC | CORE | INFO | (cmd/agent/app/run.go:250 in StartAgent) | Starting Datadog Agent v7.33.1
# ls -l `which agent`
-rwxr-xr-x 1 root root 109069416 Feb 10 10:57 /opt/datadog-agent/bin/agent//agent

It’s a custom image with the nvml extra integration baked in… I wonder if that might be the problem. Or maybe it’s a related, but not identical issue? I’ll try with 7.34.0.