datadog-agent: WorkloadMetaCollector panics in agent version 7.33.0
Agent version 7.33.0 panics sporadically with the following stack trace:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x31bc00b]
goroutine 285 [running]:
github.com/DataDog/datadog-agent/pkg/tagger/collectors.(*WorkloadMetaCollector).processEvents(0xc00025ba70, 0xc000222780, 0xd, 0xd, 0xc0036c5680)
/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/tagger/collectors/workloadmeta_extract.go:109 +0xcb
github.com/DataDog/datadog-agent/pkg/tagger/collectors.(*WorkloadMetaCollector).Stream(0xc00025ba70, 0xc, 0xc000d9abc8)
/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/tagger/collectors/workloadmeta_main.go:103 +0x199
created by github.com/DataDog/datadog-agent/pkg/tagger/local.(*Tagger).registerCollectors
/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/tagger/local/tagger.go:208 +0x1ec
About this issue
- State: open
- Created 2 years ago
- Reactions: 12
- Comments: 18 (7 by maintainers)
Commits related to this issue
- [workloadmeta] Fix race condition when reading from the store When notifying subscribers after handling an event, we read from the store without having a lock on `s.storeMut`. This may result in inco... — committed to DataDog/datadog-agent by juliogreff 2 years ago
- [docker] Don't generate events for un-inspectable containers See https://github.com/DataDog/datadog-agent/issues/10716#issuecomment-1126298463 for a customer facing this issue in 7.35.1: ``` 2022-05... — committed to DataDog/datadog-agent by juliogreff 2 years ago
@deadok22 follow-up PRs changed it again, see #11200 where we actually found the root cause. It appears that this issue doesn’t stem from that, see below.
@therc I don’t see from the stack trace where a nil `subscribers` could be the issue, since it’s pointing to a `meta := ev.Entity.GetID()` that doesn’t involve subscribers at all.
Seems to be a missing `continue` in the docker collector on error (https://github.com/DataDog/datadog-agent/blob/main/pkg/workloadmeta/collectors/internal/docker/docker.go#L146-L148). I’ll open a PR with a fix.
@deadok22 we’re definitely on 7.33.1.
It’s a custom image with the nvml extra integration baked in… I wonder if that might be the problem. Or maybe it’s a related, but not identical issue? I’ll try with 7.34.0.
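For readers following the thread, here is a minimal sketch of the failure mode suggested above: a collector loop that does not `continue` after a failed inspect can emit an event whose `Entity` is nil, and a later `ev.Entity.GetID()` then panics with a nil pointer dereference. Every name here (`Entity`, `Event`, `inspect`, `collectEvents`, `processEvents`) is a simplified, hypothetical stand-in and not the agent's actual code. The sketch shows the guarded version of the loop, plus a defensive nil check on the consumer side.

```go
package main

import (
	"errors"
	"fmt"
)

// Entity and Event are hypothetical, simplified stand-ins for the
// workloadmeta types; they are not the agent's real definitions.
type Entity interface {
	GetID() string
}

type Event struct {
	Entity Entity
}

type container struct{ id string }

func (c *container) GetID() string { return c.id }

// inspect is a hypothetical stand-in for inspecting a container.
// It fails for the ID "gone" to simulate an un-inspectable container.
func inspect(id string) (Entity, error) {
	if id == "gone" {
		return nil, errors.New("no such container")
	}
	return &container{id: id}, nil
}

// collectEvents builds one event per container that can be inspected.
func collectEvents(ids []string) []Event {
	events := make([]Event, 0, len(ids))
	for _, id := range ids {
		entity, err := inspect(id)
		if err != nil {
			fmt.Printf("skipping container %q: %v\n", id, err)
			// Without this continue, an Event with a nil Entity
			// would be appended below.
			continue
		}
		events = append(events, Event{Entity: entity})
	}
	return events
}

// processEvents reads the ID of every event's entity. The nil check is a
// defensive guard: calling GetID on a nil interface value would panic.
func processEvents(events []Event) []string {
	ids := make([]string, 0, len(events))
	for _, ev := range events {
		if ev.Entity == nil {
			continue
		}
		ids = append(ids, ev.Entity.GetID())
	}
	return ids
}

func main() {
	ids := processEvents(collectEvents([]string{"a", "gone", "b"}))
	fmt.Println(ids)
}
```

Removing the `continue` in `collectEvents` and the nil check in `processEvents` reproduces the same class of panic in this sketch. Whether the real fix skips the event at the source, guards the consumer, or does both is not stated in this thread.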