datadog-agent: system-probe version 7.31.1 fails to start
Output of the info page (if this is a bug)
It doesn't start, so no output.
Describe what happened:
system-probe version 7.31.1 crashes on start:
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:596 in func1) | runtime: final GOMAXPROCS value is: 8
2021-10-15 12:59:45 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:611 in func1) | Unknown key in config file: runtime_security_config.debug
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:596 in func1) | Features detected from environment: kubernetes
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:591 in func1) | network_config.enabled detected: enabling system-probe with network module running.
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:591 in func1) | system_probe_config.enable_oom_kill detected, will enable system-probe with OOM Kill check
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (cmd/system-probe/app/run.go:143 in StartSystemProbe) | running system-probe with version: Agent 7.31.1 - Commit: 52b035e2f - Serialization version: v4.80.0 - Go version: go1.15.13
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/network/tracer/utils_linux.go:38 in IsTracerSupportedByOS) | running on platform: ubuntu
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/network_tracer.go:43 in func1) | Creating tracer for: system-probe
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/network/tracer/tracer.go:115 in NewTracer) | detected platform 3.10.0, switch to use kprobes from kernel version < 4.1.0
2021-10-15 12:59:50 UTC | SYS-PROBE | ERROR | (cmd/system-probe/api/module/loader.go:51 in Register) | new module `network_tracer` error: error guessing offsets: could not load bpf module for offset guessing: couldn't load eBPF programs: map connectsock_ipv6: map create: permission denied
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:42 in Register) | tcp_queue_length_tracer module disabled
sh: 1: modprobe: not found
chdir(/lib/modules/3.10.0-1160.31.1.el7.x86_64/build): No such file or directory
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/oom_kill_probe.go:21 in func3) | Starting the OOM Kill probe
2021-10-15 12:59:50 UTC | SYS-PROBE | ERROR | (cmd/system-probe/api/module/loader.go:51 in Register) | new module `oom_kill_probe` error: unable to start the OOM kill probe: failed to compile “oom-kill-kern.c”
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:42 in Register) | security_runtime module disabled
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:42 in Register) | process module disabled
2021-10-15 12:59:50 UTC | SYS-PROBE | CRITICAL | (cmd/system-probe/app/run.go:188 in StartSystemProbe) | Error while starting api server, exiting: failed to create system probe: no module could be loaded
Error: Error while starting api server, exiting: failed to create system probe: no module could be loaded
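The INFO line from `pkg/network/tracer/tracer.go` reports that the host kernel (3.10.0, from the CentOS 7 release string `3.10.0-1160.31.1.el7.x86_64`) is older than 4.1.0, so the network tracer falls back to kprobes. Below is a minimal Python sketch of that comparison. It assumes that only the leading major.minor.patch numbers of the release string matter. It illustrates the check described by the log line and is not the agent's actual Go code; the function names are hypothetical.

```python
import re
from typing import Tuple

# Threshold copied from the log line "switch to use kprobes from kernel version < 4.1.0".
KPROBE_FALLBACK_BELOW: Tuple[int, int, int] = (4, 1, 0)


def parse_kernel_version(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a uname-style release string.

    Assumption: distro suffixes such as "-1160.31.1.el7.x86_64" are ignored,
    and a missing patch number is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def uses_kprobe_fallback(release: str) -> bool:
    """Return True if the kernel is older than 4.1.0 (tuple comparison)."""
    return parse_kernel_version(release) < KPROBE_FALLBACK_BELOW


if __name__ == "__main__":
    release = "3.10.0-1160.31.1.el7.x86_64"  # host kernel from the log above
    print(parse_kernel_version(release), uses_kprobe_fallback(release))
    # prints: (3, 10, 0) True
```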
There appear to be two distinct problems here:
- When I enabled only OOMKill, the first errors in the output were these two, which appear to lead to `failed to compile “oom-kill-kern.c”`:
sh: 1: modprobe: not found
chdir(/lib/modules/3.10.0-1160.31.1.el7.x86_64/build): No such file or directory
- When I enabled only networkMonitoring, I got this:
`network_tracer` error: error guessing offsets: could not load bpf module for offset guessing: couldn't load eBPF programs: map connectsock_ipv6: map create: permission denied
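The OOM Kill errors suggest that compiling `oom-kill-kern.c` at runtime looks for kernel headers under `/lib/modules/<release>/build`, which does not exist here, and that something in that path calls `modprobe`, which is not installed in the container. Below is a minimal Python preflight sketch that checks those two conditions. Both the function name and the idea of running it inside the system-probe container are illustrative assumptions and are not part of the Datadog agent. Because a container shares the host kernel, `os.uname().release` returns the host's release (`3.10.0-1160.31.1.el7.x86_64` here), so the headers must match the CentOS kernel and not the container's distro.

```python
import os
import shutil
from typing import Dict, Optional


def check_runtime_compilation_prereqs(
    release: Optional[str] = None, modules_root: str = "/lib/modules"
) -> Dict[str, bool]:
    """Report whether the two prerequisites named in the error log are present.

    - "headers": <modules_root>/<release>/build is a directory
      (the chdir target that failed in the log).
    - "modprobe": a modprobe executable is on PATH
      (the log shows "sh: 1: modprobe: not found").
    """
    if release is None:
        release = os.uname().release  # kernel release seen by this process (the host's)
    build_dir = os.path.join(modules_root, release, "build")
    return {
        "headers": os.path.isdir(build_dir),
        "modprobe": shutil.which("modprobe") is not None,
    }


if __name__ == "__main__":
    for name, present in check_runtime_compilation_prereqs().items():
        print(f"{name}: {'ok' if present else 'MISSING'}")
```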
My values.yaml looks like this:
targetSystem: "linux"
datadog:
  # apiKey: <DATADOG_API_KEY> under /secrets
  clusterName: ionos
  # datadog.kubelet.tlsVerify should be `false` to establish communication with the kubelet
  kubelet:
    tlsVerify: "false"
  logs: ...
  apm: ...
  processAgent: ...
  systemProbe:
    bpfDebug: true
    enableTCPQueueLength: false
    enableOOMKill: true
    collectDNSStats: false
  networkMonitoring:
    enabled: true
agents:
  containers:
    agent:
      resources:
        limits:
          cpu: 200m
          memory: 256Mi
        requests:
          cpu: 200m
          memory: 256Mi
  tolerations:
    # These tolerations are needed to run the agent on master nodes
    - effect: NoSchedule
      key: node-role.kubernetes.io/controlplane
      operator: Exists
    - effect: NoExecute
      key: node-role.kubernetes.io/etcd
      operator: Exists
~I also noticed that the probe seems to identify the wrong distro, but I don’t know if that matters. My host OS is CentOS 7.~
I’ve looked into this further: the "ubuntu" platform that the probe reports refers to the container, not the host.
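The distinction between the container's distro and the host's distro can be seen in the `ID` field of an os-release file. Below is a minimal Python sketch that parses that `KEY=value` format. The function name is hypothetical. Whether the host's `/etc/os-release` is visible inside the agent container, and at what path, depends on the chart's volume mounts, so this sketch does not assume a path for it.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict.

    Blank lines and comments are skipped, and matching single or double quotes
    around a value are stripped. Escape sequences are not handled
    (a simplification).
    """
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key] = value
    return fields


if __name__ == "__main__":
    # Inside the agent container this reads the container's own os-release.
    with open("/etc/os-release", encoding="utf-8") as f:
        print(parse_os_release(f.read()).get("ID"))
```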
Describe what you expected:
system-probe should start.
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc):
We deploy the agent as a DaemonSet via the Helm chart to a Kubernetes cluster on IONOS.
Chart version: datadog-2.22.16
About this issue
- State: closed
- Created 3 years ago
- Reactions: 5
- Comments: 24 (10 by maintainers)
@vigohe SELinux configuration is sadly not standard. We would need to know what host OS, distro, and version you are using. It may be faster to open a support case.
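The maintainer's reply above brings up SELinux. This thread does not confirm that SELinux caused the `map create: permission denied` error, but checking the host's SELinux mode is a quick first step. Below is a minimal Python sketch that reads the mode. It assumes selinuxfs is mounted at `/sys/fs/selinux`, which is the usual location on the host. Its `enforce` file contains `1` in enforcing mode and `0` in permissive mode, and it is absent when SELinux is disabled or selinuxfs is not visible (for example, inside a container). The function names are hypothetical.

```python
import os
from typing import Optional


def mode_from_enforce(value: Optional[str]) -> str:
    """Map the contents of selinuxfs's "enforce" file to a mode name.

    None means the file was not found (SELinux disabled, or selinuxfs not
    mounted where we looked).
    """
    if value is None:
        return "disabled"
    value = value.strip()
    if value == "1":
        return "enforcing"
    if value == "0":
        return "permissive"
    raise ValueError(f"unexpected enforce value: {value!r}")


def selinux_mode(selinuxfs: str = "/sys/fs/selinux") -> str:
    """Read <selinuxfs>/enforce and return the SELinux mode."""
    try:
        with open(os.path.join(selinuxfs, "enforce"), encoding="ascii") as f:
            return mode_from_enforce(f.read())
    except FileNotFoundError:
        return mode_from_enforce(None)


if __name__ == "__main__":
    print(selinux_mode())
```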
@mountainaireman @rogersd
7.32.1 has now been released and fixes your problem.