datadog-agent: system-probe version 7.31.1 fails to start

Output of the info page (if this is a bug)

It doesn't start, so no output.

Describe what happened: system-probe version 7.31.1 crashes on start:

2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:596 in func1) | runtime: final GOMAXPROCS value is: 8
2021-10-15 12:59:45 UTC | SYS-PROBE | WARN | (pkg/util/log/log.go:611 in func1) | Unknown key in config file: runtime_security_config.debug
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:596 in func1) | Features detected from environment: kubernetes
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:591 in func1) | network_config.enabled detected: enabling system-probe with network module running.
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:591 in func1) | system_probe_config.enable_oom_kill detected, will enable system-probe with OOM Kill check
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (cmd/system-probe/app/run.go:143 in StartSystemProbe) | running system-probe with version: Agent 7.31.1 - Commit: 52b035e2f - Serialization version: v4.80.0 - Go version: go1.15.13
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/network/tracer/utils_linux.go:38 in IsTracerSupportedByOS) | running on platform: ubuntu
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/network_tracer.go:43 in func1) | Creating tracer for: system-probe
2021-10-15 12:59:45 UTC | SYS-PROBE | INFO | (pkg/network/tracer/tracer.go:115 in NewTracer) | detected platform 3.10.0, switch to use kprobes from kernel version < 4.1.0
2021-10-15 12:59:50 UTC | SYS-PROBE | ERROR | (cmd/system-probe/api/module/loader.go:51 in Register) | new module `network_tracer` error: error guessing offsets: could not load bpf module for offset guessing: couldn't load eBPF programs: map connectsock_ipv6: map create: permission denied
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:42 in Register) | tcp_queue_length_tracer module disabled
sh: 1: modprobe: not found
chdir(/lib/modules/3.10.0-1160.31.1.el7.x86_64/build): No such file or directory
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/modules/oom_kill_probe.go:21 in func3) | Starting the OOM Kill probe
2021-10-15 12:59:50 UTC | SYS-PROBE | ERROR | (cmd/system-probe/api/module/loader.go:51 in Register) | new module `oom_kill_probe` error: unable to start the OOM kill probe: failed to compile “oom-kill-kern.c”
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:42 in Register) | security_runtime module disabled
2021-10-15 12:59:50 UTC | SYS-PROBE | INFO | (cmd/system-probe/api/module/loader.go:42 in Register) | process module disabled
2021-10-15 12:59:50 UTC | SYS-PROBE | CRITICAL | (cmd/system-probe/app/run.go:188 in StartSystemProbe) | Error while starting api server, exiting: failed to create system probe: no module could be loaded
Error: Error while starting api server, exiting: failed to create system probe: no module could be loaded
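
For anyone triaging a similar log, here is a minimal sketch that pulls the per-module failures out of output like the above. It assumes only the line format visible in this log (`new module ... error: ...` and `... module disabled`). The name `summarize_modules` is made up for illustration and is not part of the agent.

```python
import re

# Loader error line, as it appears in the log above:
#   ... | ERROR | (...loader.go:51 in Register) | new module `NAME` error: MESSAGE
MODULE_ERROR = re.compile(
    r"\| ERROR \|.*\| new module `(?P<name>[^`]+)` error: (?P<msg>.*)$"
)
# Loader info line for modules that were simply not enabled:
#   ... | INFO | (...loader.go:42 in Register) | NAME module disabled
MODULE_DISABLED = re.compile(r"\| INFO \|.*\| (?P<name>\S+) module disabled$")


def summarize_modules(log_text):
    """Return (failed, disabled).

    failed maps a module name to the last segment of its error chain,
    which in this log is the innermost cause. disabled lists the modules
    reported as disabled, in log order.
    """
    failed, disabled = {}, []
    for line in log_text.splitlines():
        line = line.strip()
        match = MODULE_ERROR.search(line)
        if match:
            failed[match.group("name")] = match.group("msg").rsplit(": ", 1)[-1]
            continue
        match = MODULE_DISABLED.search(line)
        if match:
            disabled.append(match.group("name"))
    return failed, disabled
```

Run against the log above, this should report `network_tracer` and `oom_kill_probe` as failed and the remaining modules as disabled, which matches the final "no module could be loaded" message.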

There appear to be two distinct problems here:

  1. When I enabled only OOMKill, the first errors I saw were the following, which appear to lead to the failed to compile “oom-kill-kern.c” error:
sh: 1: modprobe: not found
chdir(/lib/modules/3.10.0-1160.31.1.el7.x86_64/build): No such file or directory
  2. When I enabled only networkMonitoring, I got this:
`network_tracer` error: error guessing offsets: could not load bpf module for offset guessing: couldn't load eBPF programs: map connectsock_ipv6: map create: permission denied
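
A rough way to check the two things the first problem complains about (a missing `modprobe` binary and a missing `/lib/modules/<kernel release>/build` directory) is sketched below. This is an assumption based on the error lines only: it presumes the OOM Kill probe is compiled at runtime and looks for kernel headers at that path, as the `chdir()` message suggests. The function name `oom_kill_prerequisites` is hypothetical.

```python
import os
import shutil


def oom_kill_prerequisites(kernel_release=None, modules_root="/lib/modules"):
    """Check for the two items named in the failing log lines.

    Assumption: kernel headers are expected under
    <modules_root>/<kernel_release>/build, as the chdir() error suggests.
    """
    # Default to the running kernel's release string, e.g. 3.10.0-1160.31.1.el7.x86_64
    release = kernel_release or os.uname().release
    headers_dir = os.path.join(modules_root, release, "build")
    return {
        "kernel_release": release,
        "headers_dir": headers_dir,
        "headers_present": os.path.isdir(headers_dir),
        "modprobe_on_path": shutil.which("modprobe") is not None,
    }
```

Calling `oom_kill_prerequisites()` with no arguments inspects the environment it runs in, so it would need to be run where system-probe itself runs (inside the container) to see the same paths. It says nothing about the second problem, the `permission denied` on map creation.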

My values.yaml looks like this:

targetSystem: "linux"
datadog:
  # apiKey: <DATADOG_API_KEY> under /secrets
  clusterName: ionos
  # datadog.kubelet.tlsVerify should be `false` to establish communication with the kubelet
  kubelet:
    tlsVerify: "false"
  logs: ...
  apm: ...
  processAgent: ...
  systemProbe:
    bpfDebug: true
    enableTCPQueueLength: false
    enableOOMKill: true
    collectDNSStats: false
  networkMonitoring:
    enabled: true
agents:
  containers:
    agent:
      resources:
        limits:
          cpu: 200m
          memory: 256Mi
        requests:
          cpu: 200m
          memory: 256Mi
  tolerations:
    # These tolerations are needed to run the agent on master nodes
    - effect: NoSchedule
      key: node-role.kubernetes.io/controlplane
      operator: Exists
    - effect: NoExecute
      key: node-role.kubernetes.io/etcd
      operator: Exists

~I also noticed that the probe seems to identify the wrong distro, but I don’t know if that matters. My host OS is CentOS 7.~ Update: after looking into this further, the OS identified as ubuntu refers to the container, not the host.

Describe what you expected: system-probe should start.

Steps to reproduce the issue:

Additional environment details (Operating System, Cloud provider, etc): We deploy the agent as a DaemonSet via the Helm chart to a Kubernetes cluster on IONOS. Chart version: datadog-2.22.16

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 24 (10 by maintainers)

Most upvoted comments

@vigohe SELinux configuration is sadly not standard. We would need to know what host OS, distro, and version you are using. It may be faster to open a support case.

@mountainaireman @rogersd 7.32.1 is now released and fixes your problem.
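
To see whether a given node is still on a release older than 7.32.1, the version can be read from the `running system-probe with version: Agent X.Y.Z` startup line shown in the log at the top of this issue. The sketch below assumes that line format. The helper names are hypothetical, and the thread does not say which of the two failures in the original report the 7.32.1 fix covers.

```python
import re

# Release named by the maintainer in this thread as containing the fix.
FIXED_IN = (7, 32, 1)


def agent_version(log_line):
    """Extract (major, minor, patch) from a 'version: Agent X.Y.Z' log line."""
    match = re.search(r"Agent (\d+)\.(\d+)\.(\d+)", log_line)
    return tuple(int(part) for part in match.groups()) if match else None


def needs_upgrade(log_line, fixed_in=FIXED_IN):
    """True if the logged version is older than fixed_in, None if not found."""
    version = agent_version(log_line)
    if version is None:
        return None
    return version < fixed_in
```

Comparing integer tuples avoids the string-comparison trap where "7.4.0" would sort after "7.32.1".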