tracee: Initial BPFMaps population freezes tracee-ebpf

While developing a fix for https://github.com/aquasecurity/tracee/issues/862, and loading the embedded CO-RE eBPF object, I realized that sometimes the logic worked and sometimes it did not.

From time to time, tracee-ebpf ended up in a state where it could not receive (or display) any events:

BTF enabled, attempting to unpack CORE bpf object
unpacked CO:RE bpf object file into memory
TIME             UTS_NAME         CONTAINER_ID     UID    COMM             PID/host        TID/host        RET              EVENT                ARGS
<nothing>

and nothing happened.

The logic being added to PopulateMap is:

	// Initialize pid_to_cont_id_map if tracing containers
	c := Containers{}
	err := c.Populate()
	if err != nil {
		return err
	}
	bpfPidToContIdMap, _ := t.bpfModule.GetMap("pid_to_cont_id_map")
	for _, contId := range c.GetContainers() {
		for _, pidstr := range c.GetPids(contId) {
			if t.config.Debug {
				fmt.Println("Running container =", contId, "pid =", pidstr)
			}
			var pid uint32
			// Check the parse result before it is overwritten by Update's error
			if _, err = fmt.Sscanf(pidstr, "%d", &pid); err != nil {
				return err
			}
			err = bpfPidToContIdMap.Update(pid, []byte(contId))
			if err != nil {
				return err
			}
		}
	}
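
As a side note on the PID conversion in the loop above, here is a minimal sketch of a stricter parse, assuming a hypothetical helper named parsePid (it is not part of tracee). It uses strconv.ParseUint so that a malformed procfs entry is rejected instead of silently becoming a partial or zero PID. This is only an illustration and is not claimed to be the cause of the freeze:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePid is a hypothetical helper (not part of tracee): it converts a PID
// string, such as a procfs directory name, into a uint32 and rejects
// anything that is not a plain decimal number fitting in 32 bits.
func parsePid(pidstr string) (uint32, error) {
	n, err := strconv.ParseUint(strings.TrimSpace(pidstr), 10, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid pid %q: %w", pidstr, err)
	}
	return uint32(n), nil
}

func main() {
	// One valid PID string and one deliberately invalid one.
	for _, s := range []string{"1721925", "not-a-pid"} {
		pid, err := parsePid(s)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("pid =", pid)
	}
}
```

Unlike fmt.Sscanf with "%d", which accepts a leading number followed by trailing junk, ParseUint fails on any non-digit input, so a bad entry surfaces as an error before the map is touched.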

Initially I thought it was related to my logic, but debug always showed me that the slices of container_id and pids were ok:

$ sudo ./dist/tracee-ebpf --debug --trace container --trace event=execve
BTF enabled, attempting to unpack CORE bpf object
unpacked CO:RE bpf object file into memory
Running container = 0a829b3bc00d7f1b393d070c0f3e5d1929a186df1b41b4cd2bf95525f495aa55 pid = 1721925
Running container = 0a829b3bc00d7f1b393d070c0f3e5d1929a186df1b41b4cd2bf95525f495aa55 pid = 1726069
...

and there were no errors adding the pids to the map:

 err = bpfPidToContIdMap.Update(pid, []byte(contId))

I’m also able to reproduce this behavior using the versioned eBPF object file:

$ sudo TRACEE_BPF_FILE="$(pwd)/dist/tracee.bpf.5_11_0-24-generic.v0_6_0-11-g49503a2.o" ./dist/tracee-ebpf --debug --trace container --trace event=execve
BPF object file specified by TRACEE_BPF_FILE found: /home/rafaeldtinoco/work/sources/ebpf/aquasec-tracee/tracee-ebpf/dist/tracee.bpf.5_11_0-24-generic.v0_6_0-11-g49503a2.oRunning container = 0a829b3bc00d7f1b393d070c0f3e5d1929a186df1b41b4cd2bf95525f495aa55 pid = 1721925
Running container = 0a829b3bc00d7f1b393d070c0f3e5d1929a186df1b41b4cd2bf95525f495aa55 pid = 1726069
Running container = 9521c39d6d3d54f5c5f4760c2e3dbde4fdcfd4fba99c2d46c49a4edd63864ae3 pid = 1236570
Running container = 9521c39d6d3d54f5c5f4760c2e3dbde4fdcfd4fba99c2d46c49a4edd63864ae3 pid = 1236611
...
TIME             UTS_NAME         CONTAINER_ID     UID    COMM             PID/host        TID/host        RET              EVENT                ARGS
<nothing>

I’m running Ubuntu Hirsute: 5.11.0-24-generic #25-Ubuntu with BTF enabled.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 18

Most upvoted comments

Do these freezes only happen with your changes applied or did you see it without them as well?

Only with my changes AFAICT, especially the procfs walk, I suppose…

OK, so we'd better hold off on merging #864 until we figure this out.

Also, here is an interesting thing to note:

MAP: embedde.kconfig [{ "value": { ".kconfig": [{ "CONFIG_ARCH_HAS_SYSCALL_WRAPPER": true } ] } } ]

Which means that libbpf populates a kconfig map when BTF is enabled. This might be related to what we are investigating in #851

When you reproduce this freeze, I suggest that you do the following:

  1. Dump the running bpf programs (using bpftool) and ensure they were loaded correctly (you can create a reference from a working run to compare against).
  2. Dump the contents of the maps that are responsible for filtering (in should_trace()), and check that they are populated as expected. Eventually, there should be pids in traced_pids_map (which are added after process execution), so I would start by checking that this map is not empty.
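
The two checks above could be scripted roughly as follows. This is a sketch under assumptions, not an existing tracee tool: it assumes the dumps were saved with bpftool's JSON output (-j), that the program listing is a JSON array of objects carrying a "name" field, and that a map dump is a JSON array with one element per entry. The function names and file arguments are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// countEntries returns the number of elements in a JSON array, the shape a
// JSON map dump is assumed to have (one element per map entry).
func countEntries(dump []byte) (int, error) {
	var entries []json.RawMessage
	if err := json.Unmarshal(dump, &entries); err != nil {
		return 0, err
	}
	return len(entries), nil
}

// progNames collects the "name" fields from a JSON array of program objects,
// the shape a JSON program listing is assumed to have.
func progNames(dump []byte) (map[string]bool, error) {
	var progs []struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal(dump, &progs); err != nil {
		return nil, err
	}
	names := make(map[string]bool)
	for _, p := range progs {
		if p.Name != "" {
			names[p.Name] = true
		}
	}
	return names, nil
}

// missingProgs lists programs present in the reference run but absent now.
func missingProgs(ref, cur map[string]bool) []string {
	var out []string
	for name := range ref {
		if !cur[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// readOrExit loads a dump file and exits with an error if it cannot be read.
func readOrExit(path string) []byte {
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	return data
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: bpfcheck <ref-progs.json> <cur-progs.json> <traced_pids_map.json>")
		return
	}
	ref, err1 := progNames(readOrExit(os.Args[1]))
	cur, err2 := progNames(readOrExit(os.Args[2]))
	n, err3 := countEntries(readOrExit(os.Args[3]))
	for _, err := range []error{err1, err2, err3} {
		if err != nil {
			fmt.Fprintln(os.Stderr, "bad JSON dump:", err)
			os.Exit(1)
		}
	}
	fmt.Println("programs missing compared to the working run:", missingProgs(ref, cur))
	fmt.Println("entries in traced_pids_map dump:", n)
}
```

The input files would come from something like `sudo bpftool prog show -j` during a working run and again during a frozen run, plus `sudo bpftool map dump name traced_pids_map -j` during the frozen run. Selecting a map by name assumes a reasonably recent bpftool; older versions need the map id instead.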