tracee: [BUG] Clean up the manual cpuset mount in cgroupv1 environments

Prerequisites

  • This affects latest released version.
  • This affects current development tree (origin/HEAD).
  • There isn’t an issue describing the bug.

Select one OR another:

  • I’m going to create a PR to solve this (assign to yourself).
  • Someone else should solve this.

Bug description

[Screenshot: 2022-07-07 at 16:11:16]

  1. The cpuset directory is mounted in the same directory as the tracee-ebpf binary.

  2. The cpuset directory is never unmounted and/or cleaned up.

[Screenshot: 2022-07-07 at 16:15:31]

  1. The syscall.Mount() call fails in some cases because the ./cpuset directory is missing.

[Screenshot: 2022-07-07 at 16:20:43]

Steps to reproduce

See pictures above.

$ make -f builder/Makefile.tracee-make alpine-prepare

$ make -f builder/Makefile.tracee-make alpine-shell
...

tracee@54ab0cbb4872[/tracee]$ make all
...

$ sudo ./dist/tracee-ebpf -o none
containers: failed to mount cgroupv1 controller: failed to mount no such file or directory

Context

Relevant information about my setup:

  • Linux version: Ubuntu Jammy
  • Linux kernel version: 5.15.0-33-generic
  • Tracee version (or commit id of your tree): v0.8.0-rc-1-24-g07c8af7
  • LLVM version: 13
  • Golang version: 1.17

Additional Information (files, logs, etc)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 27 (3 by maintainers)

Most upvoted comments

@rafaeldtinoco Let’s maybe reopen and move from milestone to track the mount cleanup need?

I’ve opened #1958 to handle this, but I want to open another issue describing why this is only a temporary solution. What is problematic about the ordering here?

Dropping the capabilities after calling New means that initBPF was already called, the hooks are already attached, and events have started to flow. So it is better to do it before.

As we are only introducing drop-capabilities in this release, I think it will be OK to drop them after calling New() and fix it in the next release (moving the drop back before calling New()). WDYT @rafaeldtinoco @AlonZivony ?

@rafaeldtinoco I saw this change in your documentation PR, is it there by mistake?

I know that Arch-based distros aren’t a target market, but for the record, this is not reproducible on my host setup:

  • Linux version: Manjaro
  • Linux kernel version: 5.17.15-1-MANJARO

That’s because you’re running cgroup v2 and not v1 (BTW, I’m also a Manjaro user 😃)

Not related to this bug, but this made me think that maybe we should also change the user of tracee to a non-root user (in addition to dropping capabilities), WDYT?

If ensureCapabilities is placed after tracee.New() it works as expected. But I’m not sure whether dropping capabilities after the initBPF() logic would be problematic. Yep, let’s confirm with @AlonZivony. I’ll release it tomorrow anyway; I’ll continue fixing some minor documentation issues.

3. We identify the host as a container because /sys/fs/cgroup/cpuset/release_agent doesn’t exist although we assumed that it should (this turns out to be a wrong assumption)

Oh so this does work - I can see that you do have it on your host

I see the same problems =D


was playing with it

This is the result of dropping capabilities before the attempt of creating and mounting the cgroupfs cpuset directory (which is needed in the alpine environment created by the Makefile).

I see three problems here:

  1. The one you listed above
  2. We don’t check the error returned by os.Mkdir("./cpuset", 0644) (we should return an error if err != nil and the error is not "path exists")
  3. We identify the host as a container because /sys/fs/cgroup/cpuset/release_agent doesn’t exist although we assumed that it should (this turns out to be a wrong assumption)