inspektor-gadget: trace exec: `cannot create bpf perf link: permission denied` on RHEL 9.3

Description

Using the trace exec gadget fails on RHEL 9.3 with the default kernel:

$ uname -a
Linux localhost.localdomain 5.14.0-362.18.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jan 3 15:54:45 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

Impact

Our customers cannot use our solution based on IG on RHEL 9 nodes.

Environment and steps to reproduce

  1. Set-up: download and install RHEL 9.3 server (https://developers.redhat.com/products/rhel/download - requires a FREE developer subscription).
  2. Task: try to run sudo ig trace exec
  3. Action(s): alternatively, run the exec.go example directly: (a) clone the IG repo, (b) compile trace/exec, (c) run it with sudo.
  4. Error:
[matthias@localhost ~]$ git clone https://github.com/inspektor-gadget/inspektor-gadget.git
Cloning into 'inspektor-gadget'...
remote: Enumerating objects: 40924, done.
remote: Counting objects: 100% (1875/1875), done.
remote: Compressing objects: 100% (939/939), done.
remote: Total 40924 (delta 1130), reused 1467 (delta 881), pack-reused 39049
Receiving objects: 100% (40924/40924), 83.34 MiB | 15.11 MiB/s, done.
Resolving deltas: 100% (25637/25637), done.
[matthias@localhost ~]$ cd inspektor-gadget/examples/gadgets/basic/trace/exec/
[matthias@localhost exec]$ ls
exec.go  README.md
[matthias@localhost exec]$ go build .
go: downloading github.com/cilium/ebpf v0.12.3
go: downloading golang.org/x/exp v0.0.0-20231108232855-2478ac86f678
go: downloading golang.org/x/text v0.14.0
go: downloading go.opentelemetry.io/otel v1.22.0
go: downloading golang.org/x/sys v0.16.0
go: downloading github.com/hashicorp/go-multierror v1.1.1
go: downloading github.com/sirupsen/logrus v1.9.3
go: downloading golang.org/x/term v0.16.0
go: downloading github.com/hashicorp/errwrap v1.1.0
go: downloading github.com/coreos/go-systemd/v22 v22.5.0
go: downloading github.com/godbus/dbus/v5 v5.1.0
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/spf13/cobra v1.8.0
go: downloading github.com/syndtr/gocapability v0.0.0-20200815063812-42c35b437635
go: downloading github.com/spf13/pflag v1.0.5
[matthias@localhost exec]$ sudo ./exec
[sudo] password for matthias: 
error creating tracer: attaching exit tracepoint: cannot create bpf perf link: permission denied

Expected behavior

Trace execs without an error.

Additional information

Slack thread: https://kubernetes.slack.com/archives/CSYL75LF6/p1706873942198409

About this issue

  • State: closed
  • Created 5 months ago
  • Reactions: 1
  • Comments: 20 (18 by maintainers)

Most upvoted comments

I found the root cause and opened a merge request on their gitlab: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3717

@mauriciovasquezbernal @eiffel-fl your summaries are correct.

will it be backported to 9.3?

9.3, being an odd minor release, has a short lifecycle and is reaching its end of life soon, so unfortunately we are not likely to backport the fix there.

Nonetheless, all of this is related to linux-rt, and here the problem occurs on a “simple RHEL” (i.e. not rt). So the first patch was backported even though the RHEL in use is a non-rt one? Can you please shed some light on this? I would like to understand how it works under the hood.

Starting with 9.3, Red Hat no longer maintains a separate tree for rt kernels. That means rt patches are applied to the main tree, and both rt and non-rt kernels are built from the same sources, differing only in their configs.

Hi there,

I worked on this problem a while ago, and fixing it properly will require some changes on inspektor-gadget’s side.

The upstream kernel fix can be found here: https://lore.kernel.org/lkml/20231005123413.GA488417@alecto.usersys.redhat.com/t/#u. This fix is included in RHEL 9.4, but inspektor-gadget would still fail even with newer kernels. When a bpf program uses syscall tracepoints, what it really gets is struct syscall_trace_(enter|exit). That struct happened to have the same offset for the args member as struct trace_event_raw_sys_(enter|exit), so either struct could be used, but now that struct trace_entry has changed this is no longer the case.

I’ve looked at the bpf programs inspektor-gadget uses, and it seems like all of them use trace_event_raw_sys_* structs, which will need to be replaced by their syscall_trace_* counterparts.

I’ve tested the exec example mentioned in the report with the following patch on RHEL 9.4 kernels, and it works as expected.

diff --git a/gadgets/trace_exec/program.bpf.c b/gadgets/trace_exec/program.bpf.c
index d7ea8bf7..8cda3a90 100644
--- a/gadgets/trace_exec/program.bpf.c
+++ b/gadgets/trace_exec/program.bpf.c
@@ -95,7 +95,7 @@ static __always_inline bool valid_uid(uid_t uid)
 }

 SEC("tracepoint/syscalls/sys_enter_execve")
-int ig_execve_e(struct trace_event_raw_sys_enter *ctx)
+int ig_execve_e(struct syscall_trace_enter *ctx)
 {
        u64 id;
        pid_t pid, tgid;
@@ -216,7 +216,7 @@ static __always_inline bool has_upper_layer()
 }

 SEC("tracepoint/syscalls/sys_exit_execve")
-int ig_execve_x(struct trace_event_raw_sys_exit *ctx)
+int ig_execve_x(struct syscall_trace_exit *ctx)
 {
        u64 id;
        pid_t pid, tgid;

I also think there is something off regarding the format of the tracepoint and the btf information:

$ sudo cat /sys/kernel/debug/tracing/events/syscalls/sys_exit_execve/format
name: sys_exit_execve
ID: 744
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;
	field:unsigned char common_preempt_lazy_count;	offset:8;	size:1;	signed:0;

	field:int __syscall_nr;	offset:12;	size:4;	signed:1;
	field:long ret;	offset:16;	size:8;	signed:1;

print fmt: "0x%lx", REC->ret
$ sudo bpftool btf dump id 1 | grep '\[1816\]' -A 6
[1816] STRUCT 'trace_entry' size=12 vlen=5
	'type' type_id=12 bits_offset=0
	'flags' type_id=8 bits_offset=16
	'preempt_count' type_id=8 bits_offset=24
	'pid' type_id=14 bits_offset=32
	'preempt_lazy_count' type_id=8 bits_offset=64
$ sudo bpftool btf dump id 1 | grep '\[54355\]' -A 4
[54355] STRUCT 'trace_event_raw_sys_exit' size=32 vlen=4
	'ent' type_id=1816 bits_offset=0
	'id' type_id=32 bits_offset=128
	'ret' type_id=32 bits_offset=192
	'__data' type_id=489 bits_offset=256

Up to the preempt_lazy_count field everything is fine: both sources indicate its offset is 8 (64 bits). From the id / __syscall_nr field onward, however, the offsets disagree: the tracepoint format says field:int __syscall_nr; offset:12; size:4; signed:1; (offset 12), while BTF says 'id' type_id=32 bits_offset=128 (128/8 = 16).

Because of this, bpftrace is affected too (in this case, it reports wrong information instead of failing):

$ sudo bpftrace -e 'tracepoint:syscalls:sys_exit_execve { printf("ret is: %ld\n", args->ret); }'
Attaching 1 probe...
ret is: 59

59 is the syscall id on the architecture I’m running.

error creating tracer: attaching exit tracepoint: cannot create bpf perf link: permission denied

So attaching the enter tracepoint works fine but attaching the exit tracepoint fails.

Does the exit tracepoint actually exist on your kernel?

# ls -1d /sys/kernel/debug/tracing/events/syscalls/sys_*_execve
/sys/kernel/debug/tracing/events/syscalls/sys_enter_execve
/sys/kernel/debug/tracing/events/syscalls/sys_exit_execve

I wonder if attaching on the exit tracepoint of another syscall would work or if it is specific to execve.

The following built-in gadgets use an exit tracepoint. Do they work on the RHEL 9.3 kernel?

  • trace open
  • trace mount
  • trace signal