bcc: Userspace stack is not unwound in most samples with offcputime.py
Hi,
I’m experimenting with tracing postgres using offcputime.py. However, the userspace stack is not properly unwound in most samples, as shown in the attached flamegraph [1], which makes the tool basically unusable. Postgres is compiled with -fno-omit-frame-pointer (this is confirmed by a number of correctly unwound stacks). I thought one reason might be that glibc is compiled with -fomit-frame-pointer, because the last resolved symbol is epoll_pwait from libc.so. I took musl libc and compiled it with -fno-omit-frame-pointer, but the result is still the same. What other reasons could there be, and is there anything I can do about it?
Kernel version is 4.14.13.
About this issue
- State: closed
- Created 6 years ago
- Comments: 22 (3 by maintainers)
Commits related to this issue
- avoid symbol demangling if the symbol is not a mangled symbol Fix issue #1641 The bcc user space stack is not printed out properly. From https://en.wikipedia.org/wiki/Name_mangling, all mangled sym... — committed to iovisor/bcc by yonghong-song 6 years ago
- permit symbol resolution for function with size 0 The issue comes up when I investigated issue #1641. A func symbol defined in assembly code will be size of 0, e.g., http://git.musl-libc.org/cgit/mus... — committed to iovisor/bcc by yonghong-song 6 years ago
- avoid symbol demangling if the symbol is not a mangled symbol Fix issue #1641 The bcc user space stack is not printed out properly. From https://en.wikipedia.org/wiki/Name_mangling, all mangled sym... — committed to banh-gao/bcc by yonghong-song 6 years ago
- permit symbol resolution for function with size 0 The issue comes up when I investigated issue #1641. A func symbol defined in assembly code will be size of 0, e.g., http://git.musl-libc.org/cgit/mus... — committed to banh-gao/bcc by yonghong-song 6 years ago
@arssher, after some debugging I was able to root-cause the issue. The relevant file is arch/x86/entry/entry_64.S. Here is a link to the latest stable 4.13 code, which is very similar to the 4.14.13 code: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/arch/x86/entry/entry_64.S?h=linux-4.13.y#n177
By studying the code, I found one workaround: enable the kernel tracepoint raw_syscalls:sys_enter or raw_syscalls:sys_exit. This adds TIF_SYSCALL_TRACEPOINT to every task in the system and effectively forces the slow path for syscall processing in entry_64.S. I have not found an easier workaround yet. It does have overhead, as it will flood the debugfs trace_pipe.
The issue has been fixed in the upcoming 4.16 release and backported to the 4.14 and 4.15 stable releases. So if you upgrade your kernel from 4.14.13 to a later 4.14.x version, the problem will be fixed.
At the same time, I am looking into whether there is an easy way to fix older kernels (4.13 or 4.9), or whether we could request that the same patch (as in the 4.14/4.15 stable releases) be backported to 4.13/4.9.
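For reference, the tracepoint workaround described above can be scripted. The following is only a minimal sketch: it assumes the tracing filesystem is mounted at /sys/kernel/debug/tracing (on some systems it is at /sys/kernel/tracing) and that the script runs as root. The helper name set_tracepoint and its tracing_root parameter are made up for illustration and are not part of bcc.

```python
import os


def set_tracepoint(category, event, enabled,
                   tracing_root="/sys/kernel/debug/tracing"):
    """Enable or disable a kernel tracepoint through its 'enable' file.

    tracing_root is an assumption about where tracefs is mounted;
    adjust it for your system.
    """
    path = os.path.join(tracing_root, "events", category, event, "enable")
    # Writing "1" turns the event on, "0" turns it off again.
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path
```

Calling set_tracepoint("raw_syscalls", "sys_enter", True) before starting offcputime.py, and the same call with False afterwards, would apply and then undo the workaround. As noted above, the event output fills the trace buffer while it is enabled, so it should not be left on longer than needed.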
The issue @palmtenor pointed to is actually a real issue, since in offcputime.py the pid and the user stack id are recorded in the same key. For example, <pid=100, user_stack_id=10> is recorded initially; later, user_stack_id=10 is reused and <pid=200, user_stack_id=10> is recorded. From then on, <pid=100, user_stack_id=10> will be incorrect. One option could be to prevent the second <pid=200, user_stack_id=10> by not allowing reuse, and to warn the user to increase the stack storage size to avoid collisions.
Can you try the following patch? Basically, disable BPF_F_REUSE_STACKID in offcputime.py.
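To make the collision scenario concrete, here is a small pure-Python model of it. This is not BPF code and not taken from offcputime.py: ToyStackMap, its sum-based hash, and the two example stacks are invented for illustration. Only the flag value and the two behaviours (replace the old stack when BPF_F_REUSE_STACKID is set, return -EEXIST otherwise) are meant to mirror what the kernel's bpf_get_stackid helper does on a bucket collision.

```python
import errno

BPF_F_REUSE_STACKID = 1 << 10  # same bit as in the kernel's uapi bpf.h


class ToyStackMap:
    """Toy stack-trace map: one stack per bucket, stack id == bucket index."""

    def __init__(self, n_buckets):
        self.buckets = [None] * n_buckets

    def get_stackid(self, stack, flags=0):
        stack = tuple(stack)
        idx = sum(stack) % len(self.buckets)  # stand-in for the real hash
        held = self.buckets[idx]
        if held is None or held == stack:
            self.buckets[idx] = stack
            return idx
        if flags & BPF_F_REUSE_STACKID:
            # Collision with reuse allowed: the old stack is replaced.
            self.buckets[idx] = stack
            return idx
        # Collision with reuse disabled: the caller gets an error instead.
        return -errno.EEXIST

    def walk(self, stack_id):
        return self.buckets[stack_id]


STACK_A = (0x1000, 0x2000)  # hypothetical addresses sampled for pid 100
STACK_B = (0x1800, 0x1800)  # different stack for pid 200, same toy bucket


def simulate(flags):
    """Record STACK_A, then STACK_B, and resolve pid 100's stored id."""
    stack_map = ToyStackMap(n_buckets=16)
    id_a = stack_map.get_stackid(STACK_A, flags)
    id_b = stack_map.get_stackid(STACK_B, flags)
    return id_a, id_b, stack_map.walk(id_a)
```

With reuse enabled, simulate(BPF_F_REUSE_STACKID) resolves pid 100's key to STACK_B, which is the wrong stack. With simulate(0), the second lookup fails with -EEXIST and pid 100's key still resolves to STACK_A, at the cost of losing the colliding sample.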