bcc: USDT probes requiring a semaphore cannot be used on Google Container-Optimized OS
This isn’t necessarily a bug in bcc per se, more a write-up of what doesn’t work and why. Hopefully it will at least keep others from going down the rabbit hole I did.
There may be a way for bcc to fix this by finding another way to increment the semaphore, but I don’t really see how.
In the Chromium OS kernel source, there is code in fs/proc/base.c that, for security reasons, prevents processes from writing to their own memory maps. As written, the check rejects every write to /proc/<pid>/mem, whichever process issues it:
static ssize_t mem_write(struct file *file, const char __user *buf,
                         size_t count, loff_t *ppos)
{
#ifdef CONFIG_SECURITY_CHROMIUMOS_READONLY_PROC_SELF_MEM
        return -EACCES;
#else
        return mem_rw(file, (char __user*)buf, count, ppos, 1);
#endif
}
This config option is enabled by default in the kernel used by Container-Optimized OS, such as in Google’s GKE offering: https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/master/overlay-lakitu/sys-kernel/lakitu-kernel-4_14/files/base.config#3016
This means that anyone trying to use bcc on a Chromium-derived OS, and especially anyone trying to use bcc on GKE, will probably also hit this if they try to use USDT probes.
Some USDT probes have to be enabled by writing to a semaphore. This isn’t true of all USDT probes, but is probably true of many (I ran into this with Ruby’s USDT probes). As the DTrace docs indicate, the semaphore is a means to avoid expensive processing around the probe: the extra info/processing is only added if the probe is enabled.
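To make the role of the semaphore concrete, here is a toy Python model of a semaphore-gated probe. It is only an illustration of the idea, not Ruby's or SystemTap's actual code; `GatedProbe` and `expensive_args` are hypothetical names invented for this sketch.

```python
class GatedProbe:
    """Toy model of a USDT probe guarded by a semaphore counter."""

    def __init__(self):
        self.semaphore = 0  # number of tracers that have enabled the probe
        self.fired = []     # stands in for events delivered to a tracer

    def fire(self, build_args):
        # The "is enabled" check: skip all argument preparation while no
        # tracer has incremented the semaphore.
        if self.semaphore == 0:
            return False
        self.fired.append(build_args())
        return True


def expensive_args():
    # Hypothetical stand-in for costly work, e.g. resolving a method name.
    expensive_args.calls += 1
    return ("String", "upcase")


expensive_args.calls = 0

probe = GatedProbe()
probe.fire(expensive_args)  # semaphore is 0: the arguments are never built
probe.semaphore += 1        # what a tracer does when it enables the probe
probe.fire(expensive_args)  # now the arguments are built and the probe fires
```

If the tracer cannot increment the counter, the traced program keeps taking the first branch, which is why the probe appears attached but never fires.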
The code that handles this in bcc is here: https://github.com/iovisor/bcc/blob/c2e2a26b8624492018a14d5eebd4a50b869c911f/src/cc/usdt/usdt.cc#L109-L113
And it is essentially the same as the approach described here.
However, this leads to probes silently failing to be enabled if run against a kernel with the above hardening. Using strace, it is obvious why it fails:
openat(AT_FDCWD, "/proc/726288/mem", O_RDWR) = 72
lseek(72, 94200854600568, SEEK_SET) = 94200854600568
read(72, "\0\0", 2) = 2
lseek(72, 94200854600568, SEEK_SET) = 94200854600568
write(72, "\1\0", 2) = -1 EACCES (Permission denied)
close(72) = 0
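The syscall sequence in this trace can be reproduced outside of bcc. The following is a minimal Python sketch that mirrors it, assuming a 2-byte counter in native byte order as the trace suggests; `bump_semaphore` is a hypothetical helper written for this write-up, not part of bcc.

```python
import os
import struct


def bump_semaphore(mem_path, address, delta=1):
    """Add delta to the 16-bit counter at address.

    Returns True on success and False if the kernel denies the write.
    """
    fd = os.open(mem_path, os.O_RDWR)
    try:
        # Read the current counter value (lseek + read in the trace).
        os.lseek(fd, address, os.SEEK_SET)
        raw = os.read(fd, 2)
        if len(raw) != 2:
            raise OSError("short read of semaphore")
        (value,) = struct.unpack("=H", raw)

        # Write the updated value back (lseek + write in the trace).
        os.lseek(fd, address, os.SEEK_SET)
        try:
            os.write(fd, struct.pack("=H", (value + delta) & 0xFFFF))
        except PermissionError:
            # This is the write that the hardened kernel rejects.
            return False
        return True
    finally:
        os.close(fd)
```

Called as `bump_semaphore("/proc/726288/mem", 94200854600568)` on an affected kernel, this would be expected to return False, since the final write is the step that comes back with EACCES.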
Note that this will only happen for probes where readelf --notes indicates a non-zero value for the semaphore:
stapsdt 0x00000059 NT_STAPSDT (SystemTap probe descriptors)
Provider: ruby
Name: cmethod__entry
Location: 0x000000000019999d, Base: 0x00000000002d8ec0, Semaphore: 0x000000000052bb54
Arguments: 8@32(%rsp) 8@40(%rsp) 8@48(%rsp) -4@56(%rsp)
As a semaphore address is indicated here, this USDT probe would be affected. A similar probe in libc would not be affected and can be attached to, as the semaphore is not required for the “enable” mechanism:
stapsdt 0x0000003c NT_STAPSDT (SystemTap probe descriptors)
Provider: libc
Name: memory_heap_free
Location: 0x000000000019bfd0, Base: 0x00000000001bdd48, Semaphore: 0x0000000000000000
Arguments: 8@%r11 8@%rax
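To check a whole binary for affected probes, the same readelf --notes output can be scanned for non-zero semaphore addresses. This is a rough Python sketch that assumes the output layout shown in the two listings above; `probes_needing_semaphore` is a hypothetical helper, not an existing bcc or binutils function.

```python
import re

NOTE_FIELD = re.compile(r"^\s*(Provider|Name):\s*(\S+)\s*$")
SEMAPHORE = re.compile(r"Semaphore:\s*(0x[0-9a-fA-F]+)")


def probes_needing_semaphore(readelf_text):
    """Return (provider, name, semaphore_address) for semaphore-gated probes."""
    gated = []
    provider = name = None
    for line in readelf_text.splitlines():
        field = NOTE_FIELD.match(line)
        if field:
            # Remember the most recent provider and probe name.
            if field.group(1) == "Provider":
                provider = field.group(2)
            else:
                name = field.group(2)
            continue
        sem = SEMAPHORE.search(line)
        if sem:
            address = int(sem.group(1), 16)
            # A zero address means the probe is not gated by a semaphore.
            if address != 0:
                gated.append((provider, name, address))
    return gated
```

Fed the two listings above, it would report only the ruby cmethod__entry probe. The text to scan could be captured with something like `subprocess.run(["readelf", "--notes", path], capture_output=True, text=True).stdout`.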
About this issue
- State: closed
- Created 5 years ago
- Comments: 37 (30 by maintainers)
Thank you for explaining, @palmtenor. I understand the approach now and have a rough idea of the scope of the work. This is definitely an exciting way forward, as I thought we had basically hit a brick wall in our environment under the current implementation.
When you have the time to create the BCC patch, I would love to review it, mostly to improve my understanding.
@yonghong-song 0x55d is offset of the function to probe. 0x1034 is the offset for the USDT semaphore. The kernel will increase the semaphore when the uprobe is enabled.
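For readers following along, here is a small Python sketch of how those two offsets could be combined into a uprobe definition. It assumes the tracefs uprobe_events syntax PATH:OFFSET(REF_CTR_OFFSET) available on kernels that support the reference counter; the group name, event name, and binary path are hypothetical placeholders, and `uprobe_event_spec` is not a bcc function.

```python
def uprobe_event_spec(group, event, binary_path, probe_offset, ref_ctr_offset=0):
    """Build a uprobe_events definition, optionally with a semaphore offset."""
    spec = f"p:{group}/{event} {binary_path}:{probe_offset:#x}"
    if ref_ctr_offset:
        # The parenthesised offset asks the kernel to manage the semaphore
        # itself instead of the tracer writing to /proc/<pid>/mem.
        spec += f"({ref_ctr_offset:#x})"
    return spec


# Hypothetical names; only the two offsets come from the comment above.
print(uprobe_event_spec("usdt", "demo", "/tmp/a.out", 0x55D, 0x1034))
```

With this mechanism the tracer never writes to the target's memory, so the /proc/<pid>/mem hardening would no longer be in the way.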
Thanks @yonghong-song, my development host is kernel 5.3 and appears to have this field:
I will look into devising a proof-of-concept patch to add support for this method of incrementing the semaphore to bcc. Our production kernel is still 4.14, but once we are able to upgrade to 4.20 or later, I’d like to already have the functionality landed in bcc 😃