bcc: USDT probes requiring a semaphore cannot be used on Google Container-Optimized OS

This isn’t necessarily a bug in bcc per se, more a write-up of what doesn’t work and why. Hopefully it will at least help others avoid going down the rabbit hole I did.

There may be a way for bcc to fix this by finding another way to increment the semaphore, but I don’t really see how.

In the ChromiumOS kernel source, the following code in fs/proc/base.c makes /proc/<pid>/mem read-only for security reasons. With the option enabled, every write through a mem file returns -EACCES, whichever process's memory it targets:

static ssize_t mem_write(struct file *file, const char __user *buf,
       size_t count, loff_t *ppos)
{
#ifdef CONFIG_SECURITY_CHROMIUMOS_READONLY_PROC_SELF_MEM
  return -EACCES;
#else
  return mem_rw(file, (char __user*)buf, count, ppos, 1);
#endif
}

This config option is enabled by default in the kernel used by Container-Optimized OS, including on Google’s GKE offering: https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/master/overlay-lakitu/sys-kernel/lakitu-kernel-4_14/files/base.config#3016

This means that anyone trying to use bcc on a ChromiumOS-derived OS, and especially on GKE, will probably hit this too if they use USDT probes that require a semaphore.
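A quick way to check whether a given node has this hardening is to try writing through /proc/self/mem. The following is a minimal sketch, not part of bcc; proc_mem_writable and probe_target are hypothetical names. On a kernel built with CONFIG_SECURITY_CHROMIUMOS_READONLY_PROC_SELF_MEM the write would be expected to fail with EACCES, while on a stock kernel a process can normally write its own memory this way.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* A 16-bit variable in our own data segment, used as the write target. */
static volatile unsigned short probe_target = 0;

/* Hypothetical helper: returns 1 if a write through /proc/self/mem succeeds,
 * 0 otherwise. *err_out receives the failing errno (0 on success). */
int proc_mem_writable(int *err_out)
{
    unsigned short one = 1;
    ssize_t n;
    int fd;

    assert(err_out != NULL);
    *err_out = 0;

    fd = open("/proc/self/mem", O_RDWR);
    if (fd < 0) {
        *err_out = errno;
        return 0;
    }
    /* /proc/<pid>/mem is indexed by virtual address. */
    n = pwrite(fd, &one, sizeof one, (off_t)(uintptr_t)&probe_target);
    if (n < 0)
        *err_out = errno;      /* EACCES expected on the hardened kernel */
    else if (n != (ssize_t)sizeof one)
        *err_out = EIO;        /* short write */
    close(fd);
    return n == (ssize_t)sizeof one;
}
```

A caller would report strerror(err) when the helper returns 0; the open itself is expected to succeed on the hardened kernel, as the strace output further down shows.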

Some USDT probes are enabled by incrementing a semaphore in the traced process’s memory. This isn’t true of all USDT probes, but it is probably true of many (I ran into this with Ruby’s USDT probes). As the DTrace docs describe, the semaphore lets the application skip expensive processing around a probe, only preparing the extra information when a tracer has enabled the probe.
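To make the semaphore’s role concrete, here is a hedged sketch of a probe site from the application’s side. It does not use <sys/sdt.h>: example_cmethod_entry_semaphore is a hypothetical stand-in for the unsigned short semaphore that header emits, and describe_method stands in for whatever expensive work the probe arguments need. A tracer enables the probe by incrementing the counter in the target’s memory, which is the write that fails below.

```c
#include <assert.h>

/* Hypothetical stand-in for the semaphore <sys/sdt.h> would emit for a
 * semaphore-gated probe: a 16-bit counter that tracers increment to enable
 * the probe and decrement to disable it. */
unsigned short example_cmethod_entry_semaphore = 0;

/* Counts how often the expensive argument preparation actually ran. */
static unsigned long expensive_calls = 0;

/* Stand-in for costly work, e.g. formatting a method name for the probe. */
static const char *describe_method(void)
{
    expensive_calls++;
    return "Foo#bar";
}

/* Called on every method entry in the hypothetical application. */
void on_method_entry(void)
{
    /* Pay for argument preparation only when a tracer has enabled the probe. */
    if (example_cmethod_entry_semaphore != 0) {
        const char *name = describe_method();
        assert(name[0] != '\0');
        /* A real build would fire the USDT probe here with `name` as an argument. */
    }
}
```

If the tracer’s increment never lands, the branch is never taken, so the probe never fires and nothing reports an error.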

The code that handles this in bcc is here: https://github.com/iovisor/bcc/blob/c2e2a26b8624492018a14d5eebd4a50b869c911f/src/cc/usdt/usdt.cc#L109-L113

And it is essentially the same as the approach described here.
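The read-increment-write sequence can be sketched in C as follows. This is an illustration under assumptions, not bcc’s actual code: open_proc_mem and adjust_usdt_semaphore are hypothetical names, and the semaphore is treated as a 16-bit value in host byte order. pread/pwrite at the semaphore’s virtual address are equivalent to the lseek/read/lseek/write calls in the strace output that follows.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open /proc/<pid>/mem read-write.
 * Returns the fd, or -errno on failure. */
int open_proc_mem(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/mem", (long)pid);
    int fd = open(path, O_RDWR);
    return fd < 0 ? -errno : fd;
}

/* Hypothetical helper: add `delta` (+1 to enable, -1 to disable) to the
 * 16-bit USDT semaphore at virtual address `addr` by reading it and then
 * writing the new value back at the same offset.
 * Returns 0 on success or -errno on failure. */
int adjust_usdt_semaphore(int fd, uint64_t addr, int delta)
{
    uint16_t value;
    ssize_t n;

    assert(delta == 1 || delta == -1);

    n = pread(fd, &value, sizeof value, (off_t)addr);
    if (n < 0)
        return -errno;
    if (n != (ssize_t)sizeof value)
        return -EIO; /* short read */

    value = (uint16_t)(value + delta);

    /* With CONFIG_SECURITY_CHROMIUMOS_READONLY_PROC_SELF_MEM, mem_write()
     * rejects this unconditionally and pwrite() fails with EACCES. */
    n = pwrite(fd, &value, sizeof value, (off_t)addr);
    if (n < 0)
        return -errno;
    if (n != (ssize_t)sizeof value)
        return -EIO;
    return 0;
}
```

Returning -errno instead of ignoring the result is what would turn the silent failure described next into a visible -EACCES.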

However, on a kernel with the above hardening, this causes probes to silently fail to be enabled. strace makes the reason obvious: the 2-byte read of the semaphore succeeds, but writing back the incremented value fails with EACCES:

openat(AT_FDCWD, "/proc/726288/mem", O_RDWR) = 72
lseek(72, 94200854600568, SEEK_SET)     = 94200854600568
read(72, "\0\0", 2)                     = 2
lseek(72, 94200854600568, SEEK_SET)     = 94200854600568
write(72, "\1\0", 2)                    = -1 EACCES (Permission denied)
close(72)                               = 0

Note that this only happens for probes where readelf --notes shows a non-zero semaphore address:

  stapsdt              0x00000059       NT_STAPSDT (SystemTap probe descriptors)
    Provider: ruby
    Name: cmethod__entry
    Location: 0x000000000019999d, Base: 0x00000000002d8ec0, Semaphore: 0x000000000052bb54
    Arguments: 8@32(%rsp) 8@40(%rsp) 8@48(%rsp) -4@56(%rsp)

Because a semaphore address is present here, this USDT probe is affected. A similar probe in libc is not affected and can still be attached to, because its semaphore address is zero and no write is needed to enable it:

  stapsdt              0x0000003c       NT_STAPSDT (SystemTap probe descriptors)
    Provider: libc
    Name: memory_heap_free
    Location: 0x000000000019bfd0, Base: 0x00000000001bdd48, Semaphore: 0x0000000000000000
    Arguments: 8@%r11 8@%rax
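Whether a probe is affected therefore comes down to the semaphore field of its note. Below is a minimal sketch of decoding one NT_STAPSDT note descriptor, assuming the 64-bit layout used by <sys/sdt.h> (three 8-byte addresses for location, base and semaphore, followed by NUL-terminated provider, name and argument strings) and an ELF file in host byte order. Finding the notes in the file is left out, the struct and function names are hypothetical, and the semaphore value is the link-time address, which still has to be translated before anything can be written to it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One decoded NT_STAPSDT note (field names are our own). */
struct stapsdt_probe {
    uint64_t location;   /* probe PC */
    uint64_t base;       /* link-time address of .stapsdt.base */
    uint64_t semaphore;  /* 0 means the probe has no semaphore */
    const char *provider;
    const char *name;
    const char *args;
};

/* Take one NUL-terminated string from [*p, end). Returns NULL if unterminated. */
static const char *take_string(const char **p, const char *end)
{
    const char *s = *p;
    const char *nul = memchr(s, '\0', (size_t)(end - s));
    if (nul == NULL)
        return NULL;
    *p = nul + 1;
    return s;
}

/* Parse the descriptor of one note. Returns 0 on success, -1 if malformed. */
int parse_stapsdt_desc(const void *desc, size_t len, struct stapsdt_probe *out)
{
    const char *p = desc;
    const char *end = p + len;

    assert(desc != NULL && out != NULL);
    if (len < 3 * sizeof(uint64_t))
        return -1;
    /* memcpy avoids unaligned loads from the note data. */
    memcpy(&out->location, p, 8);
    memcpy(&out->base, p + 8, 8);
    memcpy(&out->semaphore, p + 16, 8);
    p += 24;

    if ((out->provider = take_string(&p, end)) == NULL ||
        (out->name = take_string(&p, end)) == NULL ||
        (out->args = take_string(&p, end)) == NULL)
        return -1;
    return 0;
}

/* Probes with a non-zero semaphore need the write the hardened kernel blocks. */
int stapsdt_needs_semaphore(const struct stapsdt_probe *probe)
{
    return probe->semaphore != 0;
}
```

Applied to the two notes above, the ruby cmethod__entry probe would report a semaphore and the libc memory_heap_free probe would not.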

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 37 (30 by maintainers)

Most upvoted comments

Thank you for explaining, @palmtenor. I understand the approach now and have a rough idea of the scope of the work. This is definitely an exciting way forward, as I thought we had basically hit a brick wall in our environment with the current implementation.

When you have time to create the bcc patch, I would love to review it, mostly to improve my understanding.

@yonghong-song 0x55d is the offset of the function to probe, and 0x1034 is the offset of the USDT semaphore. The kernel increments the semaphore when the uprobe is enabled.
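For reference, on kernels with uprobe reference-counter support (4.20 and later), tracefs accepts the semaphore offset directly in a uprobe definition using the PATH:OFFSET(REF_CTR_OFFSET) form. The helper below is a hypothetical sketch that builds such a line; the group, event and binary path in any call are placeholders, and both offsets are assumed to be file offsets within PATH, so the virtual address printed by readelf needs converting first (not shown). The resulting string would be written to the uprobe_events file in the tracing directory, typically /sys/kernel/debug/tracing/uprobe_events.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a tracefs uprobe definition of the form
 *   p:GROUP/EVENT PATH:0xOFFSET(0xREF_CTR_OFFSET)
 * so that the kernel itself increments the semaphore at REF_CTR_OFFSET
 * while the probe is enabled. Returns the snprintf() result, or -1 if
 * PATH contains a space (the definition is whitespace-separated). */
int format_uprobe_with_refctr(char *buf, size_t len, const char *group,
                              const char *event, const char *path,
                              uint64_t offset, uint64_t ref_ctr_offset)
{
    assert(buf != NULL && group != NULL && event != NULL && path != NULL);
    if (strchr(path, ' ') != NULL)
        return -1;
    return snprintf(buf, len, "p:%s/%s %s:0x%llx(0x%llx)", group, event, path,
                    (unsigned long long)offset,
                    (unsigned long long)ref_ctr_offset);
}
```

With the offsets from the comment above, format_uprobe_with_refctr(buf, sizeof buf, "usdt", "example", "/path/to/binary", 0x55d, 0x1034) produces p:usdt/example /path/to/binary:0x55d(0x1034). No write to /proc/<pid>/mem is involved, which is why this route sidesteps the ChromiumOS restriction.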

Thanks @yonghong-song, my development host runs kernel 5.3 and appears to have this field:

$ cat /sys/bus/event_source/devices/uprobe/format/ref_ctr_offset
config:32-63
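That format file places ref_ctr_offset in bits 32-63 of perf_event_attr.config for the uprobe PMU, and the neighboring retprobe format file uses config:0. Below is a minimal sketch of filling the attribute for perf_event_open(2), assuming the binary path and probe offset go in config1 and config2 (aliased as uprobe_path and probe_offset in recent linux/perf_event.h). fill_uprobe_attr is a hypothetical name, and the shift is hard-coded rather than parsed from the format files as a robust tool would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Bit position advertised by .../devices/uprobe/format/ref_ctr_offset
 * ("config:32-63"); retprobe is config bit 0. */
#define UPROBE_REF_CTR_SHIFT 32

/* Hypothetical helper: fill a perf_event_attr for the dynamic "uprobe" PMU.
 * `pmu_type` is the number read from .../devices/uprobe/type. Only the
 * address of `path` is stored, so it must stay valid until perf_event_open(). */
void fill_uprobe_attr(struct perf_event_attr *attr, uint32_t pmu_type,
                      const char *path, uint64_t offset,
                      uint32_t ref_ctr_offset, int retprobe)
{
    assert(attr != NULL && path != NULL);
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = pmu_type;
    attr->config = ((uint64_t)ref_ctr_offset << UPROBE_REF_CTR_SHIFT) |
                   (retprobe ? 1u : 0u);
    attr->config1 = (uint64_t)(uintptr_t)path; /* uprobe_path */
    attr->config2 = offset;                    /* probe_offset */
}
```

The attr would then be passed to perf_event_open(2), and the resulting perf event fd attached to a BPF program with the PERF_EVENT_IOC_SET_BPF ioctl; the kernel takes care of incrementing the semaphore while the event is enabled.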

I will look into devising a proof-of-concept patch that adds support for this method of incrementing the semaphore to bcc. Our production kernel is still 4.14, but once we are able to upgrade to 4.20 or later, I’d like to already have the functionality landed in bcc 😃