libbpfgo: "operation not permitted" errors for batch operations

I have been trying to debug the cause of this, but so far I haven't been successful, so I'm asking for help.

We are trying to use GetValueAndDeleteBatch, and it fails with an “operation not permitted” error.

https://github.com/parca-dev/parca-agent/blob/bd9807a3a0e16302b5944d570967ef5a828dfc80/pkg/profiler/profiler.go#L344-L358

I have tried raising the rlimits (usually the culprit behind this error), but no luck with that either: https://github.com/parca-dev/parca-agent/blob/bd9807a3a0e16302b5944d570967ef5a828dfc80/pkg/profiler/profiler.go#L623-L642

Do you have any pointers or guidelines for debugging this further? Or could this be related to error handling?

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

We have one for running with Vagrant, and a short guide on installing in Kubernetes. I'm not very experienced with k8s, but my teammates are, so if you have any issues, feel free to ask and I'll tag the appropriate people!

@kakkoyun Hi, sorry about the delay, I will get back to you on this a little later today!

Hey @grantseltzer, I made it work to a degree. My earlier mistake was passing the capacity of the array as the batch size/count.

The “somewhat” working version is below.

https://github.com/parca-dev/parca-agent/blob/d44bf3134624064580b269c81621b85857bdd7e4/pkg/profiler/profiler.go#L342-L381

The problem with it is that I need to know the actual number of elements in the map before determining the maximum allowed batch count. My first question: is there a neater way to fetch the number of elements in a BPF map?
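On the first question (not knowing the element count up front): libbpf documents count as an in/out parameter, and the kernel signals the end of a batch traversal with ENOENT, so one option is to use a fixed batch size and loop until ENOENT instead of sizing the batch from the element count. Below is a minimal Go sketch under those assumptions. fakeMap, lookupAndDeleteBatch, and drain are hypothetical names invented for illustration, not libbpfgo APIs, and the fake omits the kernel's in_batch/out_batch cursor.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// fakeMap is a hypothetical in-memory stand-in for a BPF map; it is NOT the
// libbpfgo API. Its batch call only mimics the libbpf contract: *count goes
// in as the buffer capacity and comes back as the number of entries actually
// copied, and ENOENT marks the end of the map. The kernel's in_batch/out_batch
// cursor is omitted because deleted entries are simply removed from the front.
type fakeMap struct {
	keys   []uint32
	values []uint64
}

func (m *fakeMap) lookupAndDeleteBatch(keys []uint32, values []uint64, count *uint32) error {
	n := int(*count)
	if n > len(keys) || n > len(values) {
		return syscall.EINVAL // caller's buffers are smaller than the requested count
	}
	if n > len(m.keys) {
		n = len(m.keys)
	}
	copy(keys, m.keys[:n])
	copy(values, m.values[:n])
	m.keys, m.values = m.keys[n:], m.values[n:]
	*count = uint32(n)
	if len(m.keys) == 0 {
		return syscall.ENOENT // end of map; the n entries copied above are still valid
	}
	return nil
}

// drain empties the map in chunks of batchSize without knowing its element
// count up front: ENOENT is treated as "done", not as a failure.
func drain(m *fakeMap, batchSize uint32) (map[uint32]uint64, error) {
	if batchSize == 0 {
		return nil, errors.New("batch size must be positive")
	}
	out := make(map[uint32]uint64)
	keys := make([]uint32, batchSize)
	values := make([]uint64, batchSize)
	for {
		count := batchSize // in: capacity of our buffers
		err := m.lookupAndDeleteBatch(keys, values, &count)
		for i := uint32(0); i < count; i++ { // out: entries actually copied
			out[keys[i]] = values[i]
		}
		if errors.Is(err, syscall.ENOENT) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	m := &fakeMap{keys: []uint32{1, 2, 3, 4, 5}, values: []uint64{10, 20, 30, 40, 50}}
	got, err := drain(m, 2)
	fmt.Println(len(got), err) // 5 <nil>
}
```

With this shape, the batch size only bounds the buffer per call; the last, short chunk is still collected because its entries are read using the returned count before ENOENT is checked.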

The second and maybe more important question is concerning this: https://github.com/parca-dev/parca-agent/blob/d44bf3134624064580b269c81621b85857bdd7e4/pkg/profiler/profiler.go#L356

Is there some reconciliation lag or implicit behavior between kernel and user space regarding BPF maps? Without waiting between operations, it constantly returns EPERM errors. I discovered this by pure coincidence: it worked when a debugger was attached with a breakpoint set before the GetValueAndDeleteBatch call.

Do you have any idea? What’s happening here? What am I doing wrong?

I assume you debugged it using tracee, right? I want to add that to our debugging flow if that's the case 😃

I would have, but no, I haven't tried running parca-agent yet. I do recommend tracee for debugging, though! It's easier to use than strace.

In any case, from the documentation I understand that even though the API returns an error, the call could be a partial success, and the index of the last successful operation is indicated by the count in/out parameter. Is this also what you understand from it?

Yes that’s what I understand as well (I wrote that documentation btw haha)

If that's the case, the current batch APIs don't account for it.

Do you mean within libbpfgo? You may be right; I'm not sure an error would surface if the ‘count’ output value isn't checked. This should also be explained in the GoDocs for these functions. I will create an issue to improve this.

@grantseltzer I think we were passing the capacity of the map, but we need to pass its current size, i.e. the number of elements it holds (still trying to validate this).

I'm seeing that if the count parameter passed to bpf_map_lookup_and_delete_batch is greater than the number of elements in the map, I get EPERM.

Checking the documentation for the libbpf function, count is an input and output parameter, so you should be able to print that value to see whether anything is being read/deleted before the EPERM error occurs.

Hm, I'll investigate this now. I typically run tracee in the background to see which syscall is returning EPERM and go from there.