libbpfgo: "operation not permitted" errors for batch operations
I have been trying to debug the reason for this, but so far without success, so I’m asking for help.
We try to use GetValuesAndBatch, and it returns an “operation not permitted” error.
I have tried bumping the rlimits (usually that’s the culprit behind this error), but no luck with that either.
https://github.com/parca-dev/parca-agent/blob/bd9807a3a0e16302b5944d570967ef5a828dfc80/pkg/profiler/profiler.go#L623-L642
Do you happen to have any pointers or guidelines for me to debug this further? Or could this be related to error handling?
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (17 by maintainers)
We have one for running with vagrant, and a small sheet on installing in Kubernetes. I’m not very experienced with k8s but my teammates are, so if you have any issues feel free to ask and I’ll tag the appropriate people!
@kakkoyun Hi, sorry about the delay, I will get back to you on this a little later today!
Hey @grantseltzer, I made it work to a degree. My previous mistake was passing the capacity of the array as the batch size/count.
The “somewhat” working version is below.
https://github.com/parca-dev/parca-agent/blob/d44bf3134624064580b269c81621b85857bdd7e4/pkg/profiler/profiler.go#L342-L381
The problem with it is that I need to know the actual number of elements in the map before determining the maximum allowed batch count. My first question is: is there a neater way to fetch the number of elements in a BPF map?
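One way to get a count, absent a dedicated call, is to walk the map's keys and count them, then cap the batch size at that number. Below is a minimal sketch of that idea. The keyIterator interface is a hypothetical stand-in, assumed to resemble the Next()/Err() style iterator that libbpfgo exposes for maps; countEntries, clampBatchSize, and fakeIterator are illustrative names, not part of any library. Note that the count is only a snapshot: a running BPF program can add or remove entries between counting and the batch call.

```go
package main

import "fmt"

// keyIterator is a hypothetical interface (an assumption for this sketch),
// modelled on an iterator with Next() and Err() methods.
type keyIterator interface {
	Next() bool
	Err() error
}

// countEntries walks the iterator to the end and returns how many keys it yielded.
func countEntries(it keyIterator) (uint32, error) {
	var n uint32
	for it.Next() {
		n++
	}
	if err := it.Err(); err != nil {
		return n, err
	}
	return n, nil
}

// clampBatchSize returns the smaller of the counted entries and the map capacity,
// so the batch count never exceeds what the map currently holds.
func clampBatchSize(entries, capacity uint32) uint32 {
	if entries < capacity {
		return entries
	}
	return capacity
}

// fakeIterator stands in for a real map iterator so the sketch runs without a kernel.
type fakeIterator struct{ remaining int }

func (f *fakeIterator) Next() bool {
	if f.remaining == 0 {
		return false
	}
	f.remaining--
	return true
}

func (f *fakeIterator) Err() error { return nil }

func main() {
	n, err := countEntries(&fakeIterator{remaining: 3})
	if err != nil {
		fmt.Println("iteration failed:", err)
		return
	}
	fmt.Println("entries:", n, "batch size:", clampBatchSize(n, 1024))
}
```

Counting this way costs one pass over the keys per batch, so it may only be worthwhile if the map is small or the batch call is otherwise unreliable.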
The second and maybe more important question is concerning this: https://github.com/parca-dev/parca-agent/blob/d44bf3134624064580b269c81621b85857bdd7e4/pkg/profiler/profiler.go#L356
Is there reconciliation lag or implicit behavior between the kernel and user space regarding BPF maps? Without waiting between operations, it’d constantly give EPERM errors. I discovered this by pure coincidence: it was working when a debugger was attached and a breakpoint was set before the GetValueAndDeleteBatch call. Do you have any idea what’s happening here? What am I doing wrong?
I would have, but no, I have not tried running parca-agent yet. I do recommend tracee for debugging though! It’s easier to use for debugging than strace.
Yes that’s what I understand as well (I wrote that documentation btw haha)
Do you mean within libbpfgo? You may be right, I’m not sure if an error would surface if the ‘count’ output value isn’t checked. This should also be elaborated on in the GoDocs for these functions. I will create an issue for improving them.
@grantseltzer I think we were passing the capacity of the map, and we need to pass the size of the map (still trying to validate).
In any case, from the documentation I understand that even though the API returns an error, it could be a partial success, and the index of the last successful operation is indicated by the count in/out parameter. Is this also what you understand from it?
If it’s the case, the current batch APIs don’t consider this fact.
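To make the partial-success point concrete, here is a minimal sketch of a wrapper that keeps whatever entries were processed even when the call also reports an error. Everything in it is hypothetical: batchFunc is an assumed stand-in for a batch lookup-and-delete call that reports its processed count alongside the error, and readBatch, fakeBatch, and errMapDrained are illustrative names rather than libbpfgo or libbpf API.

```go
package main

import (
	"errors"
	"fmt"
)

// batchFunc is a hypothetical stand-in for a batch lookup-and-delete call.
// It fills dst with up to len(dst) values and reports how many it wrote,
// even when it also returns an error (mirroring an in/out count parameter).
type batchFunc func(dst []uint64) (processed int, err error)

// readBatch returns the values that were actually processed together with
// the error, so the caller can use a partial result instead of discarding it.
func readBatch(fn batchFunc, maxCount int) ([]uint64, error) {
	dst := make([]uint64, maxCount)
	processed, err := fn(dst)
	if processed < 0 || processed > len(dst) {
		return nil, fmt.Errorf("batch call reported invalid count %d", processed)
	}
	return dst[:processed], err
}

// errMapDrained is a hypothetical error used by the fake below.
var errMapDrained = errors.New("fewer entries than requested")

// fakeBatch simulates a map holding the given values: it copies what it has
// and returns an error if the caller asked for more than was stored.
func fakeBatch(stored []uint64) batchFunc {
	return func(dst []uint64) (int, error) {
		n := copy(dst, stored)
		if n < len(dst) {
			return n, errMapDrained
		}
		return n, nil
	}
}

func main() {
	values, err := readBatch(fakeBatch([]uint64{10, 20, 30}), 8)
	fmt.Println(len(values), "values read, err:", err)
}
```

With this shape the caller decides whether an error accompanied by a non-zero count is fatal or simply means the map held fewer entries than requested.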
I’m seeing that if the count parameter passed to bpf_map_lookup_and_delete_batch is greater than the number of elements in the map, I get EPERM.
According to the documentation for the libbpf function, count is an input and output parameter, so you should be able to print that value to see if anything is being read/deleted before the permission denied error occurs.
Hm, I’ll investigate this now. I typically run with tracee in the background to see what syscall is giving the EPERM and go from there.