runtime: Segmentation fault using P/Invoke on Raspberry Pi 3B / ARM32

Summary

I’m trying to use P/Invoke to control an LED strip attached to my Raspberry Pi GPIO pins. The source for the native C library I’m working with is here.

I’ve found that I get an intermittent segmentation fault when using the native interop support. I noticed there are several other segmentation fault issues on ARM/Raspberry Pi open, and I may be running into the same thing.

Environment

  • Raspberry Pi 3B
  • Raspbian 9
  • .NET Core SDK 2.2.103

Reproduction

I’ve created a minimal repro in this repository. It includes a built version of the native library. In that repository there is a logs folder that includes logs from running strace and gdb. The error, when seen, consistently reports in GDB as being from WKS::gc_heap::mark_object_simple(unsigned char**) () in libcoreclr.so. I’ve also got some notes I’m keeping in the readme in that repo as I troubleshoot.

I’ve not yet captured a core dump to analyze the issue. However, I have tried setting COMPlus_HeapVerify=1 on the published application (sudo COMPlus_HeapVerify=1 ./ConsoleDemo), as noted in this issue, and I didn’t see any additional output. I’m not sure whether I was supposed to see any or whether I wasn’t setting the value properly.
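One way to rule out the "wasn't properly setting the value" possibility is to have the app report whether the variable reached the process at all. The following is a minimal sketch; HeapVerifyCheck and its methods are hypothetical names made up for illustration, not part of the repro. It only shows that the variable is visible to the process and says nothing about whether the runtime acted on it.

```csharp
using System;

public static class HeapVerifyCheck
{
    // Name of the runtime setting mentioned in the issue.
    public const string VariableName = "COMPlus_HeapVerify";

    // Hypothetical helper: turns the raw value (or null) into a readable message.
    public static string Describe(string value)
    {
        return string.IsNullOrEmpty(value)
            ? VariableName + " is not set in this process"
            : VariableName + "=" + value + " is visible to this process";
    }

    // Reads the variable from the current process environment.
    public static string DescribeCurrent()
    {
        return Describe(Environment.GetEnvironmentVariable(VariableName));
    }
}
```

Calling Console.WriteLine(HeapVerifyCheck.DescribeCurrent()) at the top of Main would print which case applies before any P/Invoke call runs.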

Something I did note which was very, very odd: While trying to debug the issue, I was adding some Console.WriteLine() calls to see how far the app would get before faulting. At one point I actually got it to run entirely without any faults just by adding enough Console.WriteLine() calls with no other changes. I commented out the calls, re-published, re-ran… and got the fault again. Put the Console.WriteLine() calls back… no fault. I left the Console.WriteLine() calls there, added some more operations to the lights (more P/Invoke ops) and got the fault again. I just thought it was really weird that I could consistently get past the fault just by adding Console.WriteLine(). Didn’t make any sense.

I intend to get a core dump and debug through it to see if I can get more information, but I thought I’d raise it here in case someone had something… maybe simpler I could try. Or perhaps this is already being solved in some other issue?

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (17 by maintainers)

Most upvoted comments

YES YES YES YES YES You folks are my heroes. 🥇 🎉

I had to:

  • Switch the NativeMethods to use IntPtr as you mentioned; and
  • Add the second channel (the channel_2 structure) to match the setup in the main library (without this I was getting another segmentation fault)

But after that it totally worked. I quite literally ran around my living room pumping my fists in triumph.

Thank you ever so much for your help, it means the world.

As a first step, I can confirm I can repro the issue.

As for the array of the channels, I’ve looked at the definition of RPI_PWM_CHANNELS and it is 2. So the easiest way to fix that is to add one more member, public ws2811_channel_t channel_2;. There is a way to create an array that’s inlined in a data structure (https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/unsafe-code-pointers/fixed-size-buffers), but it can be used only in structs and the element type can only be one of the basic numeric types (byte, short, int, …). It would be doable to transform your code that way, but it doesn’t seem worth all the added complexity.

The issue “Gpio 2126597824 is illegal for LED channel 0” is caused by incorrect marshalling of the ws2811_t class. I can see that the native code is getting the address of the managed object and not the address of the first field. Each reference object (class) starts with a pointer to a so-called MethodTable, which describes the type of the object, and the actual fields are stored after that. So the native code was getting the address of that pointer rather than of the fields. I am no expert on interop, but what would definitely help is to change all the parameters of the methods in AddressableLed.Interop.NativeMethods from ref ws2811_t ws2811 to IntPtr ws2811, and then at all places where you call them pass in this._ws2811Handle.AddrOfPinnedObject() instead of ref this._ws2811. I’ve tried to do that and your stuff now seems to run. I cannot verify that it does what it is supposed to do, but it doesn’t report any error and doesn’t seem to crash.
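To illustrate the pinned-handle idea, here is a minimal sketch that runs without the native library. ChannelSketch, DeviceSketch and PinningSketch are hypothetical, simplified stand-ins; their fields are assumptions and do not mirror the real ws2811_t layout. It pins a sequential-layout class, takes the address from GCHandle.AddrOfPinnedObject(), and reads the first two fields back through that raw pointer, which is what native code would see if it were handed the same IntPtr.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a channel struct; the fields are assumptions.
[StructLayout(LayoutKind.Sequential)]
public struct ChannelSketch
{
    public int gpionum;
    public int count;
}

// Hypothetical stand-in for the device class. It contains only blittable
// fields, which is what allows the object to be pinned.
[StructLayout(LayoutKind.Sequential)]
public class DeviceSketch
{
    public uint freq;
    public int dmanum;
    // Two separate members take the place of a two-element inline array.
    public ChannelSketch channel_1;
    public ChannelSketch channel_2;
}

public static class PinningSketch
{
    // Pins the object and reads its first two fields through the pinned address.
    public static (uint Freq, int Dmanum) ReadThroughPinnedPointer(DeviceSketch device)
    {
        // While the handle is allocated, the GC will not move the object.
        GCHandle handle = GCHandle.Alloc(device, GCHandleType.Pinned);
        try
        {
            // Address of the object's field data, not of the object header.
            IntPtr p = handle.AddrOfPinnedObject();
            uint freq = (uint)Marshal.ReadInt32(p, 0);   // first field at offset 0
            int dmanum = Marshal.ReadInt32(p, 4);        // second field at offset 4
            return (freq, dmanum);
        }
        finally
        {
            // Release the pin once nothing uses the raw pointer any more.
            handle.Free();
        }
    }
}
```

In real interop code the IntPtr would be passed to P/Invoke declarations that take IntPtr, and the handle would presumably need to stay allocated for as long as the native library holds on to the pointer, rather than being freed right away as in this sketch.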

Using COMPlus_HeapVerify=1, I can see that the heap got corrupted between two GCs, which may point to a possible issue with the native library, for example using a pointer to a managed object that is not pinned. Unfortunately lldb doesn’t fully work on ARM32, so I cannot use the SOS plugin to dump various managed objects etc. This problem with lldb has been bothering me for a long time, so I am starting to look into it. I’ll continue debugging this issue in a week, after I get back from my vacation.

I’m going to take a look.