runtime: Regression: Bus error when running PublishSingleFile=true .NET 6.0 app on linux-arm (Raspbian)

Description

Hello,

In the original issue https://github.com/JustArchiNET/ArchiSteamFarm/issues/2457 I’m dealing with a regression that causes a single-file published app to crash during initialization with a Bus error (to the best of my knowledge, the kernel sending SIGBUS to the process).

This issue did not happen with the .NET 5.0 runtime, therefore I classify it as a regression.

<username>@<hostname>:~/ArchiSteamFarm $ ./ArchiSteamFarm
Bus error

Reproduction Steps

It’s very hard for me to give reproduction steps as I’m unable to reproduce this myself. The issue is specific to one user (albeit he claims that he has tried at least 2 different machines and got the same result).

The minimal repro I have right now is to clone my project (git clone https://github.com/JustArchiNET/ArchiSteamFarm.git) and check out commit 876c3324526d0fe6b0a801210b63f663a4eb816c. The minimal build command I could narrow it down to was:

dotnet publish ArchiSteamFarm -c Release -o out -r linux-arm /p:PublishSingleFile=true --self-contained

Precompiled build is also available for download: https://github.com/JustArchiNET/ArchiSteamFarm/releases/download/5.2.0.9/ASF-linux-arm.zip

Expected behavior

The app works as previously, initializes properly and executes code.

Actual behavior

The app crashes with a Bus error (to the best of my knowledge, the kernel sending SIGBUS to the process). This happens before my app's initialization takes place (before the first line is logged to the console), so it’s likely related to the in-memory decompression of the single-file app.

I asked the user to run the app with COREHOST_TRACE=1; this was the output recorded before the crash:

Tracing enabled @ Thu Dec  2 10:52:21 2021 GMT
--- Invoked apphost [version: static, commit hash: static] main = {
./ArchiSteamFarm
}
The managed DLL bound to this executable is: 'ArchiSteamFarm.dll'
Detected Single-File app bundle
Using internal fxr
Invoking fx resolver [/home/pi/ArchiS2/] hostfxr_main_bundle_startupinfo
Host path: [/home/pi/ArchiS2/ArchiSteamFarm]
Dotnet path: [/home/pi/ArchiS2/]
App path: [/home/pi/ArchiS2/ArchiSteamFarm.dll]
Bundle Header Offset: [18f0a400]
--- Invoked hostfxr_main_bundle_startupinfo [commit hash: static]
Mapped application bundle
Unmapped application bundle
Single-File bundle details:
DepsJson Offset:[1d938] Size[61fa2b8]
RuntimeConfigJson Offset:[2b0] Size[75b0f0]
.net core 3 compatibility mode: [No]
--- Executing in a native executable mode...
Using dotnet root path [/home/pi/ArchiS2/]
App runtimeconfig.json from [/home/pi/ArchiS2/ArchiSteamFarm.dll]
Runtime config is cfg=/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json dev=/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.dev.json
Attempting to read runtime config: /home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json
Attempting to read dev runtime config: /home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.dev.json
Mapped bundle for [/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json]
Unmapped application bundle
Runtime config [/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json] is valid=[1]
Executing as a self-contained app as per config file [/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json]
Using internal hostpolicy
Reading from host interface version: [0x16041101:124] to initialize policy version: [0x16041101:124]
Mapped application bundle

Sadly not very informative to me.

Regression?

Yes, single-file publish of this particular app worked fine in .NET 5.0. Single-file publish also works fine in .NET 6.0 for all other OS targets (e.g. linux-arm64, linux-x64, win-x64). It is not reproducible on every linux-arm setup either: I haven’t received this error from other users, and we’ve tried and failed to reproduce it ourselves.

According to the user this happens on 2 different machines (albeit similar ones), which decreases the chance of a hardware malfunction or similar.

Known Workarounds

I’d be very happy if you could suggest any. I’m trying various things that come to mind in the original issue at https://github.com/JustArchiNET/ArchiSteamFarm/issues/2457 and the only thing that actually made it work (at least for now) was PublishSingleFile=false.

Right now I’m testing with the user whether IncludeNativeLibrariesForSelfExtract=true or IncludeAllContentForSelfExtract=true helps with this issue.

Is there any way to force, through an environment variable, the old-style self-extraction of a single-file published app, i.e. one that doesn’t involve switches at compile time? If that worked, it’d be a decent enough workaround for me to suggest to users dealing with this issue in our linux-arm builds while it is investigated.

Configuration

Host machine: Raspberry Pi linux-arm (raspbian.10-arm, kernel 5.10.63-v8+)

Last working (tested) runtime: .NET 5.0.11. First not-working (tested) runtime: .NET 6.0.0

The issue is specific to that configuration; I could not reproduce it on my linux-arm64 Raspberry Pi 4 machine.

Other information

Please let me know what else I can provide or do to help narrow this one down. I had no luck reproducing it on any of my machines, but I strongly believe this is a regression relative to .NET 5.0. Perhaps one of you will be able to reproduce the problem by running my app on a Raspberry Pi (Raspbian) linux-arm OS and gather the additional info required to fix it.

I’m trying to actively work with the user to provide more info in regards to this, you can find our conversation here: https://github.com/JustArchiNET/ArchiSteamFarm/issues/2457

Thank you in advance for your interest in regards to this issue.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 22 (22 by maintainers)

Most upvoted comments

@JustArchi the workaround for the issue until it is fixed is to execute the following on the affected devices:

echo 2 | sudo tee /proc/cpu/alignment

This makes the kernel handle the unaligned accesses and makes apps work fine (only a tiny bit slower due to the trap to the kernel on each unaligned access).
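Before applying the workaround it can help to see which mode a device is currently in. The following is only a sketch: the helper name current_alignment_mode is hypothetical, and it assumes /proc/cpu/alignment contains a line of the form "User faults: 2 (fixup)", which is how the kernel documentation shows it. On kernels that do not expose this file the helper simply reports failure.

```shell
# current_alignment_mode [FILE]
# Prints the numeric mode found on the "User faults:" line of FILE
# (default: /proc/cpu/alignment). Returns non-zero if the file is not
# readable or the line is missing.
# Assumption: the line looks like "User faults:<whitespace>2 (fixup)".
current_alignment_mode() {
    file=${1:-/proc/cpu/alignment}
    [ -r "$file" ] || return 1
    mode=$(sed -n 's/^User faults:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$file")
    [ -n "$mode" ] || return 1
    echo "$mode"
}

if mode=$(current_alignment_mode); then
    echo "current alignment mode: $mode"
    # Bit 2 (value 4) means the kernel sends SIGBUS on unaligned user accesses.
    if [ $(( mode & 4 )) -ne 0 ]; then
        echo "unaligned user accesses raise SIGBUS on this device"
    fi
else
    echo "/proc/cpu/alignment is not available on this system"
fi
```

If the reported mode has bit 2 set, the echo 2 command above is the change that switches the kernel to fixing up the accesses instead.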

Looking at the dump on my RPI4, it is really a misaligned access:

Program terminated with signal SIGBUS, Bus error.
#0  0x009340a8 in ?? ()
(gdb) bt
#0  0x009340a8 in ?? ()
#1  0x0093409c in ?? ()

(gdb) disassemble 0x009340a8,+4
Dump of assembler code from 0x9340a8 to 0x9340ac:
=> 0x009340a8:  ldrd    r11, r0, [r0]
End of assembler dump.

(gdb) p/x $r0
$1 = 0xf75dae5b

The ldrd instruction requires addresses aligned to 8 bytes, and the address in r0 (0xf75dae5b) is not.
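The r0 value printed by gdb above can be checked against that requirement with plain shell arithmetic. This is a minimal sketch; the helper name is_aligned is hypothetical, and the 4-byte check is included only for comparison.

```shell
# is_aligned ADDRESS BOUNDARY
# Succeeds (exit status 0) when ADDRESS is a multiple of BOUNDARY.
is_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

# Value of r0 taken from the gdb session above.
r0=0xf75dae5b

for boundary in 4 8; do
    if is_aligned "$r0" "$boundary"; then
        echo "$r0 is aligned to $boundary bytes"
    else
        echo "$r0 is NOT aligned to $boundary bytes (remainder $(( $r0 % $boundary )))"
    fi
done
```

The address ends in 0x5b, so the remainder is 3 for both boundaries: the load is misaligned whichever of the two boundaries applies.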

Unaligned access handling can be set on Linux as described in https://mjmwired.net/kernel/Documentation/arm/mem_alignment. There are three options: the kernel handles it but prints a warning message, the kernel handles it silently, or the kernel generates SIGBUS.

I was able to easily repro the crash after issuing this command on my RPI4 (without any docker container):
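As a rough illustration of how the number written to /proc/cpu/alignment maps to those options, here is a small lookup. The function name describe_alignment_mode is hypothetical, and the labels reflect my reading of the linked kernel document (bit 0 = warn, bit 1 = fixup, bit 2 = signal), so treat them as an assumption rather than a reference.

```shell
# describe_alignment_mode MODE
# Prints a short label for a /proc/cpu/alignment mode value.
# Assumption: modes 0-5 are the valid values, built from
# bit 0 (warn), bit 1 (fixup) and bit 2 (signal).
describe_alignment_mode() {
    case "$1" in
        0) echo "ignored" ;;
        1) echo "warn" ;;
        2) echo "fixup" ;;
        3) echo "fixup+warn" ;;
        4) echo "signal" ;;
        5) echo "signal+warn" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# The three options mentioned above: handled with a warning (3),
# handled silently (2), SIGBUS (4).
for mode in 3 2 4; do
    echo "mode $mode: $(describe_alignment_mode "$mode")"
done
```

In these terms, the workaround in this thread writes 2 (silent fixup), and writing 4 is what turns an unaligned access into the Bus error.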

echo 4 | sudo tee /proc/cpu/alignment

Since this is specific to a particular environment, it could be hard to reproduce. I wonder what additional diagnostics we can get from the user.

Perhaps a core dump could be helpful, if available?