runtime: Segmentation fault on Linux in Release configuration after upgrading to .NET 5
Description
I’m investigating an issue in a proprietary application that started after upgrading to .NET 5. It only occurs on Linux (linux-x64) with the Release configuration. On Windows, both Debug and Release work without issue, and Debug appears to work on Linux as well. I’ve traced the issue to a method similar to the following:
public Result MyMethod(IItem item, Options options) {
    // Options is a fairly complex struct with constants, fields, properties, and methods.
    if (item != null) {
        // Doing anything with item here causes the segmentation fault.
        // However, the segmentation fault does not happen every time this method is called; it is somewhat intermittent.
    }
    // ... (remainder of the method, including the return statement, omitted)
}
Strangely, changing Options from a struct to a class resolves the issue. The nature of the issue makes me suspect a .NET 5 bug rather than a bug in our code, but I’m not sure how to investigate further. I’ve tried recreating it in a separate project without success so far. If anyone has advice on finding the root cause, please let me know. It almost looks like an underlying memory corruption issue (as if options were somehow stomping on item), but I’m not familiar with debugging such issues in .NET.
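Since Debug works on Linux and Release does not, one way to narrow this down is to keep the Release build but disable JIT optimization for the suspect method only. The sketch below uses the real MethodImplOptions.NoOptimization attribute; IItem, Item, Options, Result, and Processor are hypothetical stand-ins, because the real types are proprietary and not shown. If the segfault disappears with the attribute applied, that points toward JIT code generation for this method rather than the application logic.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins: the real IItem, Options, and Result are proprietary.
public interface IItem
{
    string Name { get; }
}

public sealed class Item : IItem
{
    public Item(string name) => Name = name;
    public string Name { get; }
}

// A struct with a constant, fields, and a property, loosely mirroring the
// shape described in the report (not its actual contents).
public struct Options
{
    public const int DefaultFlags = 1;
    public int Flags;
    public string Label;
    public bool HasLabel => !string.IsNullOrEmpty(Label);
}

public sealed class Result
{
    public Result(string value) => Value = value;
    public string Value { get; }
}

public sealed class Processor
{
    // NoOptimization tells the JIT to compile only this method without
    // optimizations, while the rest of the Release build stays optimized.
    // If the crash goes away with this attribute, JIT codegen for this
    // method becomes the prime suspect.
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public Result MyMethod(IItem item, Options options)
    {
        if (item != null)
        {
            string label = options.HasLabel ? options.Label : "default";
            return new Result($"{item.Name}:{label}:{options.Flags}");
        }
        return new Result("none");
    }
}
```

This only helps isolate the problem; like switching Options to a class, it changes the generated code and hides the symptom rather than fixing the underlying bug.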
Configuration
The Linux environment I’m using to investigate this is Ubuntu 20.04 on WSL2 (the issue was originally seen in Docker containers in another environment). This is the output of dotnet --info on Ubuntu:
.NET SDK (reflecting any global.json):
Version: 5.0.200
Commit: 70b3e65d53
Runtime Environment:
OS Name: ubuntu
OS Version: 20.04
OS Platform: Linux
RID: ubuntu.20.04-x64
Base Path: /usr/share/dotnet/sdk/5.0.200/
Host (useful for support):
Version: 5.0.3
Commit: eae88cc11b
.NET SDKs installed:
5.0.200 [/usr/share/dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 5.0.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 5.0.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
To install additional .NET runtimes or SDKs:
https://aka.ms/dotnet-download
A Windows 10 machine (version 1909, build 18363.1440) runs the dotnet publish command that produces the output tested on Linux. This is the dotnet --info output on Windows:
.NET SDK (reflecting any global.json):
Version: 5.0.103
Commit: 72dec52dbd
Runtime Environment:
OS Name: Windows
OS Version: 10.0.18363
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\5.0.103\
Host (useful for support):
Version: 5.0.3
Commit: c636bbdc8a
.NET SDKs installed:
2.2.104 [C:\Program Files\dotnet\sdk]
3.1.300 [C:\Program Files\dotnet\sdk]
5.0.103 [C:\Program Files\dotnet\sdk]
.NET runtimes installed:
Microsoft.AspNetCore.All 2.1.25 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.2.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.1.25 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.2.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 5.0.3 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.1.25 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 2.2.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 5.0.3 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 3.1.12 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 5.0.3 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
To install additional .NET runtimes or SDKs:
https://aka.ms/dotnet-download
About this issue
- State: closed
- Created 3 years ago
- Comments: 23 (14 by maintainers)
@jeffrimko After more analysis, I think this is indeed an instance of #49078, which will be fixed in 5.0.6; that release should be out in a month or so.
I sent you an email about a possible workaround in the meantime.
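Until 5.0.6 ships, a deployment can at least detect whether it is running on an affected 5.0.x runtime. The helper below is a hypothetical sketch (RuntimeCheck, IsAffectedRuntime, and WarnIfAffected are illustrative names, not a runtime API). It assumes, per the comment above, that the fix lands in 5.0.6, and relies on Environment.Version reporting the actual runtime version, which it does on .NET Core 3.0 and later.

```csharp
using System;

public static class RuntimeCheck
{
    // Assumption: per the maintainer comment, the fix for #49078 ships in 5.0.6.
    private static readonly Version FixedIn = new Version(5, 0, 6);

    // True for 5.0.x runtimes older than 5.0.6; other major/minor versions
    // are outside the scope of this check.
    public static bool IsAffectedRuntime(Version runtime) =>
        runtime.Major == 5 && runtime.Minor == 0 && runtime < FixedIn;

    // Environment.Version reports the runtime version on .NET Core 3.0+.
    public static void WarnIfAffected()
    {
        Version current = Environment.Version;
        if (IsAffectedRuntime(current))
        {
            Console.Error.WriteLine(
                $"Warning: running on .NET {current}; the JIT fix is expected in {FixedIn}.");
        }
    }
}
```

Calling RuntimeCheck.WarnIfAffected() once at startup is enough. It does not work around the bug; it only makes an affected runtime visible in logs.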
@CallumDev I trimmed down the source code of LibreLancer.Fx.FxBasicAppearance::Draw and created a simple repro in #49780 that hits assert(!foundDiff) in LSRA. I believe it reflects the issue you are seeing, although I can’t say this for sure, since the checked JIT asserts earlier than the point where it was segfaulting with the release runtime. I opened a separate issue #49780 and assigned it to @sandreenko, since we believe it’s related to HFA handling on arm64 and Sergey has more expertise in that area.
@jeffrimko I also don’t think the issue you are seeing is the same as the one @CallumDev reported, since the latter is arm64-only and not intermittent.
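For context on the arm64 issue: an HFA (homogeneous floating-point aggregate) is, roughly, a struct made of one to four fields of the same floating-point type, which the arm64 calling convention passes in consecutive floating-point/SIMD registers when enough are available. The checker below is a simplified, hypothetical sketch (HfaSketch, LooksLikeHfa, Color4, and Mixed are illustrative names); it ignores nested structs and vector types, which the real ABI rules also cover.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class HfaSketch
{
    // Simplified rule: a value type with 1 to 4 instance fields, all float
    // or all double. Nested structs and vector fields are not flattened here.
    public static bool LooksLikeHfa(Type t)
    {
        if (!t.IsValueType || t.IsPrimitive || t.IsEnum) return false;

        FieldInfo[] fields = t.GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        if (fields.Length < 1 || fields.Length > 4) return false;

        Type first = fields[0].FieldType;
        if (first != typeof(float) && first != typeof(double)) return false;

        return fields.All(f => f.FieldType == first);
    }
}

// Four floats, like a color or vector type: an HFA under the simplified rule.
public struct Color4 { public float R, G, B, A; }

// Mixed field types: not an HFA.
public struct Mixed { public float X; public int Y; }
```

Structs of this shape are common in graphics code, which fits a crash surfacing only on arm64 in a rendering method.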
@dotnet/jit-contrib Can someone take a look at the failure on linux-x64?
@CallumDev I was able to identify the method where the crash occurs: LibreLancer.Fx.FxBasicAppearance::Draw. With the release runtime, when the method is compiled via PMI, it crashes with SIGSEGV with the same symptoms you reported.
With the checked runtime, it aborts earlier with an assertion at https://github.com/dotnet/runtime/blob/c636bbdc8a2d393d07c0e9407a4f8923ba1a21cb/src/coreclr/src/jit/lsra.cpp#L2262
The assertion is the same as in https://github.com/dotnet/runtime/issues/38772, but that one was fixed by https://github.com/dotnet/runtime/pull/39452.
I will take a look at the JIT dump and see what is going on.
@CallumDev Based on your call stack and the unique sequence of instructions where SIGSEGV was raised, I believe I have identified the corresponding location in the JIT source, inside LinearScan::processBlockStartLocations(BasicBlock*): https://github.com/dotnet/runtime/blob/c636bbdc8a2d393d07c0e9407a4f8923ba1a21cb/src/coreclr/src/jit/lsra.cpp#L5087-L5093
Based on the symptoms, it seems that interval->getNextRefPosition() returns nullptr. Let me think about how I can debug this further.
cc @dotnet/jit-contrib