runtime: `dotnet build` intermittently crashes with segfault on Ubuntu 18.04
Now and then our build agent produces broken builds. The error message reads:
##[error]Error: The process '/home/agent/agent/_work/_tool/dotnet/dotnet' failed with exit code null
The project is a .NET Core 3.1 Web API solution with roughly 30 projects and no unmanaged code at all.
The root cause is a segfault, as seen in dmesg:
$ dmesg | grep dotnet
[17426.781072] dotnet[36429]: segfault at 18 ip 00007f9d65e87892 sp 00007f9d5e083bb0 error 4 in libpthread-2.27.so[7f9d65e7b000+1a000]
[1418646.055501] dotnet[36089]: segfault at 18 ip 00007f345cea9892 sp 00007f33b9703eb0 error 4 in libpthread-2.27.so[7f345ce9d000+1a000]
[2246615.917135] dotnet[87465]: segfault at 18 ip 00007fd998396382 sp 00007fd98fd373a0 error 4 in libpthread-2.27.so[7fd99838a000+1a000]
[2362725.938722] dotnet[21158]: segfault at 18 ip 00007fe8ee98a892 sp 00007fe8e637ee00 error 4 in libpthread-2.27.so[7fe8ee97e000+1a000]
[2432991.847286] dotnet[48481]: segfault at 18 ip 00007f7ac18e8892 sp 00007f7a46173b00 error 4 in libpthread-2.27.so[7f7ac18dc000+1a000]
[2704555.425939] dotnet[88757]: segfault at 18 ip 00007fe0bc6bb892 sp 00007fe0b48b4ae0 error 4 in libpthread-2.27.so[7fe0bc6af000+1a000]
[2846996.143322] dotnet[107654]: segfault at 18 ip 00007fad287ea892 sp 00007facad075b00 error 4 in libpthread-2.27.so[7fad287de000+1a000]
[2853616.129105] dotnet[15803]: segfault at 18 ip 00007f72657db892 sp 00007f725d1cfb00 error 4 in libpthread-2.27.so[7f72657cf000+1a000]
[3496394.984178] dotnet[59923]: segfault at 18 ip 00007f5d8ffe7892 sp 00007f5d889e1b00 error 4 in libpthread-2.27.so[7f5d8ffdb000+1a000]
[3630179.291391] dotnet[98248]: segfault at 18 ip 00007f8d8079a892 sp 00007f8d78993e00 error 4 in libpthread-2.27.so[7f8d8078e000+1a000]
[3633549.092183] dotnet[101217]: segfault at 18 ip 00007f617d49a892 sp 00007f60d9ce7e00 error 4 in libpthread-2.27.so[7f617d48e000+1a000]
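To make these lines easier to compare, here is a minimal Python sketch that parses one dmesg entry, computes the offset of the faulting instruction inside the library (ip minus the mapping base), and decodes the x86 page-fault error code. The names `parse_segfault` and `decode_error` are made up for illustration, and the regex only assumes the line shape shown above.

```python
import re

# Matches kernel segfault lines of the shape shown in the dmesg output above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    present = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {present}"


def parse_segfault(line):
    """Return the interesting fields of one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "library": m["lib"],
        # Offset of the faulting instruction relative to the mapping base.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "error": decode_error(int(m["err"])),
    }


sample = (
    "[17426.781072] dotnet[36429]: segfault at 18 ip 00007f9d65e87892 "
    "sp 00007f9d5e083bb0 error 4 in libpthread-2.27.so[7f9d65e7b000+1a000]"
)
info = parse_segfault(sample)
print(info["library"], hex(info["offset"]), info["error"])
# prints: libpthread-2.27.so 0xc892 user-mode read, page not present
```

Applied to all eleven entries, ten of them fault at offset 0xc892 and one at 0xc382, each time reading address 0x18, which looks like a field access through a NULL pointer.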
Environment info:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Build agents are equipped with 2 vCPUs and 2 GB of memory.
dotnet --info is not available, as neither a runtime nor an SDK is installed on the agent. We’re using the dotnet tool installer during the build:
Tool to install: .NET Core sdk version 3.1.x.
Found version 3.1.405 in channel 3.1 for user specified version spec: 3.1.x
Version: 3.1.405 was found in cache.
Creating global tool path and pre-pending to PATH.
I have no idea how to debug this. I’d like to provide more info, but need assistance to do so.
About this issue
- State: open
- Created 3 years ago
- Comments: 44 (35 by maintainers)
The fix has been released in libssl package version 1.1.1-1ubuntu2.1~18.04.23+esm1. Here are my repro steps to acquire that package: https://gist.github.com/richlander/47333cbf90ee0ee3f51bcb0dbbb3a76f?permalink_comment_id=4676592#gistcomment-4676592
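To check whether a given agent already has that version, something along these lines might help. This is only a sketch: it assumes the binary package is named `libssl1.1` (an assumption, not stated above) and that `dpkg` and `dpkg-query` are on the PATH. The helper names are made up for illustration.

```python
import shutil
import subprocess

FIXED_VERSION = "1.1.1-1ubuntu2.1~18.04.23+esm1"  # version quoted in the comment above
PACKAGE = "libssl1.1"  # assumption: binary package name on Ubuntu 18.04


def installed_version(package=PACKAGE):
    """Return the installed version of a package, or None if it cannot be determined."""
    if shutil.which("dpkg-query") is None:
        return None
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        capture_output=True,
        text=True,
    )
    version = result.stdout.strip()
    return version if result.returncode == 0 and version else None


def has_fix(version, fixed=FIXED_VERSION):
    """True if `version` is at least `fixed`, using dpkg's own version ordering."""
    # `dpkg --compare-versions A ge B` exits with 0 when A >= B.
    result = subprocess.run(["dpkg", "--compare-versions", version, "ge", fixed])
    return result.returncode == 0


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print(f"{PACKAGE} not found (or dpkg-query unavailable)")
    else:
        print(f"{PACKAGE} {version}: fix present = {has_fix(version)}")
```

Delegating the comparison to `dpkg` avoids reimplementing Debian version ordering, where `~` and `+` suffixes sort in non-obvious ways.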
We are in the late stages of getting Canonical to publish a fix in Ubuntu 18.04 via their ESM program. I believe the easiest way to access that is via Ubuntu Pro.
Ah got it. And later versions - 20.04 etc?
What I can see in the dump is that the main thread has already exited and the crashing secondary thread is attempting to run some OpenSSL code and a lock address inside of libcrypto passed to CRYPTO_THREAD_write_lock is set to NULL. This sounds like the same issue as https://github.com/dotnet/runtime/issues/34231. Only that this time, it doesn’t stem from the ERR_reason_error_string like in that issue, but from the following:
cc: @bartonjs
I’m experiencing this on a self-hosted Azure DevOps build agent, which fails randomly on dotnet commands for .NET Core 3.1 projects.
dotnet --info
The build succeeds, but because the process returns with exit code null, the build process fails.
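An exit code of null is consistent with the process having been killed by a signal rather than exiting normally: Node.js reports a null exit code in that case, and the pipeline tasks run on Node. The Python sketch below shows the same distinction from the parent's side on a POSIX system. The child command is a hypothetical stand-in for the crashing dotnet process, not the real thing.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, Python reports death by signal as the negated signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


# Hypothetical stand-in for the crashing process: a child that sends itself SIGSEGV.
child = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
result = subprocess.run(child)
print(describe_exit(result.returncode))
# prints: killed by SIGSEGV
```

A wrapper like this around the build step would at least log which signal terminated the process instead of the opaque null.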