runtime: ReadyToRun images crash if compiled for AVX2 but run on non-AVX2 CPU

Update: The crash also happens if the R2R image is built for SSE4.2 and run on a CPU that only supports SSSE3 (I added a reply below).

Description

I’m using crossgen2 and .NET 6.0.0 GA with the following options to build a composite R2R image for my app (Paint.NET):

--targetos:windows --targetarch:x64 --optimize-time --composite --inputbubble --compilebubblegenerics --instruction-set:avx2,bmi,bmi2,lzcnt,popcnt,fma

From what I’m told, this image should still run fine on systems without support for these instruction sets, with the precompiled code ignored in favor of JITting. (Or at least, any precompiled code that uses those instructions should be discarded. I still haven’t heard conclusively whether the exclusion is selective or the whole native image is ignored.)

However, I received a report from a private tester that the app just crashes. Their CPU is a Pentium® Dual-Core CPU T4400, which supports instruction sets up through SSSE3.

If I use bcdedit /set xsavedisable 1 on my own system, which is a Ryzen 5000 series CPU, I’m also able to reproduce the crash.

So it looks like ReadyToRun is doing something wrong here.

cc @tannergooding @EgorBo @AndyAyersMS

Reproduction Steps

Compile an app using crossgen2 and --targetos:windows --targetarch:x64 --optimize-time --composite --inputbubble --compilebubblegenerics --instruction-set:avx2,bmi,bmi2,lzcnt,popcnt,fma

Run the app on an older CPU that lacks AVX2, or, on a more recent CPU, run bcdedit /set xsavedisable 1, reboot, and then run the app.

Expected behavior

Everything works fine

Actual behavior

Crashes. Event Viewer has an entry for the app showing that it crashed.

Regression?

Not sure if this is a regression.

Known Workarounds

Compile with --instruction-set:sse2 instead.

Configuration

.NET 6.0.0, Windows 10/11 x64, on a CPU that lacks AVX2, or on one that does have AVX2 after first running bcdedit /set xsavedisable 1 and rebooting.

Other information

No response

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 15 (15 by maintainers)

Most upvoted comments

@rickbrew Correct, the problem is that when AVX is enabled for use, the JIT will actually use AVX instructions for quite a few operations. For instance, the AVX instruction set adds access to slightly more efficient instructions for performing regular floating point math, and if a function has more than a very small number of locals, AVX instructions are used to zero out the locals of the method. The end result is that, without a significant engineering effort, we’re not able to produce a particularly useful mode in crossgen2 where only the appropriate subset of the functions would be disabled when AVX is not present at runtime. As such, the design was to disable the precompiled code for the entire module, as that would be much simpler to implement. Unfortunately, even that is quite problematic to test, and we mishandled the composite mode case, leading to this bug.

In theory, if you are truly seeing very significant wins from architecture-specific compilation, it would be possible to build both versions of the DLLs, write your own host, and choose exactly which set of compiled code to run at runtime, but that is so much work that I cannot recommend it. My recommendation would be one of two options. Either compile with the default instruction-set switch and let the runtime’s default tiered compilation behavior recompile as needed; or wait for us to fix this bug, compile assuming AVX2, produce an Arm64 build for customers running on Arm64 hardware, and accept the terrible startup performance on older x86 hardware. (Current emulators on Arm64 do not support AVX, so a native Arm64 build is what avoids perf problems on newer Arm64 hardware.) The realization of how complex all of this is has led us to deprioritize some of our efforts around handling higher instruction sets for CoreCLR in desktop application scenarios, in favor of exploring usage in the server container space, where developers can have extremely high confidence about what sort of hardware is in use, and that hardware pretty much all supports AVX.
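For readers weighing the "build both and pick one at runtime" idea that the comment above advises against, the selection step itself can be sketched as follows. This is a minimal Python illustration under stated assumptions: the build names, the required feature sets, and `pick_build` are all hypothetical, and a real custom host would also have to query the CPU and load the chosen binaries, which is where most of the work lies.

```python
# Hypothetical sketch of the selection step a custom host would need.
# Build names and required instruction sets are made up for illustration.

BUILDS = [
    # (directory name, instruction sets the build assumes), most specialized first
    ("app-avx2", {"avx2", "bmi", "bmi2", "lzcnt", "popcnt", "fma"}),
    ("app-baseline", set()),  # assumes nothing beyond the platform baseline
]

def pick_build(cpu_instruction_sets):
    """Return the first build whose required instruction sets are all present."""
    available = set(cpu_instruction_sets)
    for name, required in BUILDS:
        if required <= available:
            return name
    raise RuntimeError("no compatible build found")

# A CPU that stops at SSSE3 falls through to the baseline build.
print(pick_build({"sse2", "ssse3"}))
# A CPU with every required set gets the specialized build.
print(pick_build({"sse2", "avx2", "bmi", "bmi2", "lzcnt", "popcnt", "fma"}))
```

Ordering the list from most to least specialized keeps the fallback explicit: the baseline entry requires nothing, so it always matches last.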

@rickbrew, sorry for the delay on this. I’m about to take a closer look and make sure this works correctly. The current expectation is that the --instruction-set argument to crossgen2 should produce an image where all of the compiled code will be dropped if the application is run on a machine that does not support the specified instruction sets. The end result will be significantly degraded startup time on machines without those instruction sets. Due to engineering concerns in the current implementation of the JIT/crossgen2 compiler, we’re not currently able to enable AVX (or SSE4.2) support selectively on a subset of methods compiled into the application.
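The all-or-nothing expectation described above can be modeled in a few lines. This is a conceptual Python sketch only, not the CoreCLR implementation; `choose_code_source` and the CPU feature lists are hypothetical names and values chosen for illustration.

```python
# Conceptual model of the expected behavior, NOT the actual CoreCLR logic.
# Function and variable names are hypothetical.

def choose_code_source(image_instruction_sets, cpu_instruction_sets):
    """Return "precompiled" only if the CPU supports every instruction set
    the image was compiled for; otherwise drop the whole image and JIT."""
    missing = set(image_instruction_sets) - set(cpu_instruction_sets)
    return "precompiled" if not missing else "jit"

# Instruction sets passed to --instruction-set in the report above.
image_sets = {"avx2", "bmi", "bmi2", "lzcnt", "popcnt", "fma"}

# Assumed feature list for a CPU that stops at SSSE3 (illustrative).
old_cpu = {"sse", "sse2", "sse3", "ssse3"}

# Assumed feature list for a CPU that supports everything the image needs.
new_cpu = old_cpu | {"sse4.1", "sse4.2", "avx"} | image_sets

print(choose_code_source(image_sets, old_cpu))  # whole image rejected
print(choose_code_source(image_sets, new_cpu))  # precompiled code used
```

The point of the model is that a single missing instruction set rejects every precompiled method, which is why startup degrades rather than only the affected methods being re-JITted.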