runtime: Possible JIT optimization bug on .NET 5 preview
Description
On a managed .NET 5 C# project, I get intermittent errors, sometimes an AccessViolationException, other times ExecutionEngineException, and even NullReferenceException (?) with no managed code in sight on the debugger.
The code does mathematical optimization and is fully deterministic, however the exceptions are not always consistent even though they are frequent.
On the following scenarios these exceptions have never appeared so far (I tried a few times):
- Debug mode.
- Release mode with Optimize=False,
- Release mode with Optimize=True and COMPlus_JITMinOpts=1.
Unfortunately, I don’t know how to create a small program to reproduce the problem, but there seemed to be a particular point where the code breaks the most:
For context, _complement
is declared as:
private readonly ImmutableArray<ulong> _complement
An extra detail: in the constructor I initialize a regular ulong[]
and use Unsafe.As<ulong[], ImmutableArray<ulong>>
to set _complement
.
Here is a misterious NullReferenceException (on a value type?!)
JIT and compiler optimizations have a lot of impact on the performance in this project so I don’t think disabling these optimizations is a long term solution.
Configuration
.NET 5.0.0-preview.6.20305.6
Also tried preview 4 before and was getting similar exceptions.
category:correctness theme:testing skill-level:expert cost:large
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 19 (14 by maintainers)
Commits related to this issue
- JIT: ensure fgFirstBB has appropriate flags This fixes an issue where release jits might sometimes generate bad GC info. Keeping it minimal for now so we can consider servicing preview 8. See #39023... — committed to AndyAyersMS/runtime by AndyAyersMS 4 years ago
- JIT: ensure fgFirstBB has appropriate flags This fixes an issue where release jits might sometimes generate bad GC info. Keeping it minimal for now so we can consider servicing preview 8. See #39023... — committed to AndyAyersMS/runtime by AndyAyersMS 4 years ago
- JIT: ensure fgFirstBB has appropriate flags (#40038) This fixes an issue where release jits might sometimes generate bad GC info. Keeping it minimal for now so we can consider servicing preview 8. ... — committed to dotnet/runtime by AndyAyersMS 4 years ago
- Port JIT fix to Preview 8: ensure fgFirstBB has appropriate flags Port of #40038 to Preview 8. Fix #39023 Release jits might sometimes generate bad GC info. Mysterious intermittent crashes. Withou... — committed to AndyAyersMS/runtime by AndyAyersMS 4 years ago
- Port JIT fix to Preview 8: ensure fgFirstBB has appropriate flags (#40086) Port of #40038 to Preview 8. Fix #39023 Release jits might sometimes generate bad GC info. Mysterious intermittent ... — committed to dotnet/runtime by AndyAyersMS 4 years ago
- JIT: ensure fgFirstBB has appropriate flags (#40038) This fixes an issue where release jits might sometimes generate bad GC info. Keeping it minimal for now so we can consider servicing preview 8. ... — committed to Jacksondr5/runtime by AndyAyersMS 4 years ago
+1 thanks @dellamonica for helping track this down and fix it before final release. Much appreciated.
Think I’ve finally figured this one out.
Recall the release jit only sometimes generates bad GC info, and the checked jit never does.
The release jit can sometimes end up in a situation where
fgFirstBB
does not have theBBF_HAS_LABEL
orBBF_JMP_TARGET
flags set. If this happens and there are GC references live into the method body, they won’t be reported properly as the jit won’t create a label for the first BB.A checked jit will typically always set
BBF_HAS_LABEL
flag onfgBirstBB
because of this bit of code:https://github.com/dotnet/runtime/blob/50a999dd3597a33b62a7a1a1712c849e764d50e9/src/coreclr/src/jit/codegenlinear.cpp#L99-L102
It turns out
fgHasSwitch
is never explicitly initialized, so it tends to be set to true in checked builds, given the default fill pattern (the only fill pattern that would have exposed this error in a checked jit is00
which currently can’t be used).In release builds,
fgHasSwitch
will have a somewhat random value in methods that don’t have switches. Apparently a random0
tends to be rare, so most of the time the release jit will also set theBBF_HAS_LABEL
flag onfgBirstBB
. But occasionally the value is0
and the release jit fails to create a label for the first BB, and thus leaves the jit in position to generate bad GC info.The above explains why checked jits always produce good GC info, and release jits mostly always do.
This is a regression; the precipitating change is quite likely #1309. Before that change it wasn’t possible (or at least wasn’t common) for the jit to create a scratch BB after building pred lists. The order of these is significant;
fgComputePreds
sets bothBBF_HAS_LABEL
andBBF_JMP_TARGET
flags on the first block. Post #1309 the jit might create a scratch BB afterfgComputePreds
has run, and in that case the new first BB does not have the right flags set.The simple minimal fix is to initialize
fgHasSwitch
to false infgInit
, and then updatefgEnsureFirstBBisScratch
to set the necessary flags on the new first BB.I have sent an updated jit to @dellamonica to test; hopefully this is indeed the problem.
@dellamonica thanks for reporting this, and for your help and patience in tracking this down.