runtime: .NET Core 3.1 - AsmMacros.inc - memory operand not aligned by 16 causing Access violation (0xc0000005)
Description
The issue was observed twice on .NET Core 3.1.10 (out of >1000 repro attempts).
We captured two memory dumps and used them to pinpoint the source of the issue, which appears to be the movdqa instruction in AsmMacros.inc executing with a memory operand whose address is not a multiple of 16.


According to the definition of the movdqa instruction, a memory operand must be aligned on a 16-byte boundary.
In both dumps the operand address is not divisible by 16. The resulting general-protection fault is reported by Windows as an Access violation (0xc0000005).
For reference, here is a link to the source code containing this assembly instruction: https://github.com/dotnet/coreclr/blob/b4f19e3e849044ffe4feb9f7788edda9a129a773/src/vm/amd64/AsmMacros.inc#L78
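To make the alignment rule concrete, here is a minimal sketch in C of the check we applied by hand to the operand addresses in the dumps. The function names and the sample addresses in the usage note are hypothetical and do not come from the dumps or from the runtime sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* movdqa needs its 128-bit memory operand on a 16-byte boundary. */
#define XMM_ALIGNMENT 16u

/* True when addr is a multiple of 16, i.e. its low four bits are zero.
   Hypothetical helper, for illustration only. */
bool is_aligned_16(uint64_t addr)
{
    return (addr & (XMM_ALIGNMENT - 1u)) == 0;
}

/* Number of bytes by which addr lies past the previous 16-byte boundary
   (0 when the address is aligned). */
unsigned misalignment_16(uint64_t addr)
{
    return (unsigned)(addr & (XMM_ALIGNMENT - 1u));
}
```

For example, is_aligned_16(0x7ffe0000) is true, while is_aligned_16(0x7ffe0008) is false and misalignment_16(0x7ffe0008) is 8. An address of the second kind used as a movdqa memory operand would fault.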
Reproduction Steps
This issue is very hard to reproduce: only 2 of our 1000+ attempts triggered it.
Expected behavior
We expect the process not to crash.
Actual behavior
The process crashes due to an Access violation (0xc0000005).
Regression?
We migrated from .NET Framework 4.7.2 to .NET Core 3.1.10. The issue did not occur under .NET Framework 4.7.2.
Known Workarounds
There isn’t any workaround as far as I can tell
Configuration
.NET Core 3.1.10, Windows Server 2012 R2, 64-bit process.
Other information
No response
About this issue
- State: open
- Created 2 years ago
- Comments: 18 (17 by maintainers)
The registers are not the problem. The problem is that the stack memory is overwritten. It is probably getting overwritten by some other thread.
Here is a program that demonstrates what is likely happening. The program is going to crash with fatal errors and no way to tell how we got there. There are no breadcrumbs that would connect the crash with OverwriteStackAsynchronously.
One option is to try to catch the crash using TTD https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-overview , but it may be too time-consuming given that the issue is hard to reproduce.
Do you have any logging in your app? Can you see any patterns in what the app was doing that preceded this crash?
One potential explanation of why you have not seen it in .NET Framework is that .NET Framework was slower and so the stack was overwritten too late to actually cause damage.
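The failure mode described above can be modelled without any real threads or undefined behaviour. The following is a deterministic toy sketch in C, not the demonstration program mentioned earlier and not runtime code: the stack is modelled as a plain array, and every name in it (toy_stack, pending_write, start_async_write, complete_write, unrelated_call) is hypothetical. It only illustrates why the damage shows up in code unrelated to the original bug, and why timing decides whether there is any damage at all.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SLOTS 8

/* Toy model of one thread's stack: an array reused by successive calls. */
typedef struct {
    uint64_t slots[FRAME_SLOTS];
} toy_stack;

/* A write that another thread will perform later through a stale pointer. */
typedef struct {
    uint64_t *target;  /* points into a frame that may already be gone */
    uint64_t  value;
} pending_write;

/* First "call": hands out the address of one of its locals, then returns.
   The frame is popped on return, but the pointer into it stays alive. */
pending_write start_async_write(toy_stack *stack)
{
    pending_write w = { &stack->slots[3], 0xDEADBEEFu };
    return w;
}

/* The other thread finally performs its write. */
void complete_write(pending_write w)
{
    *w.target = w.value;
}

/* Second, unrelated "call": saves a 16-byte-aligned value in the same slot
   and reloads it at the end. If late_writer is non-NULL, the stale write
   lands in between, so the reloaded value is corrupted. */
uint64_t unrelated_call(toy_stack *stack, const pending_write *late_writer)
{
    stack->slots[3] = 0x1000;
    if (late_writer != NULL) {
        complete_write(*late_writer);
    }
    return stack->slots[3];
}
```

In this model, unrelated_call(&s, &w) returns 0xDEADBEEF, which is not a multiple of 16, even though unrelated_call contains no bug and nothing in it points back to start_async_write. If the stale write instead lands after unrelated_call has returned, the call still returns 0x1000 and nothing visible goes wrong, which matches the suggestion that a slower runtime could hide the problem.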