runtime: .NET Core 3.1 - AsmMacros.inc - memory operand not aligned by 16 causing Access violation (0xc0000005)

Description

The issue was observed twice on .NET Core 3.1.10, in more than 1,000 repro attempts. We captured two memory dumps and used them to pinpoint the source of the issue, which appears to be a movdqa instruction in AsmMacros.inc executing with a memory operand whose address is not a multiple of 16.

[Images: two screenshots, one from each memory dump]

According to the definition of the movdqa instruction, the memory operand must be aligned on a 16-byte boundary. As can be seen, in both cases the operand address is not divisible by 16. The CPU then raises a general-protection fault, which Windows reports as an access violation (0xc0000005).
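The alignment condition can be checked by hand against any operand address read out of a dump. Below is a minimal sketch, assuming a hypothetical helper named IsAligned16 (it is not part of the runtime) and illustrative addresses that are not taken from the dumps above:

```csharp
using System;

// Hypothetical helper: an address is 16-byte aligned when its low four bits are zero.
static bool IsAligned16(ulong address) => (address & 0xF) == 0;

// Illustrative addresses only; substitute the operand address shown by the debugger.
ulong aligned = 0x000000D51A3FE2C0UL;
ulong misaligned = 0x000000D51A3FE2C8UL;

Console.WriteLine($"0x{aligned:X16} aligned: {IsAligned16(aligned)}");
Console.WriteLine($"0x{misaligned:X16} aligned: {IsAligned16(misaligned)}");
```

An address ending in the hex digit 0 passes the check; any other final digit means movdqa would fault on it.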

For reference, here is a link to the source code containing this assembly instruction: https://github.com/dotnet/coreclr/blob/b4f19e3e849044ffe4feb9f7788edda9a129a773/src/vm/amd64/AsmMacros.inc#L78

Reproduction Steps

This issue is very hard to reproduce: only 2 of our 1,000+ attempts reproduced it.

Expected behavior

We expect the process not to crash.

Actual behavior

The process crashes with an access violation (0xc0000005).

Regression?

We migrated from .NET Framework 4.7.2 to .NET Core 3.1.10. The issue did not occur under .NET Framework 4.7.2.

Known Workarounds

There is no workaround as far as I can tell.

Configuration

.NET Core 3.1.10, Windows Server 2012 R2, 64-bit process.

Other information

No response

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 18 (17 by maintainers)

Most upvoted comments

I’m not seeing what we could be using that would have such effect on registers.

The registers are not the problem. The problem is that the stack memory is overwritten. It is probably getting overwritten by some other thread.

Here is a program that demonstrates what is likely happening. The program is going to crash with fatal errors and no way to tell how we got there. There are no breadcrumbs that would connect the crash with OverwriteStackAsynchronously.

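using System;
using System.Threading.Tasks;

// Requires unsafe code to be enabled (AllowUnsafeBlocks) in the project.
OverwriteStackAsynchronously();

for (;;) Console.WriteLine("Hello world!");

unsafe void OverwriteStackAsynchronously()
{
    // Address of a stack slot on the main thread; it dangles as soon as GetStackPointer returns.
    int* p = GetStackPointer();

    // Another thread keeps writing at and below that address, corrupting the main thread's stack.
    Task.Run(() => {
        for (;;) for (int i = 0; i < 100; i++) *(p-i) = i;
    });

    static unsafe int* GetStackPointer()
    {
        int x = 0;
        return &x;
    }
}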

Is there a way to get additional information in order to pin-point the source of the issue?

One option is to try to catch the crash using Time Travel Debugging (TTD): https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-overview. It may be too time-consuming, though, given that the issue is hard to reproduce.

Do you have any logging in your app? Can you see any patterns in what the app was doing that preceded this crash?

We did not have any issue before the migration from .NET Framework to .NET Core 3.1.

One potential explanation of why you have not seen it in .NET Framework is that .NET Framework was slower and so the stack was overwritten too late to actually cause damage.