runtime: SequenceEqual broken for concurrent use?

Repro:

using System;
using System.Security.Cryptography;
using System.Threading;

class Program
{
    static void Main()
    {
        static void Work(object _)
        {
            while (true)
            {
                Span<byte> buffer1 = RandomNumberGenerator.GetBytes(1024 * 1024);
                Span<byte> buffer2 = new byte[buffer1.Length];
                buffer1.CopyTo(buffer2);

                if (!buffer1.SequenceEqual(buffer2)) // should never be false...
                {
                    while (!buffer1.SequenceEqual(buffer2)) ; // but if it is, it certainly shouldn't go back to true...
                    throw new Exception("What in the world just happened?!");
                }
            }
        }

        ThreadPool.QueueUserWorkItem(Work);
        ThreadPool.QueueUserWorkItem(Work);
        Console.ReadLine();
    }
}

This runs without error on .NET 5.

On master on my Windows machine (I’ve not tried other OSes), within a few seconds I get a failure like this:

Unhandled exception. System.Exception: SequenceEquals changed from false to true?!
   at Program.Work() in C:\Users\stoub\source\repos\ConsoleApp17\ConsoleApp17\Program.cs:line 25
   at Program.<>c.<Main>b__0_0(Object _) in C:\Users\stoub\source\repos\ConsoleApp17\ConsoleApp17\Program.cs:line 10
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute() in D:\repos\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\ThreadPool.cs:line 977
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in D:\repos\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\ThreadPool.cs:line 705
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in D:\repos\runtime\src\libraries\System.Private.CoreLib\src\System\Threading\PortableThreadPool.WorkerThread.cs:line 56
   at System.Threading.Thread.StartCallback() in D:\repos\runtime\src\coreclr\System.Private.CoreLib\src\System\Threading\Thread.CoreCLR.cs:line 105

Note that if I remove one of the ThreadPool calls so that there’s no concurrency, it’s never failed for me; only once I introduce the parallelism does it start to fail.
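One way to tell whether the buffers really diverged or whether the comparison itself returned a wrong answer is to re-check them with a plain scalar loop at the moment SequenceEqual reports false. The helper below is a minimal sketch for that purpose; `ScalarCheck` and `FirstMismatch` are hypothetical names that are not part of the original repro, and the assumption that a simple indexed loop keeps no data in vector registers is not verified here.

```csharp
using System;

public static class ScalarCheck
{
    // Returns the index of the first differing byte, or -1 if both spans
    // have the same length and contents. Hypothetical diagnostic helper.
    public static int FirstMismatch(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i])
            {
                return i;
            }
        }

        // Same prefix; a length difference still counts as a mismatch.
        return a.Length == b.Length ? -1 : common;
    }

    public static void Main()
    {
        byte[] x = { 1, 2, 3, 4, 5, 6, 7, 8 };
        byte[] y = (byte[])x.Clone();

        Console.WriteLine(FirstMismatch(x, y)); // -1: identical contents

        y[5] = 0xFF;
        Console.WriteLine(FirstMismatch(x, y)); // 5: first differing index
    }
}
```

In the repro, calling `ScalarCheck.FirstMismatch(buffer1, buffer2)` inside the `if (!buffer1.SequenceEqual(buffer2))` branch and getting -1 would suggest the memory is intact and the comparison result was bogus.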

JIT bug? GC? Some kind of latent issue in SequenceEqual?

cc: @geoffkizer, @jkotas

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 44 (43 by maintainers)

Most upvoted comments

I think any CLR version that supports AVX intrinsics and still receives servicing fixes may need to get this fix at some point.

A new issue would be better for servicing. I will create one for 5.0 and can keep it around for more down-level ports if required.

I do not think anybody is working on it. It would be great if you could pick it up. Thank you!

Any idea why this just started showing up?

I think it got triggered by memcpy changes in CRT. Are you on VS dogfood or have you installed VS update recently?

memcpy - shipping .NET 5.0 bits:

0:010> u coreclr!memcpy
coreclr!memcpy [D:\agent\_work\9\s\src\vctools\crt\vcruntime\src\string\amd64\memcpy.asm @ 129]:
00007fff`50a9d7b0 4c8bd9          mov     r11,rcx
00007fff`50a9d7b3 4c8bd2          mov     r10,rdx
00007fff`50a9d7b6 4983f810        cmp     r8,10h
00007fff`50a9d7ba 7654            jbe     coreclr!memcpy+0x60 (00007fff`50a9d810)
00007fff`50a9d7bc 4983f820        cmp     r8,20h
00007fff`50a9d7c0 762e            jbe     coreclr!memcpy+0x40 (00007fff`50a9d7f0)
00007fff`50a9d7c2 482bd1          sub     rdx,rcx
00007fff`50a9d7c5 730d            jae     coreclr!memcpy+0x24 (00007fff`50a9d7d4)

memcpy - master compiled using latest VS dogfood (Version 16.9.0 Preview 4.0 [30914.28.main])

0:011> u coreclr!memcpy
CoreCLR!memcpy [d:\agent\_work\31\s\src\vctools\crt\vcruntime\src\string\amd64\memcpy.asm @ 68]:
00007fff`455838f0 488bc1          mov     rax,rcx
00007fff`455838f3 4c8d1506c774ff  lea     r10,[CoreCLR!__acrt_rg_language_count (00007fff`44cd0000)]
00007fff`455838fa 4983f80f        cmp     r8,0Fh
00007fff`455838fe 0f870c010000    ja      CoreCLR!memcpy+0x120 (00007fff`45583a10)
00007fff`45583904 666666660f1f840000000000 nop word ptr [rax+rax]
00007fff`45583910 478b8c8210d2ca00 mov     r9d,dword ptr [r10+r8*4+0CAD210h]
00007fff`45583918 4d03ca          add     r9,r10
00007fff`4558391b 41ffe1          jmp     r9
...
uses AVX 
...

Any idea why this just started showing up?

I am still unable to repro. Likely requires newest C runtime/compiler.

Looks like https://github.com/dotnet/runtime/issues/38974 finally caught up with us.

You may need to be on the latest VS dogfood to see the crash and/or have a processor without the ERMSB feature.
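To see which of these conditions apply to a given machine, the CPU features can be queried from managed code. The following is a minimal sketch, not part of the original thread: it assumes .NET 5 or later (for `X86Base.CpuId`), the class and helper names are hypothetical, and it only reports what CPUID says. Whether the CRT's memcpy actually chooses its AVX path on that basis is taken from the comment above, not verified here.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class CpuFeatureCheck
{
    // CPUID leaf 7, sub-leaf 0: EBX bit 9 is the ERMS flag
    // (Enhanced REP MOVSB/STOSB). Hypothetical helper name.
    public static bool HasErmsBit(int ebx) => (ebx & (1 << 9)) != 0;

    public static void Main()
    {
        // AVX2 support decides whether the 32-byte (ymm) loads can be used.
        Console.WriteLine($"Avx2.IsSupported: {Avx2.IsSupported}");

        if (!X86Base.IsSupported)
        {
            Console.WriteLine("Not an x86/x64 process; CPUID is unavailable.");
            return;
        }

        // Leaf 0 reports the highest supported standard leaf in EAX.
        int maxLeaf = X86Base.CpuId(0, 0).Eax;
        if (maxLeaf >= 7)
        {
            var (_, ebx, _, _) = X86Base.CpuId(7, 0);
            Console.WriteLine($"ERMSB reported by CPUID: {HasErmsBit(ebx)}");
        }
        else
        {
            Console.WriteLine("CPUID leaf 7 is not supported on this processor.");
        }
    }
}
```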

Here is what’s happening:

  1. A fully interruptible thread suspend happens to stop at this place in SpanHelpers.SequenceEqual:
00007ff9`c895aae7 c5fd10042e      vmovupd ymm0,ymmword ptr [rsi+rbp]
00007ff9`c895aaec c5fd100c2f      vmovupd ymm1,ymmword ptr [rdi+rbp] <--- Here
00007ff9`c895aaf1 c5fd74c1        vpcmpeqb ymm0,ymm0,ymm1
00007ff9`c895aaf5 c5fdd7c8        vpmovmskb ecx,ymm0
00007ff9`c895aaf9 83f9ff          cmp     ecx,0FFFFFFFFh
00007ff9`c895aafc 0f85e5000000    jne     System_Private_CoreLib!System.SpanHelpers.SequenceEqual(Byte ByRef, Byte ByRef, UIntPtr)+0xffffffff`a1614a47 (00007ff9`c895abe7)
  2. The thread suspend helper happens to call memcpy at:
CoreCLR!memcpy+0x180 [d:\agent\_work\31\s\src\vctools\crt\vcruntime\src\string\amd64\memcpy.asm @ 300]
CoreCLR!Thread::RedirectedHandledJITCase+0x27f [C:\runtime\src\coreclr\vm\threadsuspend.cpp @ 2708] 
CoreCLR!RedirectedHandledJITCaseForGCThreadControl_Stub+0x26 [C:\runtime\src\coreclr\vm\amd64\RedirectedHandledJITCase.asm @ 97] 
System_Private_CoreLib!System.SpanHelpers.SequenceEqual(Byte ByRef, Byte ByRef, UIntPtr)+0xffffffff`a161494c [C:\runtime\src\libraries\System.Private.CoreLib\src\System\SpanHelpers.Byte.cs @ 1676] 

memcpy is optimized using AVX, which modifies the ymm0 register as a side effect:

00007ffa`28523a6c c5fe6f02        vmovdqu ymm0,ymmword ptr [rdx]
00007ffa`28523a70 c4a17e6f6c02e0  vmovdqu ymm5,ymmword ptr [rdx+r8-20h]
00007ffa`28523a77 4981f800010000  cmp     r8,100h
00007ffa`28523a7e 0f86c4000000    jbe     CoreCLR!memcpy+0x258 (00007ffa`28523b48)
  3. Nothing restores ymm0 to its original value when the thread resumes, due to #38974. This makes SequenceEqual return a bogus result.
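The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions: a plain byte array stands in for the ymm0 register, a callback stands in for the suspend helper, and all names (`SuspendClobberSketch`, `CompareChunk`, `BuggySuspend`, `CorrectSuspend`) are hypothetical. It does not reproduce the real runtime code or the exact register state that is lost.

```csharp
using System;

public static class SuspendClobberSketch
{
    private const int VectorSize = 32; // one ymm register holds 32 bytes

    // Compares one 32-byte chunk in the same order as the disassembly:
    // load a, (possible suspension), load b, compare.
    public static bool CompareChunk(byte[] a, byte[] b, Action<byte[]> betweenLoads)
    {
        byte[] ymm0 = new byte[VectorSize];
        Array.Copy(a, ymm0, VectorSize);   // models: vmovupd ymm0, [rsi+rbp]

        betweenLoads(ymm0);                // models: thread suspended here

        byte[] ymm1 = new byte[VectorSize];
        Array.Copy(b, ymm1, VectorSize);   // models: vmovupd ymm1, [rdi+rbp]

        for (int i = 0; i < VectorSize; i++) // models: vpcmpeqb/vpmovmskb/cmp
        {
            if (ymm0[i] != ymm1[i])
            {
                return false;
            }
        }
        return true;
    }

    // Uses the "register" as scratch space and never restores it,
    // like a memcpy that loads unrelated data into ymm0.
    public static void BuggySuspend(byte[] ymm0) => Array.Fill(ymm0, (byte)0xCC);

    // Saves the "register", does the same scratch work, then restores it.
    public static void CorrectSuspend(byte[] ymm0)
    {
        byte[] saved = (byte[])ymm0.Clone();
        Array.Fill(ymm0, (byte)0xCC);
        Array.Copy(saved, ymm0, saved.Length);
    }

    public static void Main()
    {
        byte[] a = new byte[VectorSize];
        for (int i = 0; i < a.Length; i++) a[i] = (byte)i;
        byte[] b = (byte[])a.Clone();

        Console.WriteLine($"no interruption: {CompareChunk(a, b, _ => { })}");     // True
        Console.WriteLine($"buggy suspend:   {CompareChunk(a, b, BuggySuspend)}");   // False
        Console.WriteLine($"correct suspend: {CompareChunk(a, b, CorrectSuspend)}"); // True
    }
}
```

In this model the buffers never change, yet the comparison fails whenever the interruption lands between the two loads and the register is not restored. That matches the observed behavior, where a later call to SequenceEqual on the same buffers returns true again.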

Seconds on my machine, using src from master this afternoon.

This looks like a thread suspension bug to me. @VSadov Could you please take a look since you have done refactoring of that code recently?