Half-way through waiting for remote process.
Memory load: 14
Image Name                     PID Services
========================= ======== ============================================
svchost.exe                    620 BrokerInfrastructure, DcomLaunch, LSM,
                                   PlugPlay, Power, SystemEventsBroker
svchost.exe                    652 RpcEptMapper, RpcSs
svchost.exe                    776 Dhcp, EventLog, lmhosts, vmictimesync,
                                   Wcmsvc
svchost.exe                    808 AeLookupSvc, Appinfo, BITS, CertPropSvc,
                                   gpsvc, iphlpsvc, LanmanServer, ProfSvc,
                                   sacsvr, Schedule, SENS, SessionEnv,
                                   ShellHWDetection, Themes, Winmgmt
svchost.exe                    888 EventSystem, FontCache, netprofm, nsi,
                                   W32Time, WinHttpAutoProxySvc
svchost.exe                    976 CryptSvc, Dnscache, LanmanWorkstation,
                                   NlaSvc, WinRM
svchost.exe                    592 BFE, DPS, MpsSvc, pla
svchost.exe                   1136 AppHostSvc
svchost.exe                   1164 DiagTrack
svchost.exe                   1328 TrkWks, UALSVC, UmRdpService,
                                   vmickvpexchange, vmicshutdown, vmicvss
svchost.exe                   1352 W3SVC, WAS
svchost.exe                   2632 TermService
svchost.exe                   2672 vmicheartbeat, vmicrdv
Timed out at 4/25/2020 1:55:40 AM after 60000ms waiting for remote process.
Wrote mini dump to: C:\h\w\AE3609A5\w\B05309CA\uploads\3476.zuzob0j0.l20.dmp
Process ID: 3476
Handle: 1004
Name: dotnet
MainModule: C:\h\w\AE3609A5\p\dotnet.exe
StartTime: 4/25/2020 1:54:40 AM
TotalProcessorTime: 00:00:00.2343750
Threads:
Thread #1 (OS 0x560)
    [InlinedCallFrame] (System.Environment._Exit)
    [InlinedCallFrame] (System.Environment._Exit)
    System.Environment.Exit(Int32)
    Microsoft.DotNet.RemoteExecutor.Program.Main(System.String[])
Thread #2 (OS 0xA38) [Finalizer] [Background] [MTA]
    [InlinedCallFrame] (Interop+Advapi32.EventUnregister)
    [InlinedCallFrame] (Interop+Advapi32.EventUnregister)
    System.Diagnostics.Tracing.EtwEventProvider.System.Diagnostics.Tracing.IEventProvider.EventUnregister(Int64)
    System.Diagnostics.Tracing.EventProvider.EventUnregister(Int64)
    System.Diagnostics.Tracing.EventProvider.Dispose(Boolean)
    System.Diagnostics.Tracing.EventProvider.Dispose()
    System.Diagnostics.Tracing.EventSource.Dispose(Boolean)
    System.Diagnostics.Tracing.EventListener.DisposeOnShutdown(System.Object, System.EventArgs)
    System.AppContext.OnProcessExit()
Thread #3 (OS 0x700) [Thread pool worker] [Background] [MTA]
Thread #4 (OS 0xE50) [Thread pool worker] [Background] [MTA]
I have stepped through the EventPipe IPC shutdown sequence. I believe there is a clear HANDLE use-after-close bug:
The handle that is being closed here:
is being waited on - or about to be waited on - here:
How does this explain the hang:
Why is it more likely to repro on Win7/8:
The shutdown sequence of the EventPipe needs careful review. It would also be a good idea to minimize the work done during shutdown to only what's absolutely necessary; for example, these event handles do not really need to be closed, since the OS will close them anyway as part of process shutdown.
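To make the shape of that bug concrete, here is a minimal standalone sketch in plain Win32, not the actual EventPipe code; the names and the shutdown ordering are invented for illustration only. One thread closes an event handle during shutdown while another thread may still be about to wait on the cached handle value:

```cpp
// Illustrative sketch of a HANDLE use-after-close race (assumed shape,
// not runtime code). Build with: cl /EHsc race_sketch.cpp
#include <windows.h>
#include <cstdio>

static HANDLE g_shutdownEvent; // handle value cached by the waiting thread

DWORD WINAPI WaiterThread(LPVOID)
{
    // By the time this wait runs, the handle may already have been closed
    // (and possibly recycled for an unrelated object), so the wait either
    // fails with ERROR_INVALID_HANDLE or blocks on the wrong object.
    DWORD rc = WaitForSingleObject(g_shutdownEvent, INFINITE);
    printf("wait returned %lu\n", rc);
    return 0;
}

int main()
{
    g_shutdownEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);
    HANDLE waiter = CreateThread(nullptr, 0, WaiterThread, nullptr, 0, nullptr);

    // Buggy shutdown ordering: the handle is signaled and then closed while
    // the other thread may not yet have entered its wait. The close must not
    // happen until the waiter is known to be finished with the handle.
    SetEvent(g_shutdownEvent);
    CloseHandle(g_shutdownEvent);

    WaitForSingleObject(waiter, INFINITE);
    CloseHandle(waiter);
    return 0;
}
```

If the numeric handle value is recycled for an unrelated kernel object between the close and the wait, the waiter blocks on that object and never wakes up, which is consistent with the hang described above.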
Reverting in #35767
Can we revert the offending PR? This is causing at least one leg in the majority of PRs to fail. It’s going to make it much more likely that other regressions slip in, as we’re having to rerun lots of legs and merge with failures that might be masking other failures, and it’s going to make it more likely devs assume that failures aren’t theirs even if they are in fact caused by their PR. Thanks!
The dump at \\jkotas9\drops\issue-35451 confirms this: pHandles[0] on thread 3 is 0x168, and CriticalSection.LockSemaphore on thread 2 is 0x168 as well (a standalone sketch of this handle-value recycling follows below).

By "ping" I meant that it literally made a ping sound from the console. I had stepped away from my computer for a while, and it started beeping. It seems like after 1000 loops most of the runs are taking >10 seconds! The xunit output claims the tests ran in 5 seconds (still a big increase over the 0.6-second runtime from the first 900 loops), but the RunTests.cmd script didn't finish before 10 seconds. The logs also say that the tests passed. I'll dig into this more tomorrow with some better logging and see if it actually means anything (and mute my computer before I call it a day 😝).

Interesting!
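For reference, here is a standalone sketch (again plain Win32, not runtime code) of the handle-value recycling that makes the 0x168 coincidence above possible. Whether the values actually collide on a given run is up to the OS, which often hands back the most recently freed slot:

```cpp
// Illustrative sketch of Win32 handle-value reuse (assumed behavior, not a
// guarantee): a stale cached handle value can end up identifying a
// completely different object, such as a critical section's lock semaphore.
#include <windows.h>
#include <cstdio>

int main()
{
    HANDLE ev = CreateEventW(nullptr, TRUE, FALSE, nullptr);
    printf("event handle:     %p\n", ev);

    CloseHandle(ev); // the numeric value is now free for reuse

    // A new, unrelated object often receives the just-freed value.
    HANDLE sem = CreateSemaphoreW(nullptr, 0, 1, nullptr);
    printf("semaphore handle: %p\n", sem);

    if (ev == sem)
        printf("recycled: a wait on the stale event handle now blocks on the semaphore\n");

    CloseHandle(sem);
    return 0;
}
```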
I’ll take a look at this today. Thanks for triaging things @safern! I’m not sure what would be causing this issue, but based on the versions being hit, I’m guessing it might be something with Overlapped IO in old versions of Windows. I’ll have to do some investigating.
Attaching windbg showed the loader lock was unowned:
Best guess: the loader lock was owned earlier, causing the OS to switch the thread out; later the lock was released, and for whatever reason the OS hasn't switched the thread back in yet.
Jan’s suggestion of #13610 sounds very plausible to me. If we wanted more data on this failure (or failures of this class), we’d probably need to upgrade the telemetry to determine whether the thread is making progress and what is happening machine-wide. A better set of telemetry to capture for a suspected hung test would be:
Closing this as a duplicate of #13610