PSReadLine: Simultaneous pwsh instance start Mutex.CreateMutexCore IOException race condition
Environment
PSReadLine: 2.1.0
PowerShell: 7.1.3
OS: Darwin 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
BufferWidth: 176
BufferHeight: 91
Exception report
iTerm session 3:
System.IO.IOException: The system cannot open the device or file specified. : 'PSReadLineHistoryFile_2886463743'
at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
at Microsoft.PowerShell.PSConsoleReadLine.DelayedOneTimeInitialize()
at Microsoft.PowerShell.PSConsoleReadLine.Initialize(Runspace runspace, EngineIntrinsics engineIntrinsics)
at Microsoft.PowerShell.PSConsoleReadLine.ReadLine(Runspace runspace, EngineIntrinsics engineIntrinsics, CancellationToken cancellationToken)
iTerm session 4:
System.IO.IOException: The system cannot open the device or file specified. : 'PSReadLineHistoryFile_2886463743'
at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
at Microsoft.PowerShell.PSConsoleReadLine.DelayedOneTimeInitialize()
at Microsoft.PowerShell.PSConsoleReadLine.Initialize(Runspace runspace, EngineIntrinsics engineIntrinsics)
at Microsoft.PowerShell.PSConsoleReadLine.ReadLine(Runspace runspace, EngineIntrinsics engineIntrinsics, CancellationToken cancellationToken)
Steps to reproduce
(Steps that shouldn’t matter: My macbook was asleep in clamshell mode, I unplugged the display connections, and the OS crashed. I logged in and submitted the bug report to Apple, again.)
I saw this message on two tabs when iTerm2 was trying to restore a terminal window with four tab. The first two tabs did not have any error.
iTerm session restore caused 4 pwsh instances to launch at the same time, and got these two crashes from different pwsh instances.
Expected behavior
Launching multiple pwsh instances at the exact same time should not show error.
Actual behavior
Tabs 3 and 4 each showed the exception message above.
Running some debug commands, there are a lot of PSReadLineHistoryFile_2886463743
files created at nearly the exact same millisecond:
➜ ~ gci -r /tmp/.dotnet/shm
Directory: /tmp/.dotnet/shm
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 7/8/2021 10:59 AM session1999
d---- 7/8/2021 12:55 PM session5595
d---- 7/8/2021 10:15 AM session705
d---- 7/8/2021 10:15 AM session710
d---- 7/8/2021 10:15 AM session715
d---- 7/8/2021 10:15 AM session721
Directory: /tmp/.dotnet/shm/session715
Mode LastWriteTime Length Name
---- ------------- ------ ----
----- 7/8/2021 10:15 AM 4096 PSReadLineHistoryFile_2886463743
Directory: /tmp/.dotnet/shm/session705
Mode LastWriteTime Length Name
---- ------------- ------ ----
----- 7/8/2021 10:15 AM 4096 PSReadLineHistoryFile_2886463743
Directory: /tmp/.dotnet/shm/session5595
Mode LastWriteTime Length Name
---- ------------- ------ ----
----- 7/8/2021 12:44 PM 4096 PSReadLineHistoryFile_2886463743
----- 7/8/2021 12:55 PM 4096 PSReadLineHistoryFile_3946503603
Directory: /tmp/.dotnet/shm/session1999
Mode LastWriteTime Length Name
---- ------------- ------ ----
----- 7/8/2021 10:59 AM 4096 PSReadLineHistoryFile_2886463743
Directory: /tmp/.dotnet/shm/session721
Mode LastWriteTime Length Name
---- ------------- ------ ----
----- 7/8/2021 10:15 AM 4096 PSReadLineHistoryFile_2886463743
Directory: /tmp/.dotnet/shm/session710
Mode LastWriteTime Length Name
---- ------------- ------ ----
----- 7/8/2021 10:15 AM 4096 PSReadLineHistoryFile_2886463743
➜ ~ gci -r -file /tmp/.dotnet/shm | % { $_.LastWriteTime.ToString("MM/dd/yyyy hh:mm:ss.ffffff tt") } | sort-object
07/08/2021 10:15:57.403067 AM
07/08/2021 10:15:57.403165 AM
07/08/2021 10:15:57.425071 AM
07/08/2021 10:15:57.425950 AM
07/08/2021 10:59:23.698350 AM
07/08/2021 12:44:43.666927 PM
07/08/2021 12:55:35.784985 PM
My 2 cents:
- This is not a duplicate of #1464. My macOS permissions on /private/tmp are fine, and manually opening a pwsh instances always works.
- Changing the exception message in dotnet/runtime#36823 won’t fix the problem of not showing exception message.
- Whatever code that is in charge of picking mutex names is using the same mutex name.
- On Windows, I’ve seen
System.Threading.Mutex
throw occasionally if the mutex name is contended.- The .ctor() documentation says
IOException
can happen when some other error happens.
- The .ctor() documentation says
- Maybe pwsh could make the mutex name more unique across instances?
- Maybe pwsh could use a retry loop around mutex code that sometimes throws?
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 20 (7 by maintainers)
Ok, I ran another experiment with the upgrade to macOS 12.6.1 and I was able to repro this failure without pwsh or PSReadLine; if we ask the owner of dotnet BCL what the Muetx exception policy should be I hope this added context would help.
Repo
Program.cs
with anet6.0
console app:Built with
dotnet publish -c Release --self-contained -a x64
I created a new iTerm profile that just runs command
/Users/walshca/code/temp/MutexThrow/bin/Release/net6.0/osx-x64/publish/MutexThrow
and opened 30 tabs with this. Then I upgraded the macOS version to 12.6.1, and after reboot only 1 tab had the exception:(I expected about a quarter, so something about this minimal program doesn’t match what pwsh is doing).
I’ll open some more tabs of this, and leave it running for a few more OS reboots to see if I can repro it without OS upgrade.
@darthwalsh and @iSazonov Thank you both for the investigation so far. I will need to get a macOS to try reproducing the issue. A couple clarification:
IOException
, we have no idea what causes it and thus I don’t want to blindly add the retry logic.I’d love to understand the root cause before making any changes in the code.