PSReadLine: Simultaneous pwsh instance start Mutex.CreateMutexCore IOException race condition

Environment

PSReadLine: 2.1.0
PowerShell: 7.1.3
OS: Darwin 20.5.0 Darwin Kernel Version 20.5.0: Sat May  8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
BufferWidth: 176
BufferHeight: 91

Exception report

iTerm session 3:

System.IO.IOException: The system cannot open the device or file specified. : 'PSReadLineHistoryFile_2886463743'
   at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
   at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
   at Microsoft.PowerShell.PSConsoleReadLine.DelayedOneTimeInitialize()
   at Microsoft.PowerShell.PSConsoleReadLine.Initialize(Runspace runspace, EngineIntrinsics engineIntrinsics)
   at Microsoft.PowerShell.PSConsoleReadLine.ReadLine(Runspace runspace, EngineIntrinsics engineIntrinsics, CancellationToken cancellationToken)

iTerm session 4:

System.IO.IOException: The system cannot open the device or file specified. : 'PSReadLineHistoryFile_2886463743'
   at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
   at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
   at Microsoft.PowerShell.PSConsoleReadLine.DelayedOneTimeInitialize()
   at Microsoft.PowerShell.PSConsoleReadLine.Initialize(Runspace runspace, EngineIntrinsics engineIntrinsics)
   at Microsoft.PowerShell.PSConsoleReadLine.ReadLine(Runspace runspace, EngineIntrinsics engineIntrinsics, CancellationToken cancellationToken)

Steps to reproduce

(Steps that shouldn’t matter: My macbook was asleep in clamshell mode, I unplugged the display connections, and the OS crashed. I logged in and submitted the bug report to Apple, again.)

I saw this message on two tabs when iTerm2 was trying to restore a terminal window with four tab. The first two tabs did not have any error.

iTerm session restore caused 4 pwsh instances to launch at the same time, and got these two crashes from different pwsh instances.

Expected behavior

Launching multiple pwsh instances at the exact same time should not show error.

Actual behavior

Tabs 3 and 4 each showed the exception message above.


Running some debug commands, there are a lot of PSReadLineHistoryFile_2886463743 files created at nearly the exact same millisecond:

➜ ~ gci -r /tmp/.dotnet/shm

    Directory: /tmp/.dotnet/shm

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            7/8/2021 10:59 AM                session1999
d----            7/8/2021 12:55 PM                session5595
d----            7/8/2021 10:15 AM                session705
d----            7/8/2021 10:15 AM                session710
d----            7/8/2021 10:15 AM                session715
d----            7/8/2021 10:15 AM                session721

    Directory: /tmp/.dotnet/shm/session715

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----            7/8/2021 10:15 AM           4096 PSReadLineHistoryFile_2886463743

    Directory: /tmp/.dotnet/shm/session705

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----            7/8/2021 10:15 AM           4096 PSReadLineHistoryFile_2886463743

    Directory: /tmp/.dotnet/shm/session5595

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----            7/8/2021 12:44 PM           4096 PSReadLineHistoryFile_2886463743
-----            7/8/2021 12:55 PM           4096 PSReadLineHistoryFile_3946503603

    Directory: /tmp/.dotnet/shm/session1999

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----            7/8/2021 10:59 AM           4096 PSReadLineHistoryFile_2886463743

    Directory: /tmp/.dotnet/shm/session721

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----            7/8/2021 10:15 AM           4096 PSReadLineHistoryFile_2886463743

    Directory: /tmp/.dotnet/shm/session710

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----            7/8/2021 10:15 AM           4096 PSReadLineHistoryFile_2886463743
➜ ~ gci -r -file /tmp/.dotnet/shm | % { $_.LastWriteTime.ToString("MM/dd/yyyy hh:mm:ss.ffffff tt") } | sort-object
07/08/2021 10:15:57.403067 AM
07/08/2021 10:15:57.403165 AM
07/08/2021 10:15:57.425071 AM
07/08/2021 10:15:57.425950 AM
07/08/2021 10:59:23.698350 AM
07/08/2021 12:44:43.666927 PM
07/08/2021 12:55:35.784985 PM

My 2 cents:

  • This is not a duplicate of #1464. My macOS permissions on /private/tmp are fine, and manually opening a pwsh instances always works.
    • Changing the exception message in dotnet/runtime#36823 won’t fix the problem of not showing exception message.
  • Whatever code that is in charge of picking mutex names is using the same mutex name.
  • On Windows, I’ve seen System.Threading.Mutex throw occasionally if the mutex name is contended.
    • The .ctor() documentation says IOException can happen when some other error happens.
  • Maybe pwsh could make the mutex name more unique across instances?
  • Maybe pwsh could use a retry loop around mutex code that sometimes throws?

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Comments: 20 (7 by maintainers)

Most upvoted comments

Ok, I ran another experiment with the upgrade to macOS 12.6.1 and I was able to repro this failure without pwsh or PSReadLine; if we ask the owner of dotnet BCL what the Muetx exception policy should be I hope this added context would help.


Repo Program.cs with a net6.0 console app:

try {
  using (var m = new Mutex(false, "darthwalsh_PSReadLine_issues_2658")) {
    Console.Write("created but didn't take Mutex darthwalsh_PSReadLine_issues_2658   ...");
    var line = Console.ReadLine();
    Console.WriteLine("QUITTING: " + line);
  }
} catch (Exception e) {
  Console.WriteLine(e.ToString());

  var line = Console.ReadLine();
  Console.WriteLine("QUITTING: " + line);
}

Built with dotnet publish -c Release --self-contained -a x64

I created a new iTerm profile that just runs command /Users/walshca/code/temp/MutexThrow/bin/Release/net6.0/osx-x64/publish/MutexThrow and opened 30 tabs with this. Then I upgraded the macOS version to 12.6.1, and after reboot only 1 tab had the exception:

System.IO.IOException: The system cannot open the device or file specified. : 'darthwalsh_PSReadLine_issues_2658'
   at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
   at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
   at Program.<Main>$(String[] args) in /Users/walshca/code/temp/MutexThrow/Program.cs:line 2

(I expected about a quarter, so something about this minimal program doesn’t match what pwsh is doing).

I’ll open some more tabs of this, and leave it running for a few more OS reboots to see if I can repro it without OS upgrade.

@darthwalsh and @iSazonov Thank you both for the investigation so far. I will need to get a macOS to try reproducing the issue. A couple clarification:

  1. the mutex is to guard the read/write access of the history file, which is shared by all pwsh instances for a given user.
  2. the re-try we have is for abandoned mutex, which we understand why a retry is needed in that case – the thread owns the mutex, however, it was abandoned by the previous owner thread, so releasing and then retry is needed. But for this IOException, we have no idea what causes it and thus I don’t want to blindly add the retry logic.

I’d love to understand the root cause before making any changes in the code.