runtime: Possible deadlock in ConfigurationManager in .NET 6

Description

We have a production HTTP application that we recently updated from .NET 6.0.0-rc.2 to .NET 6.0.0, and we have observed a number of incidents where the application suddenly became unresponsive to HTTP requests. This caused application health checks to fail and the instances to be taken out of service.

Having dug into this over the last day or so (https://github.com/dotnet/runtime/issues/60654#issuecomment-970302827), I think I’ve tracked this down to a deadlock that occurs in ConfigurationManager if the application’s configuration is manually reloaded at runtime.

Overall, the issue appears to be that if an options class is bound to configuration via a type such as IOptionsMonitor<T>, and there is a change callback bound to IConfigurationRoot.Reload(), then the application will deadlock trying to get configuration values to bind to the options class. The lock around getting an option’s value:

https://github.com/dotnet/runtime/blob/13024af94f951851d9cee9a7d79911749a25fa3b/src/libraries/Microsoft.Extensions.Configuration/src/ConfigurationManager.cs#L46

will be waiting for the lock acquired during the reload:

https://github.com/dotnet/runtime/blob/13024af94f951851d9cee9a7d79911749a25fa3b/src/libraries/Microsoft.Extensions.Configuration/src/ConfigurationManager.cs#L109

I’ve captured a memory dump from the application after triggering the issue in our staging environment, and a screenshot of the Parallel Stacks window from Visual Studio taken from inspecting the memory dump is below.

[Screenshot: Parallel Stacks window showing the deadlock]

Thread 852 has called IConfigurationRoot.Reload(), which is blocked on thread 3516 waiting on an options monitor callback for an options class.

Thread 3516 is deadlocked on a call to IConfiguration[string] to create an options class.

IConfiguration and IConfigurationRoot are both the same instance of ConfigurationManager.

~I haven’t ruled out this being a latent bug in our application that .NET 6 has brought to the surface, but we’ve only had the issue with .NET 6.0.0. We’ve reverted the application to .NET 6.0.0-rc2 for the time being, and the problem has gone away.~

~I figured I would log the issue now in case someone looks at it and can quickly find the root cause while I’m continuing to repro this independently or determine it’s an actual bug in our app.~

Reproduction Steps

To reproduce this issue, follow the instructions in this repo: https://github.com/martincostello/ConfigurationManagerDeadlock

~A conceptual repro is to do the following two actions concurrently in an app using WebApplicationBuilder so ConfigurationManager is the app’s IConfigurationRoot:~

  1. ~Reload the IConfigurationRoot from an HTTP request in a loop;~
  2. ~Issue an HTTP request that resolves IOptionsMonitor<T> or IOptionsSnapshot<T> from the service provider which is bound to configuration in a loop.~

~After a period of time (in testing I found this happened within 10 minutes), the application will deadlock.~

Expected behavior

Configuration reloads successfully and does not deadlock requests in flight.

Actual behavior

The application deadlocks the thread reloading the configuration and other threads accessing the configuration to bind options.

Regression?

Compared to using IConfigurationRoot directly with Program/Startup (i.e. a non-Minimal API), yes.

Known Workarounds

I’m not aware of any workarounds at this point, other than not using Minimal APIs when doing configuration reloading at runtime.

Configuration

  • .NET 6.0.0-rtm.21522.10
  • Microsoft Windows 10.0.17763
  • Runtime version 6.0.0-rtm.21522.10+4822e3c3aa77eb82b2fb33c9321f923cf11ddde6
  • ASP.NET Core version 6.0.0+ae1a6cbe225b99c0bf38b7e31bf60cb653b73a52

Other information

No response

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 24 (24 by maintainers)

Most upvoted comments

I opened a backport PR at #63816. If you’re aware of any others who have run into this issue, that might make it easier to pass the servicing bar.

@halter73 I deployed a .NET 7 build of our application with the fix to our staging environment today for 2 hours, and there were no functional or performance issues observed.

Our load test sends constant synthetic load at the application, and I additionally fired requests at it to reload the configuration continuously in a loop for an hour. We didn’t observe any deadlocks during that period, whereas with .NET 6 we could reproduce the deadlock in these circumstances within a few minutes.

Sure, I’ll try this out tomorrow in our staging environment.

Yep, the workaround works for our use case. Thanks!

Thanks for the great repro @martincostello . One possible workaround for now if you control the code calling Reload() would be to change the call to Reload() to the following:

foreach (var provider in ((IConfigurationRoot)Configuration).Providers)
{
    provider.Load();
}

This isn’t exactly the same as Reload() since it doesn’t trigger the reload token directly. But if the providers trigger their own reload tokens, it should be fairly equivalent.

Nice investigation, Stu. I wonder though if maybe there’s something in the lock implementation that should be reworked in ConfigurationManager.

The class summary says it should be frozen once Build() is called, but it doesn’t seem to do that. Also, having all access to the configuration values guarded by a lock seems like a performance issue to me for an app just using vanilla IConfiguration.

Any thoughts on this @halter73 ?

I’ve been having a play with this; the repo is super useful for recreating it.

There are 2 resources involved in this deadlock:

  • ConfigurationManager._providerLock object field. This object is used to guard most ConfigurationManager methods.
  • Lazy<MyOptions> stored in OptionsCache. The .Value property uses the default ExecutionAndPublication behavior, where the first thread will create the value, and other threads will wait for that value to become available.

The deadlock

  • /reload route → ConfigurationManager.Reload() → lock (_providerLock) → OptionsCache → Lazy<MyOptions>.Value + wait
  • /value route → OptionsCache → Lazy<MyOptions>.Value + execute → ConfigurationManager["MyOptions"] → lock (_providerLock)

I presume the reload path is fine when the Lazy<MyOptions>.Value is executed on that thread as it will be on the same thread and the lock is reentrant.
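
The interleaving above can be shown without ConfigurationManager or OptionsCache at all. The following is only a minimal sketch, assuming a plain object stands in for _providerLock and a Lazy<string> stands in for the cached Lazy<MyOptions>. LockAndLazyDeadlockDemo and Run are hypothetical names made up for illustration, and the two events exist only to force the ordering seen in the memory dump.

```csharp
using System;
using System.Threading;

// Illustration only: stand-ins for the two resources, not the real runtime code.
public static class LockAndLazyDeadlockDemo
{
    // Returns true if both threads finished within the timeout,
    // false if they still appear to be blocked on each other.
    public static bool Run(TimeSpan timeout)
    {
        var providerLock = new object();                 // stand-in for _providerLock
        var factoryStarted = new ManualResetEventSlim(); // forces the interleaving
        var lockTaken = new ManualResetEventSlim();

        // Stand-in for Lazy<MyOptions>: the factory needs the provider lock,
        // just as binding options reads ConfigurationManager["MyOptions"].
        var options = new Lazy<string>(() =>
        {
            factoryStarted.Set();
            lockTaken.Wait();
            lock (providerLock) { return "value"; }
        }, LazyThreadSafetyMode.ExecutionAndPublication);

        // "/value" path: starts executing the Lazy factory, then needs the lock.
        var valueThread = new Thread(() => { _ = options.Value; })
        {
            IsBackground = true
        };

        // "/reload" path: takes the lock, then waits on the Lazy's value.
        var reloadThread = new Thread(() =>
        {
            factoryStarted.Wait();
            lock (providerLock)
            {
                lockTaken.Set();
                _ = options.Value; // blocks: the factory is running on valueThread
            }
        })
        {
            IsBackground = true
        };

        valueThread.Start();
        reloadThread.Start();

        // Background threads let the process exit even though they stay blocked.
        return valueThread.Join(timeout) && reloadThread.Join(timeout);
    }
}
```

Calling LockAndLazyDeadlockDemo.Run(TimeSpan.FromSeconds(2)) should return false, because each thread holds one resource and waits for the other.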

Perhaps the Lazy<TOptions> in OptionsCache should use the PublicationOnly behaviour on this line. The advantage is that no threads would be waiting while the value is uninitialized, so there would be no deadlock. The disadvantage is that we’d potentially create multiple instances of TOptions while the value is uninitialized, which is wasteful but seems acceptable given it should only happen briefly after startup or a config reload.
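
To illustrate the trade-off, here is a hedged sketch of the same two-thread interleaving with LazyThreadSafetyMode.PublicationOnly. It is not the OptionsCache code. PublicationOnlyDemo, Run, and the stand-in lock and events are assumptions made up for the example.

```csharp
using System;
using System.Threading;

// Illustration only: shows PublicationOnly avoiding the wait, at the cost of
// running the factory more than once.
public static class PublicationOnlyDemo
{
    public static (bool Completed, int FactoryCalls) Run(TimeSpan timeout)
    {
        var providerLock = new object();                 // stand-in for _providerLock
        var factoryStarted = new ManualResetEventSlim(); // forces the interleaving
        var lockTaken = new ManualResetEventSlim();
        int factoryCalls = 0;

        var options = new Lazy<string>(() =>
        {
            Interlocked.Increment(ref factoryCalls);
            factoryStarted.Set();
            lockTaken.Wait();
            lock (providerLock) { return "value"; }
        }, LazyThreadSafetyMode.PublicationOnly);

        // "/value" path: starts the factory first, then needs the lock.
        var valueThread = new Thread(() => { _ = options.Value; })
        {
            IsBackground = true
        };

        // "/reload" path: holds the lock and asks for the value. With
        // PublicationOnly it runs the factory itself instead of waiting,
        // and re-enters the lock it already holds.
        var reloadThread = new Thread(() =>
        {
            factoryStarted.Wait();
            lock (providerLock)
            {
                lockTaken.Set();
                _ = options.Value;
            }
        })
        {
            IsBackground = true
        };

        valueThread.Start();
        reloadThread.Start();

        bool completed = valueThread.Join(timeout) && reloadThread.Join(timeout);
        return (completed, Volatile.Read(ref factoryCalls));
    }
}
```

In this interleaving both threads should complete and the factory should run twice, which is the wasted work mentioned above: one of the two created values is discarded and every caller sees the published one.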