runtime: Windows service infrequently results in error 1067 on stop operation

Description

.NET 6 published application sometimes crashes on service stop operation. I created a web application from the template and only added the Microsoft.Extensions.Hosting.WindowsServices (version 6.0.0) package and called UseWindowsService() to make sure the application listens for service controller requests.

Here is the code:

using Microsoft.Extensions.Hosting.WindowsServices;

var options = new WebApplicationOptions
{
    Args = args,
    ContentRootPath = WindowsServiceHelpers.IsWindowsService() ? AppContext.BaseDirectory : default
};

var builder = WebApplication.CreateBuilder(options);

// Add services to the container.
builder.Services.AddRazorPages();
builder.Host.UseWindowsService();

var app = builder.Build();

// Configure the HTTP request pipeline.
if (!app.Environment.IsDevelopment())
{
    app.UseExceptionHandler("/Error");
}

app.UseStaticFiles();

app.UseRouting();

app.UseAuthorization();

app.MapRazorPages();

app.Run();

Reproduction Steps

Repository with code, script and repro steps: https://github.com/zryska/net6-repro

  1. Create a new Web application using a template
  2. Make the changes to add UseWindowsService() call as shown above
  3. Publish the application as SFD or FDD for win-x64 runtime
  4. Create a new service using sc create command
  5. Start the service and verify it’s running
  6. Stop the service
  7. Start the service
  8. Repeat steps 4 and 5 until the issue is reproduced

It can be done through the services UI or using a script. On my system it usually happens 1 time out of 10 graceful shutdowns - but it isn’t deterministic.

When done through services UI the following will be displayed: image

Even log => Windows Logs -> System will contain the following message (when done through script or UI):

The .NET6 test service service terminated unexpectedly.  It has done this 26 time(s).

Expected behavior

Service stops gracefully every time.

Actual behavior

Non-graceful shutdowns are happening.

Regression?

No response

Known Workarounds

Explicitly assigning value to ServiceName fixes the problem.

builder.Host.UseWindowsService(o => o.ServiceName = "AnyServiceName");

Configuration

.NET version: .NET 6.0.100 OS version: Windows 10 pro build 19044.1348 Architecture: x64

Other information

No response

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 36 (22 by maintainers)

Most upvoted comments

@ericstj I tried the fix on behalf of @MD-V and so far everything seems to work properly.

@ericstj would be awesome if this would be backported to .NET 6 and 7. We are currently experiencing this exact problem with one of our applications.

We should prioritize the fix for this and consider backporting to .NET 7.0 and 6.0. The workaround to wait some time before exiting doesn’t sufficiently solve the problem, because it doesn’t hold up the progress of https://github.com/dotnet/runtime/blob/967250cc86b4c613015066b47b882e7424fdb590/src/libraries/Microsoft.Extensions.Hosting.Abstractions/src/HostingAbstractionsHostExtensions.cs#L75 Which will dispose the service and make it impossible for the service state to be updated.

@buyaa-n as we discussed, we can try the fix of signaling an in Run when ServiceBase.Run(…) returns, then waiting on this in StopAsync. This will ensure that the service has safely exited.

I opened https://github.com/dotnet/runtime/issues/62923 to track improving the service name issue.

I was able to find out the root cause here. It’s actually a timing bug. The process is exiting (cleanly) before ServiceBase has a change to tell SCM that the service is stopped. The reason that we “fixed” it by setting the service name is that we made the code in front of the service stop go faster and give it a better chance at winning the race.

Here’s roughly what’s happening. The Run method is waiting on https://github.com/dotnet/runtime/blob/d5f59506354aa3b94574bc79bda21673a1da9774/src/libraries/Microsoft.Extensions.Hosting.Abstractions/src/HostingAbstractionsHostExtensions.cs#L104 Which will be unblocked by https://github.com/dotnet/runtime/blob/2d6cc77b9ea17c18e9bc6d6197d460b50c27e792/src/libraries/Microsoft.Extensions.Hosting.WindowsServices/src/WindowsServiceLifetime.cs#L104

That call will advance the main thread to here: https://github.com/dotnet/runtime/blob/d5f59506354aa3b94574bc79bda21673a1da9774/src/libraries/Microsoft.Extensions.Hosting.Abstractions/src/HostingAbstractionsHostExtensions.cs#L108

Which will then signal Stopped here: https://github.com/dotnet/runtime/blob/12a8819eee9865eb38bca6c05fdece1053102854/src/libraries/Microsoft.Extensions.Hosting/src/Internal/Host.cs#L140

This sets this event: https://github.com/dotnet/runtime/blob/2d6cc77b9ea17c18e9bc6d6197d460b50c27e792/src/libraries/Microsoft.Extensions.Hosting.WindowsServices/src/WindowsServiceLifetime.cs#L62 And runs StopAsync which just queues up a redundant call to ServiceBase.Stop: https://github.com/dotnet/runtime/blob/2d6cc77b9ea17c18e9bc6d6197d460b50c27e792/src/libraries/Microsoft.Extensions.Hosting.WindowsServices/src/WindowsServiceLifetime.cs#L85-L91

But then falls through back to Run / Main, without actually doing anything to wait on the background thread completing. At this point its a race between the background thread trying to complete the DeferredStop method: https://github.com/dotnet/runtime/blob/2d6cc77b9ea17c18e9bc6d6197d460b50c27e792/src/libraries/System.ServiceProcess.ServiceController/src/System/ServiceProcess/ServiceBase.cs#L518-L521

And main falling through and retuning causing the process to exit before background threads complete.

So our fix to set the service name just shortened the time spent in the DeferredStop method and made it more likely to reach the point which it can complete its work and notify SCM.

I was able to prove this theory by artificially increasing the time spent in OnStop. After doing this I had 100% repro of the service terminated unexpectedly error. Then I added a delay in the end of my main routine after Run() to exceed this delay and made the error go away.

I think the fix here is to somehow make it so that ServiceBase will tell SCM that its stopped before Run() is allowed to complete. I haven’t yet determined what that the best way to do that might be. I’m not sure how to guarantee that ServiceBase has completed its DeferredStop method.

Thank you, Eric! I noticed this exception being thrown and handled consistently every time so I didn’t pay too much attention to it but specifying ServiceName does seem to eliminate ungraceful shutdowns.

I attached the ProcDump to the .NET core 3.1 repro application and the ArgumentException is not being thrown on stop operation. In case of .NET 6 - assigning value to the ServiceName seems like a viable way forward for now - I’ll let the script cycle for a while to confirm but so far so good.

The problem appears to be an unhandled exception during the call to Stop. It’s caused by an empty ServiceName property. Here’s the callstack

Exception type:   System.ArgumentException
Message:          Source property was not set before writing to the event log.
InnerException:   <none>
StackTrace (generated):
<none>
StackTraceString: <none>
HResult: 80070057
0:036> !clrstack
OS Thread Id: 0x6508 (36)
        Child SP               IP Call Site
000000DF38FFF228 00007ffa3977466c [HelperMethodFrame: 000000df38fff228] 
000000DF38FFF320 00007ff9ee2b4b1f System.Diagnostics.EventLogInternal.WriteEntry(System.String, System.Diagnostics.EventLogEntryType, Int32, Int16, Byte[]) [/_/src/libraries/System.Diagnostics.EventLog/src/System/Diagnostics/EventLogInternal.cs @ 1301]
000000DF38FFF390 00007ff9ee2afcb7 System.Diagnostics.EventLog.WriteEntry(System.String, System.Diagnostics.EventLogEntryType) [/_/src/libraries/System.Diagnostics.EventLog/src/System/Diagnostics/EventLog.cs @ 987]
000000DF38FFF3D0 00007ff910eb612a System.ServiceProcess.ServiceBase.WriteLogEntry(System.String, System.Diagnostics.EventLogEntryType) [/_/src/libraries/System.ServiceProcess.ServiceController/src/System/ServiceProcess/ServiceBase.cs @ 951]
000000DF38FFF420 00007ff911101ce7 System.ServiceProcess.ServiceBase.DeferredStop() [/_/src/libraries/System.ServiceProcess.ServiceController/src/System/ServiceProcess/ServiceBase.cs @ 519]
000000DF38FFF4A0 00007ff911101c0f System.ServiceProcess.ServiceBase+c__DisplayClass68_0.b__2(System.Object) [/_/src/libraries/System.ServiceProcess.ServiceController/src/System/ServiceProcess/ServiceBase.cs @ 802]
000000DF38FFF4D0 00007ff9546667b9 System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs @ 960]
000000DF38FFF500 00007ff9546655de System.Threading.ThreadPoolWorkQueue.Dispatch()
000000DF38FFF590 00007ff95466d22a System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs @ 63]
000000DF38FFF6A0 00007ff954651c6f System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
000000DF38FFF930 00007ff9707ba243 [DebuggerU2MCatchHandlerFrame: 000000df38fff930] 

For me I was able to specify the service name as follows and it fixes the issue:

builder.Host.UseWindowsService(e => e.ServiceName = "EricWebSvc");

I’ll see if I can build a theory as to why this wasn’t an issue in 3.1.

Correction: this doesn’t appear to be unhandled, WriteLogEntry swallows it, but for whatever reason avoiding this exception seems to fix the errors for me.

I tried on 3.1 and agree with you that this doesn’t seem to repro at all, so likely this was introduced by some change in 5.0.

I can confirm that running Start-Service Stop-Service loop creates Error entries in system journal

using Microsoft.Extensions.Hosting.WindowsServices;

WebApplication.CreateBuilder(new WebApplicationOptions
    {
        Args = args,
        ContentRootPath = WindowsServiceHelpers.IsWindowsService() ? AppContext.BaseDirectory : default
    })
    .Host.UseWindowsService().Build().Run();