runtime: Generic Host is not stopped when BackgroundService crashes

Updated by @maryamariyan:

Description:

We want to add a new opt-in option on HostOptions that reacts to failures on a background service.

It generally might not be always desirable to allow for a failure/crash on a background service to stop the application altogether and therefore logging the failure is a default behavior.

API Proposal:

API below would add the option to stop the host on background service exception.

namespace Microsoft.Extensions.Hosting
{
    public class HostOptions
    {
        public bool StopOnBackgroundServiceException { get; set; }
    }
}
Original Description (click to view)

Description

According to the documentation of the BackgroundService.ExecuteAsync method that method should return a Task that represents the lifetime of the long running operation(s) being performed.

For me that implies that this Task is monitored by the generic host, so I would expect that when the BackgroundService.ExecuteAsync’s Task instance is completed (successfully or due to an exception) the host itself would also stop. This is not the case. As a result, a Windows service where the BackgroundService.ExecuteAsync method has crashed or stopped will continue to appear running in the Services Management Console, but will no longer be performing any work., causing all monitoring systems to keep reporting that everything is ok.

Configuration

  • .NET Core 3.1.402
  • OS Windows 10.0.18363
  • Architecture X64

Sample application

The following sample demonstrates the problem. You can either run it on the console or install it as windows service and start it. The program will continue to run after the background service has crashed.

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

namespace WorkerTest
{
    public static class Program
    {
        public static void Main(string[] args)
        {
            CreateHostBuilder(args).Build().Run();
        }

        private static IHostBuilder CreateHostBuilder(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .UseWindowsService()
                .ConfigureServices((hostContext, services) =>
                {
                    services.AddHostedService<Worker>();
                });
    }

    public class Worker : BackgroundService
    {
        private readonly ILogger<Worker> _logger;
        private int _counter;

        public Worker(ILogger<Worker> logger)
        {
            _logger = logger;
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            try
            {
                while (!stoppingToken.IsCancellationRequested)
                {
                    _logger.LogInformation("Worker running at: {time} ({counter})", DateTimeOffset.Now, _counter++);
                    if (_counter > 10)
                        throw new Exception("Something happened!");
                    await Task.Delay(1000, stoppingToken).ConfigureAwait(false);
                }
            }
            finally
            {
                if (!stoppingToken.IsCancellationRequested)
                    _logger.LogError("Worker stopped unexpectedly");
            }
        }
    }

}

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 42 (27 by maintainers)

Most upvoted comments

Video

  • It seems the new behavior is more desirable than the current behavior but it’s hard to know how many apps would be broken by the new behavior.
  • The API isn’t very discoverable, so if it’s purely opt-in, then most customers aren’t likely to benefit from the new behavior.
  • A compromise would be:
    • Change the templates for .NET 6 to set this property to BackgroundServiceExceptionBehavior.StopHost
    • Based on customer feedback, we can change the default for .NET 7 and simplify the templates
  • On the other hand, this is early for .NET 6
    • Let’s ship the API
    • Let’s change the default now, not update the templates, see what feedback we get for the previews, and revert back to opt-in if it breaks many people.
namespace Microsoft.Extensions.Hosting
{
    public enum BackgroundServiceExceptionBehavior
    {
        Ignore,
        StopHost
    }
    public class HostOptions
    {
        public BackgroundServiceExceptionBehavior BackgroundServiceExceptionBehavior { get; set; }
    }
}

Lets do this for 6.0. Add the option to stop the host on background service exception.

namespace Microsoft.Extensions.Hosting
{
    public class HostOptions
    {
        public bool StopOnBackgroundServiceException { get; set; }
    }
}

cc @maryamariyan @eerhardt @ericstj

We are using the following base class for background services that should terminate the application on error:

public abstract class TerminatingBackgroundService : BackgroundService
{
    protected IServiceProvider ApplicationServices { get; }
    protected IHostApplicationLifetime ApplicationLifetime { get; }
    protected ILogger Logger { get; }

    protected TerminatingBackgroundService(IServiceProvider applicationServices)
    {
        ApplicationServices = applicationServices;
        ApplicationLifetime = applicationServices.GetRequiredService<IHostApplicationLifetime>();
        Logger = ApplicationServices.GetRequiredService<ILoggerFactory>().CreateLogger(GetType());
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            await ExecuteCoreAsync(stoppingToken);

            Logger.LogInformation("Execute has finished. IsCancellationRequested: {IsCancellationRequested}", stoppingToken.I
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // We're shutting down already, so we don't have to do anything else.
            Logger.LogInformation("OperationCanceledException on shutdown");
        }
        catch (Exception exception)
        {
            Logger.LogCritical(exception, exception.Message);

            // We need StopApplication() so that it properly flushes and disposes the logging system.
            // However, it is a graceful shutdown that exits with code 0.
            Environment.ExitCode = 1;

            if (!stoppingToken.IsCancellationRequested)
            {
                ApplicationLifetime.StopApplication();
            }
        }
    }

    protected abstract Task ExecuteCoreAsync(CancellationToken stoppingToken);
}

I like it

How can it be an “exepcted outcome” that the host process continues to run when the background service doing the actual work has crashed without the host process even knowing? This is a very dangerous situation where all monitoring systems that monitor running Windows services report that everything is OK while no work is being performed.

As the host is unaware that the worker service has stopped it also cannot take any corrective actions like restarting the worker and/or reporting the error…

@IEvangelist - I assigned it to you. Thanks for taking this. I’m looking forward to seeing it being completed.

I see this is up-for-grabs - I’m interested in working on this one, are you okay with me assigning it to myself @maryamariyan and @eerhardt?

The current plan is to do that yes, which will in most cases unblock main and will quit the program.

I just experienced this problem and I vote for changing the default to stop the application on unhandled exceptions. As a .NET developer I don’t expect exceptions to be swallowed by the framework. I can always do that myself if that is the desired behavior.

Would it be an alternative that instead of the HostOptions.BackgroundServiceExceptionBehavior property, which I assume would affect all background services in an application, add a BackgroundService.ExceptionBehavior property, so that the behavior can be controlled per background service as requested by @marcselis above?

Since either solution won’t change the different behavior of unhandled exceptions in ExecuteAsync thrown before the first await (which crashes the application with an unhandled exception) and during or after the first await (which will now stop the application gracefully), could that at least be documented for the ExecuteAsync method?