runtime: BackgroundService handles Exceptions getting thrown differently when tasks are not faulted vs faulted
I think how exceptions are handled with BackgroundService
class is a bit confusing and may require some discussion.
- If an exception is thrown during startup (e.g. in Main) and before
ExecuteAsync
is handled, the app will crash. - If an exception is thrown during
ExecuteAsync
but before the firstawait
is called, the app will crash. - If an exception is thrown during
ExecuteAsync
but during or after the firstawait
is called, the app will … semi-hang? It stop processing … but … Ctrl-C does successfully start the app gracefully terminating.
Now - I hope i have the above steps/scenario’s right – I might have fudged something (async/await/ task contexts, etc are my weak points).
So the problem here is about the inconsistency about what an exception should be doing in a BackgroundService
. Should it crash the host (assuming no top level try/catch in Main)? or get swallowed? Maybe the issue is really about documentation (as the code is totally fine).
This issue arose when I started a conversation in the “ASP.NET slack” group when I was trying to handle an exception getting thrown in side the ExecuteAsync
method and bubbling that upwards.
To quote some of the summaries of this convo:
@davidfowl [7:06 PM] great discussion heh it could suppress both and if you wanted to get exceptions, then you override StartAsync and put logic there
–
@chrisvanderpennen [7:13 PM] theres imo three scenarios that need covering: i don’t want any exceptions in this to stop my host i want an exception during init to stop the host startup, but runtime exceptions are ok i want a runtime exception to stop my host
@davidfowl [7:30 PM] somebody file a bug
so CC’ing @poke @gulbanana @chrisvanderpennen and @davidfowl who were in the convo.
- I think the issue subject might need to be refactored.
- happy to edit the main body of this issue, where i got my facts wrong.
- This issue needs the famous tag ‘fowler’ (fowler has issues). 😃
edit: this was a sample gist I tried to make to also test/repo this problem.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 18
- Comments: 28 (12 by maintainers)
Commits related to this issue
- Merge pull request #813 from benaadams/threading-issue Move AddProviderRegistration under lock — committed to dotnet-maestro-bot/Common by pakrym 6 years ago
Let me try to paint the worst case scenario for each “mode”. Scenario 1. You have a non-important Service and the default is to crash on unhandled errors This brings your application down. You go look at the logs, and you see that it’s your
NotImportantService
that brought the house down. You curse under your breath, and wrap it in a dummy catch or a Polly retry or whatever. Worst case is you get some downtime once.Scenario 2. You have an important service, and the default is to just let it die Your service runs some stuff every hour. At some point data stops coming through. You look at the server - it’s running fine (and you don’t know that a crashing BackgroundService doesn’t crash the server). You look at the logs. Finding anything is hard (impossible without this new PR), because the service could have crashed a long time ago. You mutter under your breath, and figure out how to properly close the application via IApplicationLifetime or similar. I’m not exaggerating when I’ve seen people spend weeks re-writing stuff because they were unable to debug a silently crashing BackgroundService, and thought the process was gettign OOMKilled or similar.
All I’m saying is that the failure mode for Scenario 2 is much worse, which is why I think that crashing should be the default. Admittedly that calculation was more heavily skewed back when we didn’t even have any logs.
But perhaps the right solution is for there to be no default - but forcing the user to explicitly specify “Should this backgroundservice crash the server on an uncaught error” - because I agree that there are both types of services.
Correct. Unfortunately there was higher priority work that we had to bump this out for.
Yes, I have a branch, I’ve just been lazy finishing it. I’ll do it next week.
I couldn’t figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.
I just want to do a shameless plug that I’ve implemented a tiny library BetterHostedServices that lets you inherit from
CriticalBackgroundService
which will end the application on an uncaught exception. Hopefully this issue will get fixed in the core library but until then this is a band-aid solution.anurse was very adamant that a BackgroundService throwing an exception shouldn’t bring down the application. See the thread here:
https://github.com/dotnet/runtime/issues/36017#issuecomment-562710058
In my opinion, this issue is to make these two scenarios consistent:
Since the 2nd case shouldn’t bring down the application, neither should the first case.
A few targeted changes to BackgroundService.
I have just been hit by this along with several colleagues. We have a long running background service pulling from a message queue (Google PubSub). The message queue service is started (await StartAsync) and completes only when manually stopped, or when an unrecovereable error occurs. This recently started periodically failing silently for long periods of time until a restart. And it took us a long time to figure out was a permission issue, even though an exception was thrown by StartAsync.
Our assumptions mirrors @GeeWee’s thoughts, that an unhandled exception is exceptional, and will result in the application exiting. This has been my logic for many years, stemming from multiple sources (some listed below). If you need something to be resilient against exceptions, then it is an active choice you make, and wrap it in a try…catch. Having code swallow exceptions that are too vague is something that I point out again and again in code reviews, since there is a risk of hiding unknown future errors, and since I expect the runtime to catch any unhandled exceptions.
BackgroundService seems to flip this upside down, and now we have to actively handle unknown exceptions in every BackgroundService we implement.
Right, I think it’s fine to suggest an option on HostOptions that would fail if any background service task throws but it wouldn’t be the default.
PR https://github.com/dotnet/runtime/pull/42981, submitted for 6.0, changes this scenario a bit.
This behavior has not changed.
This behavior has not changed.
But for the below scenario:
(directly related to: https://github.com/dotnet/runtime/issues/36017) when this happens we log the exception
summary
The main discussion around this issue is around the fact that if we throw an exception before await vs after an await the app behaves differently.
I would have expected worker services to have a built-in notion of global exception handling, but I guess not (yet)?
Anyway, here’s what I do for now: