efcore: Delay first retry in Transient Error Handling with Azure SQL
The published advice on transient error handling for Azure SQL recommends:
- Delaying “several” seconds before the first retry
- Closing and opening the connection prior to each retry
I’ve poked around in SqlServerRetryingExecutionStrategy and related classes, and can’t find any evidence that either of those two recommendations is followed in EF Core, nor have I been able to figure out how I might implement them in a custom execution strategy.
Additionally, I’ve found that an Execution Timeout Expired exception (error number -2) is explicitly not considered transient, yet it is the single most frequently occurring exception we encounter in our non-EF database code. The retry strategy we’ve implemented for that non-EF code closes and re-opens the connection before retrying the query and has completely eliminated failures due to timeout exceptions. I’ve had to add error number -2 to the errorNumbersToAdd list for EF Core but, because the connection isn’t closed and re-opened, I have zero expectation that retries for those errors will be successful.
Is there a plan to support the recommended transient error handling when targeting Azure SQL? Is there a way I can implement a custom execution strategy that will close and re-open the database connection?
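For reference, adding error number -2 to the errorNumbersToAdd list of the built-in strategy might look like the sketch below. The context class name, the retry count, and the maximum delay are illustrative assumptions, not values taken from this issue. This only changes which errors are retried; it does not close and re-open the connection.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Hypothetical context, used only to show where the option is set.
public class AppDbContext : DbContext
{
    private readonly string _connectionString;

    public AppDbContext(string connectionString)
        => _connectionString = connectionString;

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlServer(
            _connectionString,
            sql => sql.EnableRetryOnFailure(
                maxRetryCount: 5,                        // assumed value
                maxRetryDelay: TimeSpan.FromSeconds(30), // assumed value
                // -2 is the "Execution Timeout Expired" error number,
                // which is not treated as transient by default.
                errorNumbersToAdd: new[] { -2 }));
}
```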
About this issue
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 15 (12 by maintainers)
Commits related to this issue
- Increase the retry delay for throttling on Azure SQL. Detect Azure SQL based on connection string. Add more transient errors. Fixes #27826 — committed to dotnet/efcore by AndriySvyryd 10 months ago
- Increase the retry delay for throttling errors on Azure SQL. Detect Azure SQL server based on connection string. Detect more transient errors. Fixes #27826 — committed to dotnet/efcore by AndriySvyryd 10 months ago
- Increase the retry delay for throttling errors on Azure SQL. (#31612) Increase the retry delay for throttling errors on Azure SQL. Detect Azure SQL server based on connection string. Detect more tr... — committed to dotnet/efcore by AndriySvyryd 10 months ago
@MNF this is what we’re using now - I removed all the extra logic to close and reopen the connection since it turned out to be unnecessary. The only things this strategy does differently from the built-in one are to use the DecorrelatedJitterBackoffV2 method (in the Polly package) and to treat SQL Timeout (-2) errors, IOException, and SocketException as transient errors that should be retried.
@stevendarby We’ll probably add an Azure-specific execution strategy that the user needs to choose explicitly.
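The code for the strategy described in the reply to @MNF is not included in this thread. A minimal sketch of what such a strategy could look like follows, assuming that DecorrelatedJitterBackoffV2 comes from the Polly.Contrib.WaitAndRetry package. The class name, the helper ContainsNetworkFailure, the constructor parameters, and the 30-second maximum delay are hypothetical and are not the poster's actual implementation.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Sockets;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Storage;
using Polly.Contrib.WaitAndRetry; // assumed source of DecorrelatedJitterBackoffV2

// Hypothetical strategy: built-in SQL Server detection plus error -2,
// IOException and SocketException, with decorrelated-jitter delays.
public class JitteredSqlServerExecutionStrategy : SqlServerRetryingExecutionStrategy
{
    private readonly TimeSpan[] _delays;

    public JitteredSqlServerExecutionStrategy(
        ExecutionStrategyDependencies dependencies,
        int maxRetryCount,
        TimeSpan medianFirstRetryDelay)
        : base(dependencies, maxRetryCount, TimeSpan.FromSeconds(30), new[] { -2 })
    {
        // Precompute one jittered delay per allowed retry.
        _delays = Backoff
            .DecorrelatedJitterBackoffV2(medianFirstRetryDelay, maxRetryCount)
            .ToArray();
    }

    protected override bool ShouldRetryOn(Exception exception)
        => base.ShouldRetryOn(exception) || ContainsNetworkFailure(exception);

    protected override TimeSpan? GetNextDelay(Exception lastException)
    {
        // The failure being handled has already been recorded,
        // so the first retry maps to index 0.
        var attempt = ExceptionsEncountered.Count - 1;
        return attempt >= 0 && attempt < _delays.Length
            ? (TimeSpan?)_delays[attempt]
            : null; // null tells the base class to stop retrying
    }

    // Walks the InnerException chain looking for I/O or socket failures.
    public static bool ContainsNetworkFailure(Exception exception)
    {
        for (var e = exception; e != null; e = e.InnerException)
        {
            if (e is IOException || e is SocketException)
            {
                return true;
            }
        }
        return false;
    }
}
```

A strategy like this would be selected explicitly when configuring the provider, for example with `sql.ExecutionStrategy(deps => new JitteredSqlServerExecutionStrategy(deps, 5, TimeSpan.FromSeconds(1)))` inside the UseSqlServer options callback; the retry count and median delay there are placeholder values.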
@AndriySvyryd Thanks for the reply. Since my original post, I’ve implemented a custom ExecutionStrategy that explicitly closes the connection, waits 5 seconds and then re-opens it (via an override of OnRetry()). It seems to be working fine, but if I don’t need to go to all that trouble, I’m definitely interested in changing my implementation.
Can you point me to the code that sets the connection state to Broken when a retriable exception occurs? I dug through the EFCore repo and couldn’t find anything obvious. I’d like to confirm exactly what happens on an exception and modify my code accordingly.
That’s correct, -2 is still not retried by default.
Not currently. It is recommended to use different Execution strategies depending on the target database.
Adding this would be out of scope for this issue. I’ve opened https://github.com/dotnet/efcore/issues/30023 to track it.