npgsql: Include Transient Fault Detection (IsTransient Property)
When programming systems with resiliency in mind, it is good to have a strategy that determines which faults are transient, so that you can decide whether or not to retry them.
One option would be to add an IsTransient property to PostgresException for those defining retry policies, implementing the logic in the PostgresDatabaseTransientErrorDetectionStrategy method below. I may also have misclassified some error codes, so please double-check my assumptions.
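Independent of any retry library, the decision described above comes down to a loop that gives up immediately on a permanent fault and retries a transient one. This is a minimal sketch, assuming a synchronous operation and a caller-supplied `isTransient` predicate. `TransientRetry` and `Execute` are hypothetical names for illustration only. They are not part of Npgsql or Polly.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration; not an Npgsql or Polly API.
public static class TransientRetry
{
    // Runs `operation`, retrying up to `maxRetries` times, but only when
    // `isTransient` classifies the thrown exception as transient.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxRetries,
        Func<int, TimeSpan> delayForAttempt)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt <= maxRetries && isTransient(ex))
            {
                // Transient fault and retries remain: wait, then loop again.
                Thread.Sleep(delayForAttempt(attempt));
            }
            // Permanent faults, and transient faults after the last retry,
            // fail the exception filter and propagate to the caller.
        }
    }
}
```

With `maxRetries: 2`, the operation runs at most three times. The exception filter (`when`) keeps permanent faults out of the catch block entirely, so they surface on the first attempt with their original stack trace.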
Using Polly, here is an example of a retry policy I defined, loosely following the best practices laid out by the SqlDatabaseTransientErrorDetectionStrategy in Microsoft’s Transient Fault Handling Application Block: https://social.technet.microsoft.com/wiki/contents/articles/18665.cloud-service-fundamentals-data-access-layer-transient-fault-handling.aspx
```csharp
using System;
using Npgsql;   // PostgresException
using Polly;
using Serilog;  // static Log used for the retry/breaker messages

public static class PostgresPolicies // container class name is illustrative
{
    // Cached policy instance, created on first access.
    private static IAsyncPolicy postgresTransientPolicy;

    // Retry (outer) wrapped around an advanced circuit breaker (inner).
    // Typed as IAsyncPolicy so it also compiles against Polly v7+, where the
    // async policies no longer derive from Policy.
    public static IAsyncPolicy PostgresTransientFaultPolicy
    {
        get
        {
            return postgresTransientPolicy ?? (postgresTransientPolicy =
                Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                    .WaitAndRetryAsync(
                        retryCount: 10,
                        sleepDurationProvider: retryAttempt => ExponentialBackoff(retryAttempt, 1.4),
                        onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
                    .WrapAsync(
                        Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                            .AdvancedCircuitBreakerAsync(
                                failureThreshold: 0.4,
                                samplingDuration: TimeSpan.FromSeconds(30),
                                minimumThroughput: 20,
                                durationOfBreak: TimeSpan.FromSeconds(30),
                                onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                                onReset: context => Log.Warning("Postgres Circuit Breaker Reset: "),
                                onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: "))));
        }
    }

    // Waits retryAttempt^exponent seconds. Strictly this grows polynomially,
    // not exponentially: with exponent 1.4 the delays are roughly 1 s, 2.6 s, 4.7 s, ...
    private static TimeSpan ExponentialBackoff(int retryAttempt, double exponent)
    {
        return TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
    }

    private static Func<Exception, bool> PostgresDatabaseTransientErrorDetectionStrategy()
    {
        return ex =>
        {
            // If it is not a PostgresException, we must assume it is transient.
            if (!(ex is PostgresException pgex))
                return true;

            switch (pgex.SqlState)
            {
                // Assumed transient errors
                case "53000": // insufficient_resources
                case "53100": // disk_full
                case "53200": // out_of_memory
                case "53300": // too_many_connections
                case "53400": // configuration_limit_exceeded
                case "57P03": // cannot_connect_now
                case "58000": // system_error
                case "58030": // io_error
                // Not sure whether these should be treated as transient, but I am guessing so
                case "55P03": // lock_not_available
                case "55006": // object_in_use
                case "55000": // object_not_in_prerequisite_state
                case "08000": // connection_exception
                case "08003": // connection_does_not_exist
                case "08006": // connection_failure
                case "08001": // sqlclient_unable_to_establish_sqlconnection
                case "08004": // sqlserver_rejected_establishment_of_sqlconnection
                case "08007": // transaction_resolution_unknown
                    return true;
            }
            return false;
        };
    }
}
```
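Every SQLSTATE is five characters, and the first two name its class (for example `08` connection exception, `53` insufficient resources). One possible way to sketch the proposed `IsTransient` logic is to match whole classes by prefix where every code in the class looks transient, and list individual codes only for classes that mix transient and permanent errors (class `58` also contains `58P01` undefined_file, for instance). `SqlStateClassifier` is a hypothetical name. Note that the prefix match is broader than the switch in the policy code. It also admits codes the switch does not list, such as `08P01` (protocol_violation).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone sketch of the proposed IsTransient logic, keyed on
// the SqlState string that PostgresException exposes.
public static class SqlStateClassifier
{
    // Whole SQLSTATE classes (first two characters) treated as transient.
    private static readonly HashSet<string> TransientClasses = new HashSet<string>
    {
        "08", // connection_exception class
        "53", // insufficient_resources class
    };

    // Individual codes from classes that also contain non-transient errors.
    private static readonly HashSet<string> TransientCodes = new HashSet<string>
    {
        "57P03", // cannot_connect_now
        "58000", // system_error
        "58030", // io_error
        "55P03", // lock_not_available
        "55006", // object_in_use
        "55000", // object_not_in_prerequisite_state
    };

    public static bool IsTransient(string sqlState)
    {
        // Anything that is not a well-formed 5-character SQLSTATE is not classified.
        if (string.IsNullOrEmpty(sqlState) || sqlState.Length != 5)
            return false;
        return TransientClasses.Contains(sqlState.Substring(0, 2))
            || TransientCodes.Contains(sqlState);
    }
}
```

Inside the Polly predicate, the switch could then be replaced by `return SqlStateClassifier.IsTransient(pgex.SqlState);`. Whether a code like `08P01` really belongs in the transient set is one of the assumptions that still needs checking.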
About this issue
- State: closed
- Created 7 years ago
- Comments: 26 (12 by maintainers)
Here is the link to the discussion: https://www.postgresql.org/message-id/CAD0baJ6wOgqFqkkmnOChe=S+A6=tUF95c2PpGxWC1W7EdCDH2Q@mail.gmail.com
Based on their suggestions and yours, I will open a pull request in the next couple of days with an optimistic “IsTransient” property.