npgsql: Include Transient Fault Detection (IsTransient Property)

When building systems with resiliency in mind, it helps to have a strategy for determining which faults are transient, so that you can decide whether or not to retry them.

One option would be to add an IsTransient property to PostgresException for those who define retry policies, and have it implement the logic in the PostgresDatabaseTransientErrorDetectionStrategy method below. I may have misclassified some error codes, so please double-check my assumptions.
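To make the proposal concrete, here is a minimal sketch of the lookup such a property might wrap. It assumes the classification used in the strategy method later in this post. The class name PostgresTransientErrors and the method IsTransientSqlState are hypothetical and are not part of Npgsql. The sketch works on the SQLSTATE string alone, so it compiles without any Npgsql reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Npgsql: classifies SQLSTATE codes as transient.
public static class PostgresTransientErrors
{
    // Codes copied from the detection strategy in this post; the grouping is the
    // post's assumption, not an official PostgreSQL classification.
    private static readonly HashSet<string> TransientSqlStates =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "53000", // insufficient_resources
            "53100", // disk_full
            "53200", // out_of_memory
            "53300", // too_many_connections
            "53400", // configuration_limit_exceeded
            "57P03", // cannot_connect_now
            "58000", // system_error
            "58030", // io_error
            "55P03", // lock_not_available
            "55006", // object_in_use
            "55000", // object_not_in_prerequisite_state
            "08000", // connection_exception
            "08003", // connection_does_not_exist
            "08006", // connection_failure
            "08001", // sqlclient_unable_to_establish_sqlconnection
            "08004", // sqlserver_rejected_establishment_of_sqlconnection
            "08007"  // transaction_resolution_unknown
        };

    // Returns true when the given SQLSTATE is in the assumed-transient set.
    // A null SQLSTATE is treated as non-transient.
    public static bool IsTransientSqlState(string sqlState)
    {
        return sqlState != null && TransientSqlStates.Contains(sqlState);
    }
}
```

A caller holding a PostgresException could pass its SqlState value to IsTransientSqlState. An IsTransient property on the exception itself would presumably reduce to a similar lookup.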

Using Polly, here is an example of a RetryPolicy I defined loosely following the best practices laid out by the SQLTransientErrorDetectionStrategy in Microsoft’s Transient Fault Handling Application Block: https://social.technet.microsoft.com/wiki/contents/articles/18665.cloud-service-fundamentals-data-access-layer-transient-fault-handling.aspx

I don’t know why the first portion of my code sample won’t format. I apologize for that.

public static Policy PostgresTransientFaultPolicy
{
	get
	{
		return postgresTransientPolicy ?? (postgresTransientPolicy = Policy.Handle<Exception>( PostgresDatabaseTransientErrorDetectionStrategy())
				   .WaitAndRetryAsync(
						retryCount: 10, 
						sleepDurationProvider: retryAttempt => ExponentialBackoff(retryAttempt, 1.4), 
						onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
				.WrapAsync(
					   Policy.Handle<Exception>( PostgresDatabaseTransientErrorDetectionStrategy())
						   .AdvancedCircuitBreakerAsync(
							   failureThreshold:.4, 
							   samplingDuration: TimeSpan.FromSeconds(30), 
							   minimumThroughput: 20, 
							   durationOfBreak: TimeSpan.FromSeconds(30), 
							   onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "), 
							   onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "), 
							   onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
						   )));
	}
}

private static TimeSpan ExponentialBackoff(int retryAttempt, double exponent)
{
	return TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
}

private static Func<Exception, bool> PostgresDatabaseTransientErrorDetectionStrategy()
{
	return (ex) =>
	{                
		//if it is not a postgres exception we must assume it will be transient
		if (ex.GetType() != typeof(PostgresException))
			return true;

		var pgex = ex as PostgresException;
		switch (pgex.SqlState)
		{ //Assumed Transient Errors
			case "53000":   //insufficient_resources
			case "53100":   //disk_full
			case "53200":   //out_of_memory
			case "53300":   //too_many_connections
			case "53400":   //configuration_limit_exceeded
			case "57P03":   //cannot_connect_now
			case "58000":   //system_error
			case "58030":   //io_error

			//These next few I am not sure whether they should be treated as transient or not, but I am guessing so

			case "55P03":   //lock_not_available
			case "55006":   //object_in_use
			case "55000":   //object_not_in_prerequisite_state
			case "08000":   //connection_exception
			case "08003":   //connection_does_not_exist
			case "08006":   //connection_failure
			case "08001":   //sqlclient_unable_to_establish_sqlconnection
			case "08004":   //sqlserver_rejected_establishment_of_sqlconnection
			case "08007":	//transaction_resolution_unknown
				return true;
		}

		return false;
	};
}
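
As a side note on the delay schedule: ExponentialBackoff above raises the attempt number to the power 1.4, so the delay grows as retryAttempt^1.4 (1 second on the first retry, roughly 25 seconds on the tenth). A geometric schedule would instead raise a fixed factor to the power of the attempt number. The sketch below puts the two side by side. The names BackoffSchedule, PowerBackoff, GeometricBackoff, and PrintSchedule are hypothetical and used only for illustration.

```csharp
using System;

// Hypothetical helper for comparing two backoff formulas.
public static class BackoffSchedule
{
    // Same formula as ExponentialBackoff in the post: attempt^exponent seconds.
    public static TimeSpan PowerBackoff(int retryAttempt, double exponent)
    {
        return TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
    }

    // Alternative shown for comparison: factor^attempt seconds (geometric growth).
    public static TimeSpan GeometricBackoff(int retryAttempt, double factor)
    {
        return TimeSpan.FromSeconds(Math.Pow(factor, retryAttempt));
    }

    // Prints both delays, in seconds, for attempts 1 through maxAttempts.
    public static void PrintSchedule(int maxAttempts, double value)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            Console.WriteLine(
                "attempt {0,2}: power {1:F2}s, geometric {2:F2}s",
                attempt,
                PowerBackoff(attempt, value).TotalSeconds,
                GeometricBackoff(attempt, value).TotalSeconds);
        }
    }
}
```

Calling BackoffSchedule.PrintSchedule(10, 1.4) lists both schedules for the ten retries configured in the policy. Which shape is preferable depends on how quickly the database is expected to recover, so treat this as a way to inspect the delays rather than a recommendation.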

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 26 (12 by maintainers)

Most upvoted comments

Here is the link to the discussion: https://www.postgresql.org/message-id/CAD0baJ6wOgqFqkkmnOChe=S+A6=tUF95c2PpGxWC1W7EdCDH2Q@mail.gmail.com

Based on their suggestions and yours, I will open a pull request in the next couple of days with an optimistic “IsTransient” property.