Polly: Honouring retries from 429 response
I am attempting to capture a 429 response, and the advised retry interval from the HTTPResponse, as outlined in the blog post here
I am using .Net Core, with version 5.1 (current latest) of Polly.
I have started trying to get a simple HTTPResponse captured, but ultimately I want to use Polly to handle retries for the .Net Core DocumentDB client.
The example code in the blog post is as follows:
var honoursRetryAfter = Policy<HttpResponseMessage>
.Handle<HttpException>
.OrResult(r => statusCodesToRetry.Contains(r.StatusCode)) // detail elided
.RetryAsync(retryCount: /* your preferred retry count */,
sleepDurationProvider: (retryAttempt, context) =>
context.ContainsKey("RetryAfter") && context["RetryAfter"] != null
? (TimeSpan)context["RetryAfter"]
: defaultSleepProvider(retryAttempt) // detail elided
);
Unfortunately, this particular item seems to be more pseudocode than a usable example - there is no such named parameter as sleepDurationProvider
for RetryAsync. So, I have tried WaitAndRetry as follows:
var honoursRetryAfterAsync = Policy<HttpResponseMessage>
.Handle<Exception>()
.OrResult(r => r.StatusCode.ToString().Contains("429"))
.WaitAndRetryAsync(
retryCount: 3,
sleepDurationProvider: (retryAttempt, context) => context.ContainsKey("RetryAfter") && context["RetryAfter"] != null
? (TimeSpan)context["RetryAfter"]
: TimeSpan.FromSeconds(0.5),
onRetryAsync: async (message, timespan, retryAttempt, context) => await DoSomething(context["httpClient"]));
However, it is becoming confusing, as there are so many overrides with different signatures, and their availability depends on the complete signature of the complete Policy, and some of the examples in the documentation do not compile. e.g:
Policy
.HandleResult<HttpResponseMessage>(r => r.StatusCode == 500) // Error - cannot use equality between r.StatusCode and int
.OrResult<HttpResponseMessage>(r => r.StatusCode == 502) // Error - should not specify <HttpResponseMessage> again
So, I am feeling as If I am just hacking things together in an inappropriate way đ
I would really appreciate it if someone can point me in the right direction to get this feature working, as intended.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 1
- Comments: 32 (17 by maintainers)
Hi,
Thanks @reisenberger.
For next one who are looking for a sample code:
Let me know if you see something to improve đ
Thanks @reisenberger for clarity! I had missed that point. Just wanna also say thanks for the quality of your responses throughout this thread, helped me learn better how Polly works.
What I learned from studying this thread, and the call out to Microsoft Support, was that 30 secs of retry is fine for our use case. Any more than that and we need to just kick down and pay for more RUâs đ
@mcquiggd Polly v5.6.0 (uploading to nuget in next day or so) will include the ability for
WaitAndRetry
policies to define asleepDurationProvider
which can take the handled fault (exception; or returned result) as an input parameter to thesleepDurationProvider
delegate.This should make it even easier to use
WaitAndRetry
policies to wait for a âretry afterâ duration specified by the called system; eg as CosmosDB does with HTTP response code 429 and response headers@mcquiggd
B1/B2 Handled results: When a policy is a strongly-typed generic
Policy<TResult>
, executions through the policy can only return results of typeTResult
. Given that, we should be able to passcontext.LastHandledResult
asTResult
.It would mean creating a new generic-typed
Context<TResult> : Context
. We would need to check for breaking changes, but givenContext<TResult> : Context
, my initial (shallow) assessment is that breaking changes would be avoided.(thoughts on other categories coming separately)
@mcquiggd More detailed proposal for handling
InnerException
s ofAggregateException
, now under #256 : comment welcome!@reisenberger
Yep, got it - thats why I wanted to look at the Transient Fault Handler code and Document Client code to see under what circumstances (and why) there are Aggregate exceptions. I put the background for others reading the thread đ
It would be a perfectly reasonable theory that the Async methods on DocumentClient throw AggregateExceptions, and the non-async methods throw DocumentClientExceptions. But as I say, now its time to code / examine code and see how things work đ
@mcquiggd
B1/B2 Handling multiple exception types: Yes, a single policy can handle multiple unrelated exception types (say,
TException
andVException
). And this power has some relatively common and useful Use Cases; eg usingHttpClient
w/ async, itâs common to want to handle bothHttpRequestException
andTaskCanceledException
(for timeouts).That power does make providing a strongly-typed
TException
to egonRetry
not a straight fit.Thinking around ⊠The only option (which would be a major API change) would be moving the
onRetry
-style delegates of Polly into the.Handle<>(...)
clause. But it carries other disadvantages. It weakens the meaning of the delegates (would dissociate them from their current, nicely-statedonBreak
,onRetry
,onFallback
role). And it would introduce code duplication for the relatively common Use Case, that you simply want to log something around the exception - that it eg caused the retry or circuit breakage. Not altogether satisfactory in different waysâŠAt this stage, Iâm left thinking that the need (sometimes) to disambiguate which exception you have received is a necessary corollary of the power to handle multiple exception types, encapsulated nicely in a single Policy. Open to thoughts, though, from anyone, if weâre missing some clever alternative?
(This discussion excludes
AggregateException
- comment following)EDIT: Another option that does exist, with existing Polly, is to specify separate policies for each exception type. You can use the same policy type more than once in a wrap. Where the
onRetry
handling was known to want to differ for different exception types, this provides a mechanism. For example, this pattern is commonly used to set up one retry policy for re-authentication, and another for transient faults.Very nicely, and I was going to suggest that - the existing number of overloads was a little intimidating at first, and of course changing the execution behaviour and breaking backwards compatibility is a non-starter.
@mcquiggd
A5. So, we could do this (add
sleepDurationProvider
overloads taking exception, handled result), but Iâm favouring adding the Exception / handled result toContext
instead to avoid proliferating further overloads (and opens up potential for other Use Cases profiting from the same info).So, youâd have access to eg,
context.LastHandledException
andcontext.LastHandledResult
- would this do the trick?( Great to have this conversation and see the DocumentDb use case in detail: thanks for sharing! )
@mcquiggd Great to see this all coming together!
đ using the
Context
to pass things (operation name; Id) into the call - exactly what intended for.A1. All Polly policies are fully thread-safe, so youâre absolutely fine re-using an instance across multiple calls (if relevant to any decisions around this).
A2. Whether you want to
continueOnCapturedContext
or not, be driven simply by whether better for your Use Case / usage with DocumentDB (usual considerations around how it influences async context marshalling). Polly doesnât care either way.continueOnCapturedContext == true
usually only necessary if you specially need execution after theExecuteAsync(...)
to continue on the same context as before (some actor frameworks; or need preserve UI execution context).A3/A4. Your analysis is exactly correct. Sleep duration is calculated before
onRetry/Async
is invoked - even though only used after - because itâs an input parameter toonRetry/Async
. Iâve updated the Retry wiki to make this more accurate.Foresaw your A3 problem as soon as you posted this, btw, but ⊠busy day this end ⊠you beat me to the follow-up đ and figured it out, so great đ . This exact issue is why my example the day before was more circuitous!
( A5. comments in mo )
@reisenberger
Made some changes this morning, including storing the id of the Document to be inserted via Context when ExecuteAsync is called, and that functions as expected, with data subsequently being available within the Policy.
A1. I changed the Policy declaration to be simply private - i.e. not static, not read only - I observed no change in behaviour.
A2. I have also set the parameter continueOnCapturedContext on ExecuteAsync to true - this does not appear to have any impact on results.
A3. I have verified that on retry attempt 1, there is no RetryAfter being returned in the Context, although it is returned from the server.
A4. On investigation, this is due to the sleepDurationProvider being called before the onRetryAsync delegate, which is not what I expected from the Retry lifecycle :
A5. I donât see an override of the sleepDurationProvider that provides access to the Exception / handled object - which would allow simplified logic in this instance.
Any thoughts�
@reisenberger
Well, I have made a little progress.
Firstly, I have taken the DocumentDB Benchmark project (available from the DocumentDB GitHub repo, under samples), copied it and converted it to .Net Core, so we have both .Net 4.x and .Net Core 1.x versions in the same solution.
I am using this as a testbed for my âexperimentsâ, with the latest DocumentDB emulator set to use a single, non-partitioned collection of 400 Request Unit capacity, the âslowestâ.
I deliberately set the values for threads, and amount of documents to insert, to cause throttling.
With the DocumentDB Client retry, throttling was handled automatically.
Next, I disabled the DocumentDB Client retries; of course exceptions occurred. Bear in mind that these settings are âglobalâ if you follow the sound advice of using a single Client instance, which does a lot of performance enhancement, but is also a little inflexibleâŠ
Then, I created a Polly Retry Policy, as follows:
So, basically using Context to pass the retry value from the onRetryAsync to the sleepDurationProvider, and using purely Polly to handle waits etc. And now we can create different policies / settings for different âoperationsâ, using the same Client instance, potentially passing the âoperation nameâ in Context from the call to Execute the Policy⊠if my understanding of the Retry life-cycle is correct. I have noticed that retry attempt 1 does not seem to include a server retry-after value, but had limited time today to verify.
This Policy is then called in the multi-threaded benchmarking app as below:
The result - Polly handles all the retries, observing the returned retry interval from the server:
Iâll continue to work on this (it can definitely be tidied up, I have a lot of distractions around at the moment), and upload the Solution to Github in case others wish to contribute.
Perhaps you can spot any obvious mistakes - e.g. is it valid to create the Policy as static readonly in a multithreaded environment?
I want to create some more advanced examples for the different scenarios available with Polly, along the lines of your last post⊠will pop onto Slack at some pointâŠ
Thanks for the thoughtful commentary @mcquiggd ^^ (two up). Agree on all counts!
Re your point 3 - wrapping a bunch of policies around a call as a small abstraction - check out Pollyâs
PolicyWrap
(if not already seen it), designed for exactly this purpose. We also provide guidelines on ordering the policies within a wrap, althoughPolicyWrap
is intentionally fully flexible, so any ordering can be used.+1 to your idea of:
We would love to hear more about this later, if grows and you would like to share. We would love to eg blog/guest-blog about it, or share samples perhaps (duly credited) via a Polly.Contrib. (Contact me on Polly slack later if u want to discuss.)
One issue with the in-built retries for the various Azure services is that they are all different APIs - good in the sense each is probably a good fit/strategy for the particular service - but a slightly fragmented API experience. Another issue with combining Polly policies with Azureâs in-built retries, is that with Polly-wraps-Azure-retry you can effectively only have the retry innermost (out of the set of policies). However, you might want to have certain policy types inside a retry, in some
PolicyWrap
s.Re Topaz and the Enterprise Application Blocks, yes, feels like it is well on the way out. The Microsoft patterns and practices team are now moving away from recommending Topaz, in favour of Polly (Iâve provided input on the Polly samples).
24 hours is a long time in technology - now we have Azure Cosmos DB⊠đ„
From the documentation:
Having looked at the DocumentClient source code, I am currently testing an approach that disables the built-in retries, and uses Polly instead. Will report back if I make progress.
@mcquiggd Thank you also for pointing out the slips in the readme doco. Now fixed in #248 .
@mcquiggd. Yes, that was pseudo-code-ish (have re-labelled). Thanks for identifying the issues! Letâs see if we can sort it out. Iâd also been meaning to ask @jmansar (who originally requested the feature, #177) or @tcsatheesh (who was also interested in using it with DocumentDb) if they wanted to expand on my blogged (pseudo-) code: both, feel free to chip in!
More coming shortly.