azure-sdk-for-js: How to retry when response code is `409`
Could you please let me know how to retry when an HTTP response code is 409 Conflict
with @azure/core-pipeline-rest
package?
As far as reading the code, the package seems to use 408
, 500
, 502-504
, 506
or more as response codes that are need to retry.
https://github.com/Azure/azure-sdk-for-js/blob/82996230773ab8295f06b17a8b6f449f9d9f2a8c/sdk/core/core-rest-pipeline/src/retryStrategies/exponentialRetryStrategy.ts#L81-L94
Or please add 409
to the retry target status code.
Background:
We are trying to develop a GitHub actions aca-review-apps that operates revision status of Azure Container Apps and uses @azure/core-pipeline-rest
via @azure/arm-appcontainers
.
If the operation by the action and the one from Azure portal are conflicted, then Azure platform returns HTTP 409 error as the action result. In this case we think it’s needed to retry the operation.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 23 (22 by maintainers)
Sorry for not being clearer. My advice is to forget about the pipeline and handle retrying the operation on the client yourself, such as:
Yes we support throttling using
Retry-After
with no additional effort from the consumer.@horihiro Back to the issue at hand, is it true that your 409 will resolve itself in a relatively short timeframe? By default, we attempt to retry 3 times and give up after no more than around 10 seconds. While you can configure this to allow for greater delays, this configuration applies to all retry reasons, not just a single HTTP status code.
I think my suggestion here would be to not rely on pipeline magic for automatic retries and simply try/catch the client operation. This way you can detect the 409 and decide to execute the operation again as many times as you want with whatever pauses or delays you feel are necessary.
Changing the sdk to support retry options with status codes should be done.
Fixing the rest API in a non breaking way is likely possible.
A dedicated client that uses the changed sdk is a fine idea.
Changing the pipeline after client construction is strongly discouraged.
– Jeffrey Richter
From: Hirofumi Horikawa @.> Sent: Thursday, September 22, 2022 4:19:38 PM To: Azure/azure-sdk-for-js @.> Cc: Jeffrey Richter @.>; Mention @.> Subject: Re: [Azure/azure-sdk-for-js] How to retry when response code is
409
(Issue #23298)Looks like the above code works on my side, removePolicy also can remove the specified policy from client.pipeline, then client opration is executed under the default pipeline state after the code.
In this issue, there are the following possible ways, but all of them are not easy or recommended, right?
— Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAzure%2Fazure-sdk-for-js%2Fissues%2F23298%23issuecomment-1255651337&data=05|01|jeffreyr%40microsoft.com|91b7038b35584cafab7c08da9cf0ebd4|72f988bf86f141af91ab2d7cd011db47|1|0|637994855807371791|Unknown|TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D|3000|||&sdata=6HKhw7f8FlnVbA09bTT6mK6Qa%2B6mh%2FwPuVoWluWSCpE%3D&reserved=0, or unsubscribehttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAARLJP7FP42RLSPVPVTEATDV7TSQVANCNFSM6AAAAAAQSRN4RA&data=05|01|jeffreyr%40microsoft.com|91b7038b35584cafab7c08da9cf0ebd4|72f988bf86f141af91ab2d7cd011db47|1|0|637994855807371791|Unknown|TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D|3000|||&sdata=cgs4PpcvUVNPAawEwrRNnCBQkwxF0iKPUQyhMSBG1WY%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.***>
No, once a client is constructed, its options are immutable. This is by-design so that behavior of a client can’t be changed by arbitrary code - the behavior is fixed over the client’s lifetime. Also, in languages that support threads, this prevents race conditions which are near impossible to debug. You could create a new client before calling each operation but this will create a lot of garbage and cause more GCs.
Some language SDKs allow configuring the retry policy via client options. From the code you show above, it looks like JS does not support this. The JS SDK could add this feature and a customer could be instructed to add support for 409 to solve the problem in this issue. For example, our Go SDK lets the customer configure/override the HTTP status codes to retry on; see https://github.com/Azure/azure-sdk-for-go/blob/main/sdk/azcore/policy/policy.go#L106
The JS SDK should add this feature too (if not already there).
If you add retry on 409 to the pipeline (via client options) then this will apply to ALL operations on the client and this is not ideal. It may OK for this service but we definitely wouldn’t do this for all services. What a customer could do is use one client for all operations except this one and use another client for this 1 operation where the retry policy is configured to retry on 409.
First, data-plane and control-plane do not always follow the same guidelines. Second, there are some HTTP standards that should be adhered to and some parts of the standard can be ambiguous.
A 409-Conflict indicates that the current state of the resource and the operation being attempted are in conflict with each other. There is no reason to think that the resource will dynamically change to resolve the conflict. Instead, it is more likely that the client request has to change to resolve the conflict and this is why retrying shouldn’t make sense for a 409 and also why the service should not return a 409 in this case.
So this is what is wrong. Now, we can discuss potential ways to fix this. The simple “solution” is to allow retries for this specific operation should the service return a 409. This also assumes, that the 409 is always being returned for this specific reason and not ever returned for any other reason. I don’t know if this assumption can be made for this specific service operation.
Our Azure data-plane guidelines, were designed taking this race condition into account and so if the service were to implement this feature using Azure data-plane guidance, this problem is averted. Perhaps this operation can change its implementation to follow the data-plane guidance but the control-plane team would have to get involved in this and changing the behavior is a breaking change and Azure’s policy very strongly discourages breaking changes.
The data-plane guidance disallows LRO PATCH operations for many reasons and you’re experiencing one of them now. The data-plane guidance to enable this feature is to use an LRO POST-action pattern. I do think the control-plane should adopt this guidance too.
If the PATCH operation is implemented as a long-running operation (which it sounds like it is), then retrying would potentially take a long time (by definition). For Azure data-plane operations, we forbid implementing PATCH as a long-running operation and instead require it be implemented as a POST-action LRO and if multiple clients issue this simultaneously, then the operations are queued and executed sequentially. So, no 409 would be returned.
I think the container apps should consider changing their pattern to this as the data-plane pattern is better. A 409 should not be considered retryable. Depending on the language, you might be able to get the SDK to retyr on a 409 but you wouldn’t want this behavior for any other operation on the client object.