azure-sdk-for-net: [QUERY] How to handle etag mismatch (412 Precondition Failed) caused by client retry

Library name and version

Azure.Data.Tables 12.8.1

Query/Question

Consider this scenario:

  1. Azure.Data.Tables client updates an entity with an etag (If-Match: <etag>)
  2. The server times out or returns a retriable status code after the data operation is complete on the server side.
  3. Client retries the operation with the same etag, not knowing the server state has changed
  4. Client encounters HTTP 412 Precondition failed

(there are adjacent scenarios like adding an entity for the first time and retrying to get a conflict)

What is the client to do to recover?

There are certainly solutions existing today, for example the client can retry the whole unit of work bounding the table storage operation (perhaps no-oping if the Table state actually went through or using a totally different entity depending on the implementation). However, sometimes the cost to retry the entire operation in the application is high enough that it’s worth taking on more complexity to handle the 412 gracefully (maybe there is bunch of other API calls prior to the Azure Table API call and you don’t want to repeat those in cases like this).

Simply detecting a retry in the Azure SDK HTTP pipeline is also not sufficient. Some retries will succeed (for example the request fails prior to the data operation persisting on the server side) and others will never succeed (for example another thread updates the entity, causing a legitimate Precondition Failed error).

One idea I wanted to try was performing a client-side diff. So if you hit an HTTP 412 (and optional only if and only if the client performed a retry as an optimization) you can fetch the entity from Table storage and compare the entity with what you were expecting to update. This could be done in a partial manner if you were hoping for a PATCH-like operation or for all properties if you were hoping to replace the entire entity state.

A Table entity is essentially a property bag of well-defined types so performing a client-side diff is pretty straight-forward. Unfortunately, the intermediate model (Dictionary<string, object> from ToOdataAnnotatedDictionary) that would be ideal for generic comparison is internal.

I can perhaps call the method with reflection or write my own serialization model used just for this comparison. But I wanted to raise this problem space to the Azure SDK team and see if there any prior art in this area or recommendations.

Here is a sample app that shows how client-driven retries can cause etag mismatches:

Sample C# console app
using System.Net;
using Azure;
using Azure.Core.Pipeline;
using Azure.Data.Tables;

var messageHandler = new TestHandler { InnerHandler = new SocketsHttpHandler() };
var serviceClient = new TableServiceClient(
    "UseDevelopmentStorage=true",
    new TableClientOptions { Transport = new HttpClientTransport(messageHandler) });
var table = serviceClient.GetTableClient("oopsie");

// setup
Console.WriteLine("Table go!");
await table.CreateIfNotExistsAsync();

Console.WriteLine("Entity go!");
var entity = new TableEntity("pk", "rk");
entity["State"] = "Active";
entity["HitCount"] = 1;
var response = await table.UpsertEntityAsync(entity, TableUpdateMode.Replace);
entity.ETag = response.Headers.ETag!.Value;

// repro
Console.WriteLine("Update with If-Match go!");
entity["HitCount"] = 2;
try
{
    await table.UpdateEntityAsync(entity, entity.ETag, TableUpdateMode.Replace);
}
catch (RequestFailedException ex) when (ex.Status == 412)
{
    Console.WriteLine($"Oopsie. {ex.Status} {ex.ErrorCode}");

    // Get entity with client-side diff?
}

class TestHandler : DelegatingHandler
{
    private int _requestCount;

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        if (request.Method == HttpMethod.Put && request.RequestUri!.AbsolutePath.EndsWith("(PartitionKey='pk',RowKey='rk')"))
        {
            if (Interlocked.Increment(ref _requestCount) == 2)
            {
                var statusCode = HttpStatusCode.ServiceUnavailable;
                Console.WriteLine($"Fake error! ({statusCode})");
                return new HttpResponseMessage(statusCode) { RequestMessage = request };
            }
        }

        return response;
    }
}

The output looks like this:

Table go!
Entity go!
Update with If-Match go!
Fake error! (ServiceUnavailable)
Oopsie. 412 UpdateConditionNotSatisfied

Environment

.NET SDK:
 Version:   8.0.100-rc.1.23455.8
 Commit:    e14caf947f

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.22621
 OS Platform: Windows
 RID:         win-x64
 Base Path:   C:\Program Files\dotnet\sdk\8.0.100-rc.1.23455.8\

.NET workloads installed:
There are no installed workloads to display.

Host:
  Version:      8.0.0-rc.1.23419.4
  Architecture: x64
  Commit:       92959931a3
  RID:          win-x64

.NET SDKs installed:
  3.1.426 [C:\Program Files\dotnet\sdk]
  5.0.408 [C:\Program Files\dotnet\sdk]
  6.0.317 [C:\Program Files\dotnet\sdk]
  7.0.203 [C:\Program Files\dotnet\sdk]
  7.0.308 [C:\Program Files\dotnet\sdk]
  7.0.401 [C:\Program Files\dotnet\sdk]
  8.0.100-rc.1.23455.8 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.22 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.0-rc.1.23421.29 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.22 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.0-rc.1.23419.4 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.22 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.5 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.11 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 8.0.0-rc.1.23420.5 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other architectures found:
  x86   [C:\Program Files (x86)\dotnet]
    registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation]

Environment variables:
  Not set

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

About this issue

  • Original URL
  • State: closed
  • Created 9 months ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

This is great. HttpPipeline.CreateHttpMessagePropertiesScope is even better (never knew of this before). I can plumb in everything I need.

Leaving this POC for posterity: https://gist.github.com/joelverhagen/bbf0bdd91cfcdb5784abf135a859a108

The problem gets nasty with $batch so I think I will use a ChangeId = <guid> column which can be used to check on a single property (of a single entity, in the case of $batch). That’s probably what I’ll do in my application. That concept translates better to blobs as well, where you won’t want to do a full byte-for-byte comparison of the blob contents (where it’s more feasible for column-for-column comparison of a Table entity). I can compare a x-ms-meta-changeid: <guid> blob metadata there.

It would still be nice to have ToOdataAnnotatedDictionary as a public extension method for more flexibility in implementation, e.g. it’s very painful to get the property bag from a $batch request body, but I see there are other ways to avoid that blocker.

Thanks for your help. I’ll close this since it appears quite feasible with HttpPipeline.CreateHttpMessagePropertiesScope + a custom RetryPolicy.

I’m not sure why I didn’t think of that! Thanks! And yes, that seems works also. Here’s my attempt: https://gist.github.com/joelverhagen/bbf0bdd91cfcdb5784abf135a859a108/revisions#diff-0b69b473fe937040615d69f606751f61ddbc2e3a1849360ff2456c22afe88c0b

Challenges of the design:

  1. You lose call site context so you have to sniff it from the HttpMessage.Request.
    • You have to extract table name, PK, and RK from the HttpMessage.Request to fetch the remote state. It’s kind of gnarly.
    • You’d need to sniff out whether 412 or 409 is the conflict case (i.e. was it a add entity or an upsert/upsert entity). This is probably based on the HttpMessage.Request If-* headers… complex
  2. You need to override the response to be the expected status code. This is a bit painful since there don’t appear to be any in-memory Response implementations as public types. But it’s a simple class to implement especially for 204 No Content.
  3. You need another nested service client to fetch the remote state. That’s a big painful too.

So the retry occurring inside or outside of the pipeline seems to be a trade-off between which contextual information you want available.

I think this is practical but nasty/complex enough that it should only be done if it’s really important to no-op on these failures in a best effort – which I guess is a decision based solely the dev’s requirements. So of course, I want to implement it 😈

Unfortunately, I don’t see an easy way to get access to the dictionary.

Here is an example of accessing the raw dictionary response:

var response = client.GetEntity<UserEntity>(userEntity.PartitionKey, userEntity.RowKey, null, default);
var dict = response.GetRawResponse().Content.ToObjectFromJson<Dictionary<string, object>>();
foreach (var kvp in dict)
{
	Console.WriteLine($"{kvp.Key}: {kvp.Value}");
/*
prints:
odata.metadata: https://chrissscratch.table.core.windows.net/$metadata#patchicm/@Element
odata.etag: W/"datetime'2023-10-13T15%3A02%3A07.4628778Z'"
PartitionKey: 23ee47a8-5120-49fb-9064-4e12c60fa939
RowKey: 40419cd3-e3c0-4c32-992f-8355632999ab
Timestamp: 2023-10-13T15:02:07.4628778Z
Value: 476692844
*/
}

Thanks @joelverhagen for the additional context!

Consider a case where the diff is done the domain model and then later a new property is added. The wire format (and Azure SDK) will pick up the change automatically via reflection but the hand-rolled diff in the catch would need to react. This could be a source of bugs. So, I would rather perform a diff on the wire format. Granted, a change in domain model might need to change the diff logic in a domain-specific way.

I think there are bug trade-offs either way. But if your comparison was targeted for the scenario rather than a complete entity comparison, a model comparison seems safe here.

Something else to consider is that the wire format is actually not internal, it’s just a Dictionary<string, object>. It’s the transformation to/from the wire format that is internal. If your intent is just to compare (and possibly edit) discrete properties, the dictionary format should be all that you’d need here.

But all that said, the only place you’d be able to evaluate and manipulate the request and response would be in a pipeline policy. I think a policy-based solution would be much more complex for limited additional benefit. However, if you decide this is the path you think makes the most sense for your scenario, I’d be happy to look at any implementation ideas you come up with. The easiest path would probably be to supply a custom RetryPolicy and put the logic there.