azure-cosmos-dotnet-v3: Cosmos SDK throws 408 CosmosException, but operation succeeded in the database

Description

Using the v4 preview.

When requests take a long time to return a response, the SDK throws an exception with code 408 - Request Timeout. However, if you then look in the database, the operation actually succeeded.

In our case, we insert a new item in the database. Sometimes we get the 408 exception, but if we retry we get a 409 - Conflict response from the database (saying an entity with this id already exists). This is because the operation actually succeeded on the database; the SDK just threw a timeout.

If you set a breakpoint where such an exception is thrown and, when it is hit, look in the database, you will see that the document exists (and thus was created by the request that threw the 408 exception).

To Reproduce

We have the following code:

var policyResult = await _retryOnHeavyLoadPolicy.ExecuteAsync(async () =>
{
    await store.CreateItemAsync(entity, new PartitionKey(entity.Discriminator), null, cancellationToken);

    return Task.CompletedTask;
}).ConfigureAwait(false);

Where the heavy load retry policy is a Polly.Net policy defined as:

Policy
    .Handle<Exception>(x =>
        x is CosmosException
        && (((CosmosException)x).Status == (int)HttpStatusCode.TooManyRequests
            || ((CosmosException)x).Status == (int)HttpStatusCode.RequestTimeout))
    .WaitAndRetryForeverAsync(retryAttempt => TimeSpan.FromSeconds(Math.Pow(retryAttempt, 0.5)));

This policy will then retry the create statement if we get a 429 - TooManyRequests or a 408 - RequestTimeout exception.

To reproduce this, you will need a fairly heavy load and a bad or slow internet connection.

Environment summary

SDK Version: v4.0.0-preview3
OS Version: Windows 10

Question

Due to our retry policy, when creating an item we sometimes get a 408 -> retry -> 409. How should we handle 408 exceptions in cases where the item really was created in the database (or updated, deleted, or any other action)? Retrying is not an option, because it causes a 409.
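One possible way to handle this, sketched here under stated assumptions: a 408 on a write leaves the outcome unknown, so a 409 on the retry of a create can be read as confirmation that the timed-out attempt was persisted. This is only safe when the id is unique to this caller (for example a client-generated GUID). Otherwise the conflict may come from another writer, and reading the item back to compare it is the safer check. `StoreException`, `IdempotentCreate` and `CreateOnceAsync` are hypothetical names introduced for illustration; `StoreException` stands in for `CosmosException` and its `Status` property so that the sketch compiles without the SDK.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical stand-in for the SDK exception (assumption: an int Status,
// as used by the retry policy in this report).
public sealed class StoreException : Exception
{
    public StoreException(HttpStatusCode status)
        : base($"Store returned {(int)status}") => Status = (int)status;

    public int Status { get; }
}

public static class IdempotentCreate
{
    // Runs `create`, retrying on 408 and 429. A 409 is treated as success
    // only if an earlier attempt timed out, because that attempt may have
    // been persisted even though the client never saw the response.
    public static async Task CreateOnceAsync(
        Func<Task> create, Func<int, TimeSpan> backoff, int maxAttempts = 5)
    {
        bool outcomeUnknown = false;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await create().ConfigureAwait(false);
                return;
            }
            catch (StoreException ex)
                when (ex.Status == (int)HttpStatusCode.Conflict && outcomeUnknown)
            {
                // Assumption: the id is unique to this caller, so the
                // conflict can only come from our own timed-out attempt.
                return;
            }
            catch (StoreException ex)
                when (ex.Status == (int)HttpStatusCode.RequestTimeout && attempt < maxAttempts)
            {
                outcomeUnknown = true; // the write may or may not have landed
            }
            catch (StoreException ex)
                when (ex.Status == 429 && attempt < maxAttempts)
            {
                // Throttled: the request was rejected, so a retry is safe.
            }

            await Task.Delay(backoff(attempt)).ConfigureAwait(false);
        }
    }
}
```

With the code from this report, the delegate would be roughly `() => store.CreateItemAsync(entity, new PartitionKey(entity.Discriminator), null, cancellationToken)`, the backoff `attempt => TimeSpan.FromSeconds(Math.Pow(attempt, 0.5))`, and the filters would be written against `CosmosException`. A 409 on the very first attempt still propagates, since nothing suggests this caller created the item.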

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (9 by maintainers)

Most upvoted comments

The main item is created and never modified across the system. The partitionKey is set only once. Note that this behavior does not always appear; it happens about 70% of the time. Even though everything happens in parallel, I verified that each path of this particular issue is thread safe. I didn’t try printing, as the output would be a mess since this specific ExtensionMethod is called so many times. So if you have ideas, I’m all ears.

Try assigning local variables instead of using item.Id and item.PartitionKey directly, and do the same for the end result comparison. Your code is already doing a `Console.Write`, which is why I thought you could add those values to the same write.

Basically:

string partitionKey = item.PartitionKey;
string id = item.Id;
using (ResponseMessage responseMessage = await container.ReadItemStreamAsync(id, new PartitionKey(partitionKey)))
{
    // Deserialize and compare the response's id and PK
}

Most probably it is on the service side. You can reach out to the correct team; I’m unable to.

If this is the case, a support ticket needs to be raised.