azure-cosmos-dotnet-v3: Conditional read in transactional batch is broken

Describe the bug Transactional batch is committed even though it contains a conditional read operation that is not fulfilled.

To Reproduce I’ve created the following MS tests to re-produce the bug

[TestMethod]
public async Task Transaction_should_be_rejected_when_if_none_match_read_condition_is_not_fulfilled()
{
    // Setup random partition key
    var pk = Guid.NewGuid().ToString();

    // First create an item 'A' so that it exists before batch execution
    await Test.Container.CreateItemAsync(new MyItem { Id = "A", Pk = pk });

    // Now run a transactional batch that:
    // 1) Makes a conditional read on item 'A' to ensure that it doesn't exist (should fail)
    // 2) Creates item 'B'
    var batch = Test.Container.CreateTransactionalBatch(new PartitionKey(pk));
    batch.ReadItem("A", new TransactionalBatchItemRequestOptions { IfNoneMatchEtag = "*" });
    batch.CreateItem(new MyItem { Pk = pk, Id = "B" });
    await batch.ExecuteAsync();

    // Item 'B' should not exist now since the batch should have failed (pre-condition 
    // was not fulfilled)
    var response = await Test.Container.ReadItemStreamAsync("B", new PartitionKey(pk));
    Assert.AreEqual(HttpStatusCode.NotFound, response.StatusCode);
}

[TestMethod]
public async Task Transaction_should_be_rejected_when_if_match_read_condition_is_not_fulfilled()
{
    // Setup random partition key
    var pk = Guid.NewGuid().ToString();

    // First create an item 'A' and capture its etag
    var oldEtag = (await Test.Container.CreateItemAsync(new MyItem { Id = "A", Pk = pk })).ETag;

    // Replace the item so that it gets a new etag
    var newEtag = (await Test.Container.ReplaceItemAsync(new MyItem { Id = "A", Pk = pk }, "A")).ETag;

    // Make sure the new etag is different
    Assert.AreNotEqual(oldEtag, newEtag);

    // Now run a transactional batch that:
    // 1) Makes a conditional read on item 'A' to ensure that it has the old etag (should fail)
    // 2) Creates item 'B'
    var batch = Test.Container.CreateTransactionalBatch(new PartitionKey(pk));
    batch.ReadItem("A", new TransactionalBatchItemRequestOptions { IfMatchEtag = oldEtag });
    batch.CreateItem(new MyItem { Pk = pk, Id = "B" });
    await batch.ExecuteAsync();

    // Item 'B' should not exist now since the batch should have failed (pre-condition was not fulfilled)
    var response = await Test.Container.ReadItemStreamAsync("B", new PartitionKey(pk));
    Assert.AreEqual(HttpStatusCode.NotFound, response.StatusCode);
}

Expected behavior The conditional read operation should have failed and the transactional batch should therefore not be committed.

Actual behavior The conditional read operation is successful (status code 200) and the transactional batch is committed.

Environment summary SDK Version: 3.9.1 OS Version: Windows 10 Enterprise (64 bit)

Additional context The full reproduction code:

[TestClass]
public sealed class Test
{
    [TestMethod]
    public async Task Transaction_should_be_rejected_when_if_none_match_read_condition_is_not_fulfilled()
    {
        // Setup random partition key
        var pk = Guid.NewGuid().ToString();

        // First create an item 'A' so that it exists before batch execution
        await Test.Container.CreateItemAsync(new MyItem { Id = "A", Pk = pk });

        // Now run a transactional batch that:
        // 1) Makes a conditional read on item 'A' to ensure that it doesn't exist (should fail)
        // 2) Creates item 'B'
        var batch = Test.Container.CreateTransactionalBatch(new PartitionKey(pk));
        batch.ReadItem("A", new TransactionalBatchItemRequestOptions { IfNoneMatchEtag = "*" });
        batch.CreateItem(new MyItem { Pk = pk, Id = "B" });
        await batch.ExecuteAsync();

        // Item 'B' should not exist now since the batch should have failed (pre-condition 
        // was not fulfilled)
        var response = await Test.Container.ReadItemStreamAsync("B", new PartitionKey(pk));
        Assert.AreEqual(HttpStatusCode.NotFound, response.StatusCode);
    }

    [TestMethod]
    public async Task Transaction_should_be_rejected_when_if_match_read_condition_is_not_fulfilled()
    {
        // Setup random partition key
        var pk = Guid.NewGuid().ToString();

        // First create an item 'A' and capture its etag
        var oldEtag = (await Test.Container.CreateItemAsync(new MyItem { Id = "A", Pk = pk })).ETag;

        // Replace the item so that it gets a new etag
        var newEtag = (await Test.Container.ReplaceItemAsync(new MyItem { Id = "A", Pk = pk }, "A")).ETag;

        // Make sure the new etag is different
        Assert.AreNotEqual(oldEtag, newEtag);

        // Now run a transactional batch that:
        // 1) Makes a conditional read on item 'A' to ensure that it has the old etag (should fail)
        // 2) Creates item 'B'
        var batch = Test.Container.CreateTransactionalBatch(new PartitionKey(pk));
        batch.ReadItem("A", new TransactionalBatchItemRequestOptions { IfMatchEtag = oldEtag });
        batch.CreateItem(new MyItem { Pk = pk, Id = "B" });
        await batch.ExecuteAsync();

        // Item 'B' should not exist now since the batch should have failed (pre-condition was not fulfilled)
        var response = await Test.Container.ReadItemStreamAsync("B", new PartitionKey(pk));
        Assert.AreEqual(HttpStatusCode.NotFound, response.StatusCode);
    }

    public sealed class MyItem
    {
        [JsonProperty("id")]
        public string Id { get; set; }

        [JsonProperty("pk")]
        public string Pk { get; set; }
    }

    private static string EmulatorEndpoint { get; set; }
        = Environment.GetEnvironmentVariable("CosmosDbEmulator.Endpoint") ?? "https://localhost:8081/";

    private static string EmulatorAccountKey { get; }
        = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";

    private static CosmosClient Client { get; set; }

    private static Database Database { get; set; }

    private static Container Container { get; set; }

    [ClassInitialize]
    public static async Task ClassInitializeAsync(TestContext _)
    {
        Test.Client = new CosmosClient(Test.EmulatorEndpoint, Test.EmulatorAccountKey);
        Test.Database = (await Test.Client.CreateDatabaseIfNotExistsAsync("_test_")).Database;
        Test.Container = (await Test.Database.CreateContainerAsync(Guid.NewGuid().ToString(), "/pk")).Container;
    }

    [ClassCleanup]
    public static async Task ClassCleanupAsync()
    {
        await Test.Container.DeleteContainerAsync();
        Test.Client.Dispose();
    }
}

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 3
  • Comments: 17 (5 by maintainers)

Most upvoted comments

Yeah, it’s part of a transaction that also creates another item used for tracking changes. This is a simplified example:

var item = await _container.ReadItemAsync(...);

item.SomeDictionary.Add("key", "value");

// Other stuff happens that opens some 300ms window of time

var batch = _container.CreateTransactionalBatch(pk)
    .UpsertItem(item, new TransactionalBatchItemRequestOptions { IfMatchEtag = item.ETag })
    .CreateItem(changeTrackerItem);

var response = await batch.ExecuteAsync();

Theoretically, it shouldn’t be possible for changes to become overwritten and lost if my understanding of the ETag is correct. If multiple threads crisscross, every one except the thread to first complete the batch will fail with a PreconditionFailed exception. We retry those so eventually we should have all the applied changes. But for some reason changes do become overwritten.

It’s a bug. The fix will take several weeks to ship out globally.