nats-server: KV: Update with the latest revision sometimes succeeds when it shouldn't

Defect

I am trying to create a NATS KV storage backend for CertMagic and Caddy. The storage interface includes a distributed lock, which I want to implement safely and correctly using sequence numbers.

After a successful call to n.Client.Update, the Lock method saves the returned sequence (revision) in a map. Unlock reads the revision from this map and deletes the key, conditioned on that revision still being the latest one.

If I try to create many locks at the same time, I sometimes (not always!) get errors like these:

=== RUN   TestNats_MultipleLocks
    nats_test.go:297: Unlock() nats-58 error = nats: API error 10071: wrong last sequence: 41: 38
    nats_test.go:297: Unlock() nats-26 error = nats: API error 10071: wrong last sequence: 41: 37
    nats_test.go:297: Unlock() nats-30 error = nats: API error 10071: wrong last sequence: 41: 39
    nats_test.go:297: Unlock() nats-19 error = nats: API error 10071: wrong last sequence: 46: 44
    nats_test.go:297: Unlock() nats-72 error = nats: API error 10071: wrong last sequence: 60: 59
    nats_test.go:297: Unlock() nats-79 error = nats: API error 10071: wrong last sequence: 70: 68

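These errors suggest that another client also successfully wrote a value to the lock key while this client held the lock. In each line, the first number comes from the server's error message and the second is presumably the revision this client saved after its own successful Update (printed via n.getRev). It could be that I am missing something obvious in my code, but I think this shouldn't be possible.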

The test code:

func TestNats_MultipleLocks(t *testing.T) {
	lockKey := path.Join("acme", "example.com", "sites", "example.com")
	var wg sync.WaitGroup
	// Start 100 clients that all contend for the same lock key.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			n := getNatsClient("basic")
			n.ConnectionName = fmt.Sprintf("nats-%d", i)

			err := n.Lock(context.Background(), lockKey)
			if err != nil {
				t.Errorf("Lock() %s error = %v: %d", n.ConnectionName, err, n.getRev("LOCK."+lockKey))
				return // don't try to release a lock that was never acquired
			}

			err = n.Unlock(context.Background(), lockKey)
			if err != nil {
				t.Errorf("Unlock() %s error = %v: %d", n.ConnectionName, err, n.getRev("LOCK."+lockKey))
			}
		}(i)
	}
	wg.Wait()
}

Versions of nats-server and affected client libraries used:

Server: v2.8.4 and latest from main
Client: v1.16.0 and latest from main

OS/Container environment:

Linux x86

Steps or code to reproduce the issue:

The full code can be found here:

https://github.com/HeavyHorst/certmagic-nats

The interesting methods are Lock and Unlock.

I use an embedded NATS server for testing, so the issue can be reproduced by simply running go test -v ./... -count=1 (-count=1 disables the test cache). Run it several times: 3 to 5 runs always trigger the issue for me.

Expected result:

No sequence-related errors.

Actual result:

nats_test.go:297: Unlock() nats-19 error = nats: API error 10071: wrong last sequence: 46: 44 …

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (13 by maintainers)

Most upvoted comments

Rock solid now!