nats-server: KV: update with latest revision sometimes succeeds if it shouldn't
Defect
I am trying to create a NATS KV storage backend for certmagic and caddy. The interface contains a distributed lock, which I want to implement safely and correctly using sequence numbers.
The Lock method saves the last sequence/revision to a map after a successful call to n.Client.Update.
Unlock reads the revision from this map and deletes the value with that revision.
If I try to create many locks at the same time, I sometimes (not always!) get errors like these:
=== RUN TestNats_MultipleLocks
nats_test.go:297: Unlock() nats-58 error = nats: API error 10071: wrong last sequence: 41: 38
nats_test.go:297: Unlock() nats-26 error = nats: API error 10071: wrong last sequence: 41: 37
nats_test.go:297: Unlock() nats-30 error = nats: API error 10071: wrong last sequence: 41: 39
nats_test.go:297: Unlock() nats-19 error = nats: API error 10071: wrong last sequence: 46: 44
nats_test.go:297: Unlock() nats-72 error = nats: API error 10071: wrong last sequence: 60: 59
nats_test.go:297: Unlock() nats-79 error = nats: API error 10071: wrong last sequence: 70: 68
These errors suggest that another lock also successfully wrote a value to the KV store. I could be missing something obvious in my code, but I think this shouldn't be possible.
The test code:
func TestNats_MultipleLocks(t *testing.T) {
	lockKey := path.Join("acme", "example.com", "sites", "example.com")
	wg := sync.WaitGroup{}
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			n := getNatsClient("basic")
			n.ConnectionName = fmt.Sprintf("nats-%d", i)
			err := n.Lock(context.Background(), lockKey)
			if err != nil {
				t.Errorf("Lock() %s error = %v: %d", n.ConnectionName, err, n.getRev("LOCK."+lockKey))
			}
			err = n.Unlock(context.Background(), lockKey)
			if err != nil {
				t.Errorf("Unlock() %s error = %v: %d", n.ConnectionName, err, n.getRev("LOCK."+lockKey))
			}
		}(i)
	}
	wg.Wait()
}
- [x] Included a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve)
Versions of nats-server and affected client libraries used:
Server: v2.8.4 and latest from main
Client: v1.16.0 and latest from main
OS/Container environment:
Linux x86
Steps or code to reproduce the issue:
The full code can be found here:
https://github.com/HeavyHorst/certmagic-nats
The interesting methods are Lock and Unlock.
I use an embedded nats server for testing so that it is possible to just run
go test -v ./... -count=1
(count=1 to disable the test cache) to reproduce the issue.
Run go test several times; 3-5 runs always trigger the issue for me.
Expected result:
No sequence related errors
Actual result:
nats_test.go:297: Unlock() nats-19 error = nats: API error 10071: wrong last sequence: 46: 44 …
About this issue
- State: closed
- Created 2 years ago
- Comments: 23 (13 by maintainers)
Rock solid now!