nats-server: Duplicate messages received while consuming over a leafnode connection
Defect
Messages are delivered twice to the same client.
The NATS client receives each message twice. Both copies are delivered at almost the same time (<10 ms apart) and are identical (same headers, sequence ID, body, and delivery count).
- We experience this issue on all our environments
- We are able to reproduce the issue in a local kind cluster
- Unfortunately, we have not been able to come up with a deterministic procedure to reproduce the issue
- We need to restart servers multiple times, in a sequence we could not pin down, before the problem appears
Versions of nats-server and affected client libraries used:
nats-server v2.8.4
nats.go v1.16.0
OS/Container environment:
Running nats in Kubernetes cluster with official helm chart.
Steps or code to reproduce the issue:
Setup diagram:
Given:
- 2 NATS clusters (A and B), each composed of 2 nats-server instances, with JetStream running on all instances
- Cluster A establishes leafnode connections to cluster B
- A Go service (Application 1, Application 2) using the client library nats.go@1.16.0 is connected to cluster B and consumes messages from a stream that lives in cluster A (thus via a leafnode connection).
When:
- an unexpected sequence of events happens (nats-server restarts, network partitions…)
Then:
- the cluster enters an incoherent state and doesn’t recover (even with all nodes running and no network partition)
- The application instance receives each message 2 times
- The duplicated messages are received in the same millisecond on the same instance
- Restarting the application does not resolve the issue
- The issue does not self resolve with time
- Messages are systematically and constantly received 2 times.
- Restarting the nats-server that sends the duplicates (the one which the client is connected to) is the ONLY way to stop that problem
Logs from a NATS server that sends duplicates (duplicates start at sequence 104): nats-0-trace.log
Expected result:
Each message should be received once from the same consumer or subscription.
Actual result:
The Go client subscribing to the remote consumer receives duplicated messages.
About this issue
- State: closed
- Created 2 years ago
- Comments: 28 (17 by maintainers)
Commits related to this issue
- [FIXED] LeafNode: possible duplicate messages in complex setup This is specific to setup described [here](https://github.com/nats-io/nats-server/issues/3191#issuecomment-1296974382) and does not requ... — committed to nats-io/nats-server by kozlovic 2 years ago
- Extensive test in support of issue #3191. Signed-off-by: Derek Collison <derek@nats.io> — committed to nats-io/nats-server by derekcollison 2 years ago
- Merge pull request #3694 from nats-io/test-3191 Extensive test in support of issue #3191. — committed to nats-io/nats-server by derekcollison 2 years ago
Note that PR #3604 addresses the issue reported by @chenchunping (with the provided configuration files and steps to reproduce). If the original poster and others that posted before that still experience the duplicate issue after upgrading to v2.9.6 (to be released with a fix for the duplicate messages), please try to provide all configuration files and steps to reproduce similar to @chenchunping. Thanks!