etcd: 3.4.18 etcd data inconsistency
What happened?
A 3 etcd cluster returned inconsistent data.
‘bad’ Node
{
"level": "info",
"ts": "2022-06-09T09:21:10.368Z",
"caller": "mvcc/kvstore_compaction.go:56",
"msg": "finished scheduled compaction",
"compact-revision": 165517829,
"took": "13.589883ms"
}
{
"level": "fatal",
"ts": "2022-06-09T09:22:13.472Z",
"caller": "mvcc/kvstore_txn.go:150",
"msg": "range failed to find revision pair",
"revision-main": 165519108,
"revision-sub": 0,
"stacktrace": "
go.etcd.io/etcd/mvcc.(storeTxnRead).rangeKeys (http://go.etcd.io/etcd/mvcc.(storeTxnRead).rangeKeys)
/home/ec2-user/workplace/etcd/src/etcd/mvcc/kvstore_txn.go:150
go.etcd.io/etcd/mvcc.(storeTxnRead).Range (http://go.etcd.io/etcd/mvcc.(storeTxnRead).Range)
/home/ec2-user/workplace/etcd/src/etcd/mvcc/kvstore_txn.go:51
go.etcd.io/etcd/mvcc.(metricsTxnWrite).Range (http://go.etcd.io/etcd/mvcc.(metricsTxnWrite).Range)
/home/ec2-user/workplace/etcd/src/etcd/mvcc/metrics_txn.go:37
go.etcd.io/etcd/etcdserver.(applierV3backend).Range (http://go.etcd.io/etcd/etcdserver.(applierV3backend).Range)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/apply.go:286
go.etcd.io/etcd/etcdserver.(EtcdServer).Range.func3 (http://go.etcd.io/etcd/etcdserver.(EtcdServer).Range.func3)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/v3_server.go:121
go.etcd.io/etcd/etcdserver.(EtcdServer).doSerialize (http://go.etcd.io/etcd/etcdserver.(EtcdServer).doSerialize)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/v3_server.go:600
go.etcd.io/etcd/etcdserver.(EtcdServer).Range (http://go.etcd.io/etcd/etcdserver.(EtcdServer).Range)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/v3_server.go:122
go.etcd.io/etcd/etcdserver/api/v3rpc.(kvServer).Range (http://go.etcd.io/etcd/etcdserver/api/v3rpc.(kvServer).Range)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/api/v3rpc/key.go:52
go.etcd.io/etcd/etcdserver/etcdserverpb._KV_Range_Handler.func1 (http://go.etcd.io/etcd/etcdserver/etcdserverpb._KV_Range_Handler.func1)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/etcdserverpb/rpc.pb.go:3554
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1 (http://github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1)
/home/ec2-user/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:31 (http://github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:31)
github.com/grpc-ecosystem/go-grpc-prometheus.(ServerMetrics).UnaryServerInterceptor.func1 (http://github.com/grpc-ecosystem/go-grpc-prometheus.(ServerMetrics).UnaryServerInterceptor.func1)
/home/ec2-user/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:107 (http://github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:107)
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1 (http://github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1)
/home/ec2-user/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:34 (http://github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:34)
go.etcd.io/etcd/etcdserver/api/v3rpc.newUnaryInterceptor.func1 (http://go.etcd.io/etcd/etcdserver/api/v3rpc.newUnaryInterceptor.func1)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/api/v3rpc/interceptor.go:70
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1 (http://github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1)
/home/ec2-user/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:34 (http://github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:34)
go.etcd.io/etcd/etcdserver/api/v3rpc.newLogUnaryInterceptor.func1 (http://go.etcd.io/etcd/etcdserver/api/v3rpc.newLogUnaryInterceptor.func1)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/api/v3rpc/interceptor.go:77
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1 (http://github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1)
/home/ec2-user/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:39 (http://github.com/grpc-ecosystem/go-grpc-middleware@v1.0.1-0.20190118093823-f849b5445de4/chain.go:39)
go.etcd.io/etcd/etcdserver/etcdserverpb._KV_Range_Handler (http://go.etcd.io/etcd/etcdserver/etcdserverpb._KV_Range_Handler)
/home/ec2-user/workplace/etcd/src/etcd/etcdserver/etcdserverpb/rpc.pb.go:3556
google.golang.org/grpc.(Server).processUnaryRPC (http://google.golang.org/grpc.(Server).processUnaryRPC)
/home/ec2-user/go/pkg/mod/google.golang.org/grpc@v1.26.0/server.go:1024 (http://google.golang.org/grpc@v1.26.0/server.go:1024)
google.golang.org/grpc.(Server).handleStream (http://google.golang.org/grpc.(Server).handleStream)
/home/ec2-user/go/pkg/mod/google.golang.org/grpc@v1.26.0/server.go:1313 (http://google.golang.org/grpc@v1.26.0/server.go:1313)
google.golang.org/grpc.(Server).serveStreams.func1.1 (http://google.golang.org/grpc.(Server).serveStreams.func1.1)
/home/ec2-user/go/pkg/mod/google.golang.org/grpc@v1.26.0/server.go:722 (http://google.golang.org/grpc@v1.26.0/server.go:722)“
}
kube-controller-manager continuously cannot update its lock due to partial data in etcd.
W0630 19:46:09.303166 10 watcher.go:199] watch chan error: etcdserver: mvcc: required revision has been compacted
E0630 19:45:38.273530 10 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"converting (v1.ConfigMap) to (core.Endpoints): Subsets not present in src"}
What did you expect to happen?
The error range failed to find revision pair
should never ever happen because read transaction will either read from write buffer or boltdb.
We reported this case as a reference.
How can we reproduce it (as minimally and precisely as possible)?
Unfortunately, I don’t have any clue to reproduce it.
Anything else we need to know?
- mvcc revision diverged between nodes.

- db size diverged

- compact failed to apply because of the revision divergence.
- consistency checker was not enabled in this old cluster to help oncall operators to retain WAL logs in time.
Etcd version (please run commands below)
$ etcd --version
v3.4.18
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
No response
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 22 (19 by maintainers)
@fuweid
–experimental-backend-bbolt-freelist-type ‘array’, default value in 3.4.18 @fuweid