etcd: after deleting 3 nodes, restarting one of the remaining nodes causes a panic
Start with 3 nodes, add 3 more nodes, then delete 3 nodes. Restarting one of the remaining nodes causes a panic on startup. However, if I take a snapshot first, the restart succeeds.
I know that when a node starts, it recovers from the snapshot and then replays the WAL. The WAL contains 3 ConfChangeRemoveNode entries, so startup applies them again. If I take a snapshot and then restart the node, those remove entries are not replayed, so it starts fine.
Still, I think this is a bug.
Snapshot:
term=7 index=70007 nodes=[32cd075ce865df07 4532f632c49aac79 5fe693db009bbced 761a61507fe72261 799f6226de4c7bed 8b7a6e3d29b86eae] confstate={"voters":[3660590167739457287,4986318435359829113,6910373247063932141,8510221444241105505,8763831318964108269,10050466727402696366],"auto_leave":false}
Start dumping log entries from snapshot.
WAL metadata:
nodeID=761a61507fe72261 clusterID=1109e69692ba9883 term=9 commitIndex=78639 vote=5fe693db009bbced
WAL entries:
lastIndex=78639
term index type data
8 77940 conf method=ConfChangeRemoveNode id=799f6226de4c7bed
8 77941 conf method=ConfChangeRemoveNode id=8b7a6e3d29b86eae
8 77945 conf method=ConfChangeRemoveNode id=4532f632c49aac79
Entry types (ConfigChange) count is : 3
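To make the replay reasoning concrete: the snapshot is at index 70007, and the three removals at 77940, 77941 and 77945 all lie after it and at or below commitIndex 78639, so all three are applied again on restart. The Go sketch below is a hypothetical, simplified model (the entry type and replayRemovals helper are illustrative, not etcd's raftpb or etcdserver code). It also shows that a newer snapshot taken after the removals (a hypothetical index 78000) leaves nothing to replay, which matches the observation that restarting after a snapshot works.

```go
package main

import "fmt"

// entry is a hypothetical, simplified stand-in for a raft WAL entry;
// it is not etcd's raftpb.Entry type.
type entry struct {
	Term, Index uint64
	RemoveID    string // non-empty for a ConfChangeRemoveNode entry
}

// replayRemovals returns the member IDs removed by entries that come after
// the snapshot index and are covered by the commit index, i.e. the removals
// a restarting member would apply again on top of the snapshot state.
func replayRemovals(snapIndex, commitIndex uint64, wal []entry) []string {
	var ids []string
	for _, e := range wal {
		if e.Index <= snapIndex || e.Index > commitIndex {
			continue // already reflected in the snapshot, or not yet committed
		}
		if e.RemoveID != "" {
			ids = append(ids, e.RemoveID)
		}
	}
	return ids
}

func main() {
	// The three ConfChangeRemoveNode entries from the WAL dump.
	wal := []entry{
		{8, 77940, "799f6226de4c7bed"},
		{8, 77941, "8b7a6e3d29b86eae"},
		{8, 77945, "4532f632c49aac79"},
	}
	// Snapshot index and commit index from the dump: all three are replayed.
	fmt.Println(replayRemovals(70007, 78639, wal))
	// Hypothetical newer snapshot taken after the removals: nothing to replay.
	fmt.Println(replayRemovals(78000, 78639, wal))
}
```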
"msg":"newRaft 761a61507fe72261 [peers: [32cd075ce865df07,4532f632c49aac79,5fe693db009bbced,761a61507fe72261,799f6226de4c7bed,8b7a6e3d29b86eae]
voters=(3660590167739457287 4986318435359829113 6910373247063932141 8510221444241105505 8763831318964108269 10050466727402696366)"}
voters=(3660590167739457287 4986318435359829113 6910373247063932141 8510221444241105505 10050466727402696366)"}
{"level":"warn","ts":"2021-11-05T15:45:59.551+0800","caller":"membership/cluster.go:427","msg":"skipped removing already removed member","cluster-id":"1109e69692ba9883","local-member-id":"761a61507fe72261","removed-remote-peer-id":"799f6226de4c7bed"}
{"level":"panic","ts":"2021-11-05T15:45:59.551+0800","caller":"rafthttp/transport.go:346","msg":"unexpected removal of unknown remote peer","remote-peer-id":"799f6226de4c7bed","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.(*Transport).removePeer\n\t/export/working/src/github.com/go.etcd.io/etcd/server/etcdserver/api/rafthttp/transport.go:346\ngo.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.(*Transport).RemovePeer\n\t/export/working/src/github.com/go.etcd.io/etcd/server/etcdserver/api/rafthttp/transport.go:329\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\t/export/working/src/github.com/go.etcd.io/etcd/server/etcdserver/server.go:2301\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\t/export/working/src/github.com/go.etcd.io/etcd/server/etcdserver/server.go:2133\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\t/export/working/src/github.com/go.etcd.io/etcd/server/etcdserver/server.go:1357\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\t/export/working/src/github.com/go.etcd.io/etcd/server/etcdserver/server.go:1179\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\t/export/working/src/github.com/go.etcd.io/etcd/server/etcdserver/server.go:1111\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\t/export/working/src/github.com/go.etcd.io/etcd/pkg/schedule/schedule.go:157"}
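The two log records show two layers disagreeing about the replayed removal of 799f6226de4c7bed: the membership layer treats the already-removed member as a no-op and only warns, while rafthttp.(*Transport).removePeer panics because the peer is unknown to the transport. The Go sketch below is a hypothetical model of that mismatch, not etcd's code: membership, transport, removeStrict and replay are illustrative names, and it assumes that at restart both the member set and the transport's peer set were built without the removed node.

```go
package main

import (
	"fmt"
	"log"
)

// membership is a hypothetical stand-in for the cluster member set. Like the
// "skipped removing already removed member" warning, its removal is idempotent.
type membership struct{ members map[string]bool }

func (m *membership) remove(id string) {
	if !m.members[id] {
		log.Printf("warn: skipped removing already removed member %s", id)
		return
	}
	delete(m.members, id)
}

// transport is a hypothetical stand-in for the peer transport, whose peer set
// is assumed to be built from the member set at startup.
type transport struct{ peers map[string]bool }

// removeStrict panics on an unknown peer, mirroring the
// "unexpected removal of unknown remote peer" panic in the log.
func (t *transport) removeStrict(id string) {
	if !t.peers[id] {
		panic("unexpected removal of unknown remote peer " + id)
	}
	delete(t.peers, id)
}

// replay applies one replayed ConfChangeRemoveNode to both layers and
// reports whether the transport panicked.
func replay(m *membership, t *transport, id string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	m.remove(id)
	t.removeStrict(id)
	return false
}

func main() {
	// Assumed startup state of 761a61507fe72261: 799f6226de4c7bed is already
	// gone from both sets, so the transport never registered it.
	m := &membership{members: map[string]bool{
		"32cd075ce865df07": true, "5fe693db009bbced": true, "761a61507fe72261": true,
	}}
	t := &transport{peers: map[string]bool{
		"32cd075ce865df07": true, "5fe693db009bbced": true,
	}}
	// Replaying the first WAL removal: membership warns, transport panics.
	fmt.Println(replay(m, t, "799f6226de4c7bed"))
}
```

In this model the panic goes away if either the removal is not replayed (as after a fresh snapshot) or the transport removal is made as tolerant of unknown peers as the membership removal. Which change etcd actually made is not covered here.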
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (15 by maintainers)
Please check the StoreV2 deprecation plan in https://github.com/etcd-io/etcd/issues/12913
FYI, I summarized this issue on the following page: https://github.com/ahrtr/etcd-issues/tree/master/13466