etcd: Data inconsistency in etcd version 3.5.0 (3.5.x rollback -> 3.4, upgrade -> 3.5) story

When I run the following commands (one per member):

ETCDCTL_API=3 etcdctl --endpoints https://192.168.2.2:4003 get /Canal/can/locks/health-host-192-168-2-2 -w json
ETCDCTL_API=3 etcdctl --endpoints https://192.168.2.3:4003 get /Canal/can/locks/health-host-192-168-2-2 -w json
ETCDCTL_API=3 etcdctl --endpoints https://192.168.2.4:4003 get /Canal/can/locks/health-host-192-168-2-2 -w json

I get these responses:

{"header":{"cluster_id":5771341481381694818,"member_id":8096206227661897536,"revision":973575,"raft_term":16},"kvs":[{"key":"L0NhbmFsL2Nhbi9sb2Nrcy9oZWFsdGgtaG9zdC0xOTItMTY4LTItMg==","create_revision":2,"mod_revision":973561,"version":122303,"value":"eyJraW5kIjoiTG9jayIsImFwaVZlcnNpb24iOiJjYW4vdjFhbHBoYTEiLCJtZXRhZGF0YSI6eyJuYW1lIjoiaGVhbHRoLWhvc3QtMTkyLTE2OC0yLTIiLCJjcmVhdGlvblRpbWVzdGFtcCI6IjIwMjEtMTItMDFUMjE6NTg6NTdaIn0sImxvY2tpZCI6ImhlYWx0aC1ob3N0LTE5Mi0xNjgtMi0yIn0K"}],"count":1}
{"header":{"cluster_id":5771341481381694818,"member_id":14749687755696706107,"revision":973575,"raft_term":16},"kvs":[{"key":"L0NhbmFsL2Nhbi9sb2Nrcy9oZWFsdGgtaG9zdC0xOTItMTY4LTItMg==","create_revision":2,"mod_revision":973561,"version":122303,"value":"eyJraW5kIjoiTG9jayIsImFwaVZlcnNpb24iOiJjYW4vdjFhbHBoYTEiLCJtZXRhZGF0YSI6eyJuYW1lIjoiaGVhbHRoLWhvc3QtMTkyLTE2OC0yLTIiLCJjcmVhdGlvblRpbWVzdGFtcCI6IjIwMjEtMTItMDFUMjE6NTg6NTdaIn0sImxvY2tpZCI6ImhlYWx0aC1ob3N0LTE5Mi0xNjgtMi0yIn0K"}],"count":1}
{"header":{"cluster_id":5771341481381694818,"member_id":9436515569532730235,"revision":922759,"raft_term":16},"kvs":[{"key":"L0NhbmFsL2Nhbi9sb2Nrcy9oZWFsdGgtaG9zdC0xOTItMTY4LTItMg==","create_revision":2,"mod_revision":922630,"version":120692,"value":"eyJraW5kIjoiTG9jayIsImFwaVZlcnNpb24iOiJjYW4vdjFhbHBoYTEiLCJtZXRhZGF0YSI6eyJuYW1lIjoiaGVhbHRoLWhvc3QtMTkyLTE2OC0yLTIiLCJjcmVhdGlvblRpbWVzdGFtcCI6IjIwMjEtMTItMDFUMjE6NTQ6MzFaIn0sImxvY2tpZCI6ImhlYWx0aC1ob3N0LTE5Mi0xNjgtMi0yIn0K"}],"count":1}

The 192.168.2.4 node returns a different mod_revision and version than the other two members.
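For anyone scripting this check, here is a minimal Go sketch using the v3 client (the endpoints and key mirror the ones above; a TLS config would have to be supplied for the https endpoints and is omitted here). Serializable reads make each member answer from its own local store, which is what exposes the divergence.

// Compare the same key across individual members. Serializable reads are
// served from each member's local store, so a diverged member shows a
// different mod_revision/version. Endpoints and key are taken from above.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoints := []string{
		"https://192.168.2.2:4003",
		"https://192.168.2.3:4003",
		"https://192.168.2.4:4003",
	}
	key := "/Canal/can/locks/health-host-192-168-2-2"

	for _, ep := range endpoints {
		cli, err := clientv3.New(clientv3.Config{
			Endpoints:   []string{ep}, // one client per member
			DialTimeout: 5 * time.Second,
			// TLS: supply a *tls.Config here for the https endpoints.
		})
		if err != nil {
			log.Fatalf("%s: %v", ep, err)
		}

		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		// WithSerializable keeps the read on this member instead of
		// forwarding a linearizable read through the leader.
		resp, err := cli.Get(ctx, key, clientv3.WithSerializable())
		cancel()
		cli.Close()
		if err != nil {
			log.Fatalf("%s: %v", ep, err)
		}
		for _, kv := range resp.Kvs {
			fmt.Printf("%s: header.revision=%d mod_revision=%d version=%d\n",
				ep, resp.Header.Revision, kv.ModRevision, kv.Version)
		}
	}
}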

Reproduce Procedure:

  1. Reboot the three nodes.
  2. When the 192.168.2.4 node starts up, my guess is that it restored old etcd data into the etcd data directory.

If the second point is true, would that cause this data inconsistency? And why can a node with stale or broken data be added back to the etcd cluster?

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 26 (21 by maintainers)

Most upvoted comments

With a test specific to this problem on the way and no reproduction, I’m inclined to remove this as a blocker for the v3.5.2 release.

Intuitively, applyEntries should never be executed if we are in the wrong term. Applies should happen only up to the last ‘HardState’ found in the WAL log, and that should determine whether an entry is subject to apply or not.
So I think it’s harmful defense in depth, but I will need a deeper dive into the original pre-refactoring logic to confirm.

The cause of corruption is https://github.com/etcd-io/etcd/blob/451ea5406edff59c1c881833ed7ba32cb9253f01/server/storage/schema/cindex.go#L66-L69

An outdated term field (caused by a downgrade) results in etcd applying the record without updating the CI (consistent_index). The case is artificial because downgrades are not officially supported, but it brought up two issues that are not related to downgrades (a rough sketch of the pattern is included below the list). I’m still thinking about what exactly should be fixed:

  • Etcd v3.5 trusting the term stored in the DB and not the one in the WAL. This doesn’t seem safe in v3.5; possibly it should be delayed to v3.6?
  • Etcd applying the entry without updating the CI.

Possibly both cases would be worth fixing. cc @spzala @ahrtr @ptabor for opinion.
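For clarity, here is a minimal, purely illustrative Go sketch of the pattern described above; it is not the actual etcd code. The entry is applied unconditionally, while the consistent_index update is guarded by a term comparison, so an outdated term leaves the CI behind the state that was actually applied.

// Illustrative only -- not etcd's real implementation. It mimics the
// described pattern: the mutation reaches the backend unconditionally,
// but the progress marker (consistent_index) is only advanced when the
// entry's term is not older than the persisted one.
package sketch

type backend struct {
	consistentIndex uint64
	term            uint64
}

// applyEntry mutates the store and then tries to record progress.
func (b *backend) applyEntry(index, term uint64, mutate func()) {
	mutate() // the key-value change reaches the backend...

	// ...but the progress marker does not when the term looks outdated,
	// so a later restart will replay (and re-apply) this entry.
	if term < b.term {
		return
	}
	b.consistentIndex = index
	b.term = term
}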

Please refer to discussion in pull/13844 and issues/13766

Hi @michaljasionowski thanks for the input!

are you able to repeat this result?

Yes, it is repeatable. I will convert it into code that makes it easy for someone else to reproduce.

I’ve been trying to reproduce it with your reproduction log with no success

I think I missed pasting something in the repro execution logs.

After this step,

## put the data again
~/etcd-binaries/v3.5.2/etcdctl --endpoints http://127.0.0.1:32379 put foo error

infra10 won't be able to persist consistent_index 12 to disk because its term (3) is lower than 10.

You need to kill infra10 and restart it. On restart it reloads its lagging-behind consistent_index from disk, reloads the raft storage from disk into memory, replays its log, and applies entries to the backend where necessary. As a result, infra10 applies the raft entries from the point at which it received the mutation requests a second time.
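A rough, illustrative Go sketch of that replay behavior (not etcd's actual code): on restart only entries above the persisted consistent_index are applied, so when the consistent_index lags behind what was really applied, the entries in that gap hit the backend twice.

// Illustrative replay sketch. Entries at or below the persisted
// consistent_index are assumed to already be in the backend and are
// skipped; everything above it is applied. A lagging consistent_index
// therefore causes double application of the entries in the gap.
package sketch

type entry struct {
	Index uint64
}

func replay(entries []entry, consistentIndex uint64, applyToBackend func(entry)) {
	for _, e := range entries {
		if e.Index <= consistentIndex {
			continue // assumed to already be reflected in the backend
		}
		// Applied here even if it was already applied before the restart;
		// the lagging consistent_index cannot tell the difference.
		applyToBackend(e)
	}
}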

Also, please git checkout main && git pull --rebase before running:

gvm use go1.17.8
PASSES="build" ./scripts/test.sh -v 
cp ./bin/tools/etcd-dump-db ~/etcd-binaries/etcd-dump-db

This will help visualize the consistent_index in a human-readable format instead of encoded raw bytes.
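For reference, here is a small Go sketch of what etcd-dump-db decodes in this case, assuming the v3.5 backend layout in which consistent_index and term are stored as 8-byte big-endian values in the meta bucket of the bolt file. The db path is a placeholder; open the file only on a stopped member or on a copy.

// Read consistent_index and term straight from the backend bolt file
// (read-only), assuming the v3.5 meta-bucket layout.
package main

import (
	"encoding/binary"
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	path := "infra10.etcd/member/snap/db" // placeholder path

	db, err := bolt.Open(path, 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return fmt.Errorf("meta bucket not found")
		}
		for _, key := range []string{"consistent_index", "term"} {
			if v := meta.Get([]byte(key)); len(v) == 8 {
				fmt.Printf("%s = %d\n", key, binary.BigEndian.Uint64(v))
			}
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}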

Let me know if it still does not work for you, thanks.

I’m unable to reproduce this with etcd v3.5.1 and etcdctl v3.4 and v3.5. I run a 3-member cluster where each member restarts every 10 seconds, with a periodic get revision (every 1 second) and etcdctl check perf to generate load. I have never seen an inconsistent read.
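A rough Go sketch of this kind of periodic revision probe (endpoints are placeholders for a local goreman-style cluster; TLS omitted). Small transient differences between members are expected while they catch up; only a gap that never closes would indicate the inconsistency discussed in this issue.

// Poll each member once per second with a serializable Get and print the
// per-member header revisions side by side.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoints := []string{"http://127.0.0.1:2379", "http://127.0.0.1:22379", "http://127.0.0.1:32379"}

	clients := make([]*clientv3.Client, 0, len(endpoints))
	for _, ep := range endpoints {
		cli, err := clientv3.New(clientv3.Config{Endpoints: []string{ep}, DialTimeout: 5 * time.Second})
		if err != nil {
			log.Fatalf("%s: %v", ep, err)
		}
		clients = append(clients, cli)
	}

	for range time.Tick(time.Second) {
		revs := make([]int64, len(clients))
		for i, cli := range clients {
			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
			resp, err := cli.Get(ctx, "probe-key", clientv3.WithSerializable())
			cancel()
			if err != nil {
				log.Printf("%s: %v", endpoints[i], err)
				continue
			}
			revs[i] = resp.Header.Revision
		}
		fmt.Println(time.Now().Format(time.RFC3339), revs)
	}
}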

Can you provide more detailed reproduction steps?