etcd: rafthttp: request cluster ID mismatch (got a want b) after usage of ETCD_FORCE_NEW_CLUSTER
Bug reporting
Versions tested: 3.1.x and 3.2.x
Problem: rafthttp: request cluster ID mismatch (got a want b) when a new member joins
Steps:
A running etcd instance with no other members:
etcdctl member list
8e9e05c52164694d: name=default peerURLs=http://10.x.y.z:2380 clientURLs=http://10.x.y.z:2379 isLeader=true
Preparing to add a new member:
etcdctl member add instance-2 https://10.x.y.zz:2380
Starting etcd with new data-dir:
docker logs 8e631793e4bd8c7e52edf12efd834f24b033b5e1f79f2754dcd426ad113aa745
2017-06-23 18:06:11.603930 I | etcdmain: etcd Version: 3.2.1
2017-06-23 18:06:11.604509 I | etcdmain: Git SHA: 61fc123
2017-06-23 18:06:11.604516 I | etcdmain: Go Version: go1.8.3
2017-06-23 18:06:11.604520 I | etcdmain: Go OS/Arch: linux/amd64
2017-06-23 18:06:11.604523 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-06-23 18:06:11.604617 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = true
2017-06-23 18:06:11.606327 I | embed: listening for peers on https://10.x.y.zz:2380
2017-06-23 18:06:11.606357 W | embed: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
2017-06-23 18:06:11.606396 I | embed: listening for client requests on 127.0.0.1:2379
2017-06-23 18:06:11.606427 I | embed: listening for client requests on 10.x.y.zz:2379
2017-06-23 18:06:11.611117 I | etcdserver: name = instance-2
2017-06-23 18:06:11.611143 I | etcdserver: data dir = /var/lib/etcd
2017-06-23 18:06:11.611147 I | etcdserver: member dir = /var/lib/etcd/member
2017-06-23 18:06:11.611151 I | etcdserver: heartbeat = 100ms
2017-06-23 18:06:11.611154 I | etcdserver: election = 1000ms
2017-06-23 18:06:11.611157 I | etcdserver: snapshot count = 100000
2017-06-23 18:06:11.611164 I | etcdserver: advertise client URLs = https://10.x.y.zz:2379
2017-06-23 18:06:11.640699 I | etcdserver: starting member 222f88b64e95262a in cluster cdf818194e3a8c32
2017-06-23 18:06:11.640727 I | raft: 222f88b64e95262a became follower at term 0
2017-06-23 18:06:11.640739 I | raft: newRaft 222f88b64e95262a [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2017-06-23 18:06:11.640743 I | raft: 222f88b64e95262a became follower at term 1
2017-06-23 18:06:11.658846 W | auth: simple token is not cryptographically signed
2017-06-23 18:06:11.667148 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.667204 I | rafthttp: starting peer 8e9e05c52164694d...
2017-06-23 18:06:11.667224 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.693402 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.693470 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.710553 I | rafthttp: started peer 8e9e05c52164694d
2017-06-23 18:06:11.710579 I | rafthttp: added peer 8e9e05c52164694d
2017-06-23 18:06:11.710608 I | etcdserver: starting server... [version: 3.2.1, cluster version: to_be_decided]
2017-06-23 18:06:11.710623 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd-server.crt, key = /etc/kubernetes/pki/etcd-server.key, ca = , trusted-ca = /etc/kubernetes/pki/etcd-ca.crt, client-cert-auth = false
2017-06-23 18:06:11.711758 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.711920 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.713513 I | rafthttp: peer 8e9e05c52164694d became active
2017-06-23 18:06:11.713523 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-06-23 18:06:11.713880 I | raft: 222f88b64e95262a [term: 1] received a MsgVote message with higher term from 8e9e05c52164694d [term: 2535]
2017-06-23 18:06:11.713900 I | raft: 222f88b64e95262a became follower at term 2535
2017-06-23 18:06:11.713909 I | raft: 222f88b64e95262a [logterm: 0, index: 0, vote: 0] cast MsgVote for 8e9e05c52164694d [logterm: 2473, index: 707372] at term 2535
2017-06-23 18:06:11.722444 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.722493 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.724351 I | raft: raft.node: 222f88b64e95262a elected leader 8e9e05c52164694d at term 2535
2017-06-23 18:06:11.730153 I | rafthttp: receiving database snapshot [index:707372, from 8e9e05c52164694d] ...
2017-06-23 18:06:11.734416 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-06-23 18:06:11.738337 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.753937 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.754094 I | snap: saved database snapshot to disk [total bytes: 3424256]
2017-06-23 18:06:11.754111 I | rafthttp: received and saved database snapshot [index: 707372, from: 8e9e05c52164694d] successfully
2017-06-23 18:06:11.754193 I | raft: 222f88b64e95262a [commit: 0, lastindex: 0, lastterm: 0] starts to restore snapshot [index: 707372, term: 2473]
2017-06-23 18:06:11.754211 I | raft: log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 707372, term: 2473]
2017-06-23 18:06:11.754236 I | raft: 222f88b64e95262a restored progress of 222f88b64e95262a [next = 707373, match = 707372, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2017-06-23 18:06:11.754251 I | raft: 222f88b64e95262a restored progress of 8e9e05c52164694d [next = 707373, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2017-06-23 18:06:11.754259 I | raft: 222f88b64e95262a [commit: 707372] restored snapshot [index: 707372, term: 2473]
2017-06-23 18:06:11.755157 I | etcdserver: applying snapshot at index 0...
2017-06-23 18:06:11.755190 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.756328 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.757356 I | etcdserver: raft applied incoming snapshot at index 707372
2017-06-23 18:06:11.757918 I | etcdserver: recovering lessor...
2017-06-23 18:06:11.762479 I | etcdserver: finished recovering lessor
2017-06-23 18:06:11.762497 I | etcdserver: restoring mvcc store...
2017-06-23 18:06:11.762522 I | mvcc: restore compact to 251962
2017-06-23 18:06:11.768976 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.771470 I | etcdserver: finished restoring mvcc store
2017-06-23 18:06:11.771491 I | etcdserver: recovering alarms...
2017-06-23 18:06:11.771708 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.773793 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.774824 I | etcdserver: finished recovering alarms
2017-06-23 18:06:11.775077 I | etcdserver: recovering auth store...
2017-06-23 18:06:11.775092 I | etcdserver: finished recovering auth store
2017-06-23 18:06:11.775096 I | etcdserver: recovering store v2...
2017-06-23 18:06:11.776548 I | etcdserver: finished recovering store v2
2017-06-23 18:06:11.776561 I | etcdserver: recovering cluster configuration...
2017-06-23 18:06:11.776627 I | etcdserver/api: enabled capabilities for version 3.1
2017-06-23 18:06:11.776638 I | etcdserver/membership: added member 222f88b64e95262a [https://10.x.y.zz:2380] to cluster cdf818194e3a8c32 from store
2017-06-23 18:06:11.776643 I | etcdserver/membership: added member 8e9e05c52164694d [http://10.x.y.z:2380] to cluster cdf818194e3a8c32 from store
2017-06-23 18:06:11.776648 I | etcdserver/membership: set the cluster version to 3.1 from store
2017-06-23 18:06:11.776651 I | etcdserver: finished recovering cluster configuration
2017-06-23 18:06:11.776654 I | etcdserver: removing old peers from network...
2017-06-23 18:06:11.776659 I | rafthttp: stopping peer 8e9e05c52164694d...
2017-06-23 18:06:11.776804 I | rafthttp: closed the TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-06-23 18:06:11.776812 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.776931 I | rafthttp: closed the TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-06-23 18:06:11.776936 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.777074 I | etcdserver: closing old backend...
2017-06-23 18:06:11.778124 I | rafthttp: stopped HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.778184 W | rafthttp: lost the TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.778221 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.778263 W | rafthttp: lost the TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.778281 E | rafthttp: failed to read 8e9e05c52164694d on stream Message (context canceled)
2017-06-23 18:06:11.778285 I | rafthttp: peer 8e9e05c52164694d became inactive
2017-06-23 18:06:11.778291 I | rafthttp: stopped streaming with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.778296 I | rafthttp: stopped peer 8e9e05c52164694d
2017-06-23 18:06:11.778303 I | rafthttp: removed peer 8e9e05c52164694d
2017-06-23 18:06:11.778307 I | etcdserver: finished removing old peers from network
2017-06-23 18:06:11.778310 I | etcdserver: adding peers from new cluster configuration into network...
2017-06-23 18:06:11.778468 I | rafthttp: starting peer 8e9e05c52164694d...
2017-06-23 18:06:11.778525 I | rafthttp: started HTTP pipelining with peer 8e9e05c52164694d
2017-06-23 18:06:11.779020 I | rafthttp: started peer 8e9e05c52164694d
2017-06-23 18:06:11.779041 I | rafthttp: added peer 8e9e05c52164694d
2017-06-23 18:06:11.779045 I | etcdserver: finished adding peers from new cluster configuration into network...
2017-06-23 18:06:11.779052 I | etcdserver: finished applying incoming snapshot at index 0
2017-06-23 18:06:11.779233 I | etcdserver: published {Name:instance-2 ClientURLs:[https://10.x.y.zz:2379]} to cluster cdf818194e3a8c32
2017-06-23 18:06:11.779309 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.779324 I | rafthttp: started streaming with peer 8e9e05c52164694d (writer)
2017-06-23 18:06:11.779344 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.779481 I | rafthttp: started streaming with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.779723 I | embed: ready to serve client requests
2017-06-23 18:06:11.780024 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
2017-06-23 18:06:11.780057 I | embed: ready to serve client requests
2017-06-23 18:06:11.780217 I | embed: serving client requests on 10.x.y.zz:2379
2017-06-23 18:06:11.780651 I | etcdserver: finished closing old backend
2017-06-23 18:06:11.783490 I | rafthttp: peer 8e9e05c52164694d became active
2017-06-23 18:06:11.783511 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message reader)
2017-06-23 18:06:11.783601 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 reader)
2017-06-23 18:06:11.789290 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.789449 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.802985 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.804787 I | rafthttp: peer 8e9e05c52164694d became active
2017-06-23 18:06:11.804859 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.817990 I | mvcc: store.index: compact 252117
2017-06-23 18:06:11.818618 I | mvcc: finished scheduled compaction at 252117 (took 395.496µs)
2017-06-23 18:06:11.829268 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.829560 N | etcdserver/membership: updated the cluster version from 3.1 to 3.2
2017-06-23 18:06:11.829605 I | etcdserver/api: enabled capabilities for version 3.2
2017-06-23 18:06:11.838191 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.911400 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.911727 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.922308 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream MsgApp v2 writer)
2017-06-23 18:06:11.924200 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.929602 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.930357 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.934752 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.936455 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.937940 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.939887 I | rafthttp: established a TCP streaming connection with peer 8e9e05c52164694d (stream Message writer)
2017-06-23 18:06:11.952105 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.961230 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
2017-06-23 18:06:11.964000 E | rafthttp: request cluster ID mismatch (got 8873e806ab344a8 want cdf818194e3a8c32)
This step can be repeated multiple times with the same result; removing the member does not help.
Special note: ETCD_FORCE_NEW_CLUSTER was used to get the cluster running again. This seems to corrupt the cluster ID in some strange way. Before it was used, adding and removing members was no problem.
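The log above repeats the same error many times, so it can help to condense it into the distinct got/want pairs. This is only a rough sketch: the function name `mismatch_pairs` is made up for illustration, and it assumes the log format shown above arrives on stdin.

```shell
# Hypothetical helper (not part of etcd): summarize "cluster ID mismatch"
# errors from an etcd log read on stdin.
mismatch_pairs() {
  # Keep only the "(got X want Y)" part of each mismatch line,
  # rewrite it as "got=X want=Y", then count each distinct pair.
  grep -o 'cluster ID mismatch (got [0-9a-f]* want [0-9a-f]*)' \
    | sed 's/.*(got \([0-9a-f]*\) want \([0-9a-f]*\))/got=\1 want=\2/' \
    | sort | uniq -c | sed 's/^ *//'
}
```

It could be fed with something like `docker logs <container> 2>&1 | mismatch_pairs`, assuming the container writes its log to stderr. A single pair in the output would suggest that only one remote member is sending the unexpected ID.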
default:
- etcd
- --advertise-client-urls=http://10.x.y.z:2379
- --data-dir=/var/lib/etcd
- --listen-client-urls=http://10.x.y.z:2379,http://127.0.0.1:2379
- --listen-peer-urls=http://10.x.y.z:2380
- --trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
- --cert-file=/etc/kubernetes/pki/etcd-server.crt
- --key-file=/etc/kubernetes/pki/etcd-server.key
- --peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
- --peer-key-file=/etc/kubernetes/pki/etcd-server.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
- --peer-client-cert-auth
instance-2:
- etcd
- --advertise-client-urls=https://10.x.y.zz:2379
- --data-dir=/var/lib/etcd
- --listen-client-urls=https://10.x.y.zz:2379,http://127.0.0.1:2379
- --initial-cluster=default=http://10.x.y.z:2380,instance-2=https://10.x.y.zz:2380
- --initial-cluster-state=existing
- --name=instance-2
- --listen-peer-urls=https://10.x.y.zz:2380
- --trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
- --cert-file=/etc/kubernetes/pki/etcd-server.crt
- --key-file=/etc/kubernetes/pki/etcd-server.key
- --peer-cert-file=/etc/kubernetes/pki/etcd-server.crt
- --peer-key-file=/etc/kubernetes/pki/etcd-server.key
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-ca.crt
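One detail in the flags above is that the --initial-cluster value mixes an http peer URL (default) with an https one (instance-2). Whether that has anything to do with the mismatch is not established in this report, but it is easy to make visible. A small sketch, with `peer_schemes` being a made-up helper name:

```shell
# Hypothetical helper (not part of etcd): print "<member-name> <scheme>"
# for every entry of an --initial-cluster / ETCD_INITIAL_CLUSTER value.
peer_schemes() {
  # Split the comma-separated list into one name=URL entry per line,
  # then keep the name and the URL scheme of each entry.
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed 's|^\([^=]*\)=\([a-z]*\)://.*|\1 \2|'
}
```

For example, `peer_schemes 'default=http://10.x.y.z:2380,instance-2=https://10.x.y.zz:2380'` prints one line per member, which makes a mixed http/https setup obvious at a glance.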
To follow the deployment and production guidelines, this use case must work. Any help with debugging and fixing it would be great.
Greets Manuel
About this issue
- State: closed
- Created 7 years ago
- Comments: 25 (11 by maintainers)
In my case I got the error
rafthttp: request cluster ID mismatch (got 1b3a88599e79f82b want b33939d80a381a57)
because of an incorrect config on one node.
Two of my nodes had this in their config:
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"
and one node had:
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380"
To resolve the problem I stopped etcd on all nodes, fixed the incorrect config, deleted the /var/lib/etcd/member folder on all nodes, restarted etcd on all nodes, and voilà!
P.S. /var/lib/etcd is the folder where etcd saves its data in my case.
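A quick way to catch this kind of drift is to compare the ETCD_INITIAL_CLUSTER values collected from each node. The sketch below is only an illustration: both function names are made up, and it assumes that the order of the members in the string does not matter.

```shell
# Hypothetical helper: print the name=URL members of an initial-cluster
# string one per line, trimmed and sorted, so that order is ignored.
normalize_initial_cluster() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//' | sort
}

# Hypothetical helper: succeeds (exit status 0) only when both strings
# list exactly the same name=URL members.
same_initial_cluster() {
  [ "$(normalize_initial_cluster "$1")" = "$(normalize_initial_cluster "$2")" ]
}
```

With the two values quoted above, `same_initial_cluster "$three_node_value" "$one_node_value"` would fail, flagging the node that still carries the single-member list.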
One of my nodes has an ETCD_INITIAL_CLUSTER with one node fewer, because it was added before the third node. That raises a question: when a new member is added, is it necessary to reconfigure ETCD_INITIAL_CLUSTER on all active members?
Anyway, I edited that variable and restarted etcd, with the same result. The difference here is that I did not delete the /var/lib/etcd/member folder, because it holds important cluster data.
My data dir is --data-dir=/var/etcd/data; removing and recreating it worked for me. It seems that something left in this directory from a previous etcd cluster I had set up was affecting the etcd settings.
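For anyone who, like the commenter above, does not want to lose what is in the member folder, the old state can be moved aside instead of deleted. A minimal sketch, assuming etcd on that node is stopped first; `set_aside_member_dir` is a made-up name, and the backup stays inside the data dir only so that the move remains on the same filesystem.

```shell
# Hypothetical helper: rename <data-dir>/member to a timestamped backup
# instead of deleting it. Run only while etcd on this node is stopped.
set_aside_member_dir() {
  data_dir=$1
  backup="$data_dir/member.bak.$(date +%Y%m%d%H%M%S)"
  # Refuse to continue when there is nothing to move.
  [ -d "$data_dir/member" ] || { echo "no member dir in $data_dir" >&2; return 1; }
  # Print the backup path on success so the caller can record it.
  mv "$data_dir/member" "$backup" && printf '%s\n' "$backup"
}
```

Usage would look like `set_aside_member_dir /var/lib/etcd`. Note that a node started afterwards begins without its previous member state, exactly as if the folder had been deleted; the only difference is that the old files are still available.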