etcd: Cannot set up 3-node etcd cluster

cat /etc/os-release

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Machines:

k8s-master-01 - 192.168.232.100
k8s-master-02 - 192.168.232.101
k8s-master-03 - 192.168.232.102

etcd --version

etcd Version: 3.2.9
Git SHA: f1d7dd8
Go Version: go1.8.3
Go OS/Arch: linux/amd64

How to reproduce:

[All machines] yum install etcd

[k8s-master-01]:

etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.100:2380 \
  --listen-peer-urls http://192.168.232.100:2380 \
  --listen-client-urls http://192.168.232.100:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.232.100:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380 \
  --initial-cluster-state new \
  --auto-tls \
  --peer-auto-tls

[k8s-master-02]:

[root@k8s-master-02 etcd]# etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.101:2380   --listen-peer-urls http://192.168.232.101:2380   --listen-client-urls http://192.168.232.101:2379,http://127.0.0.1:2379   --advertise-client-urls http://192.168.232.101:2379   --initial-cluster-token etcd-cluster-1   --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380   --initial-cluster-state new --auto-tls --peer-auto-tls
2018-01-08 11:54:12.705581 I | etcdmain: etcd Version: 3.2.9
2018-01-08 11:54:12.705639 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 11:54:12.705644 I | etcdmain: Go Version: go1.8.3
2018-01-08 11:54:12.705649 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 11:54:12.705655 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 11:54:12.705665 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 11:54:12.705707 W | etcdmain: found invalid file/dir fixtures under data dir infra0.etcd (Ignore this if you are upgrading etcd)
2018-01-08 11:54:12.705718 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-01-08 11:54:12.705758 I | embed: peerTLS: cert = infra0.etcd/fixtures/peer/cert.pem, key = infra0.etcd/fixtures/peer/key.pem, ca = , trusted-ca = , client-cert-auth = false
2018-01-08 11:54:12.705768 W | embed: The scheme of peer url http://192.168.232.101:2380 is HTTP while peer key/cert files are presented. Ignored peer key/cert files.
2018-01-08 11:54:12.705820 I | embed: listening for peers on http://192.168.232.101:2380
2018-01-08 11:54:12.705851 W | embed: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
2018-01-08 11:54:12.705892 I | embed: listening for client requests on 127.0.0.1:2379
2018-01-08 11:54:12.705904 W | embed: The scheme of client url http://192.168.232.101:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
2018-01-08 11:54:12.705943 I | embed: listening for client requests on 192.168.232.101:2379
2018-01-08 11:54:12.733618 I | etcdmain: --initial-cluster must include infra0=http://192.168.232.101:2380 given --initial-advertise-peer-urls=http://192.168.232.101:2380
[root@k8s-master-02 etcd]# 

[k8s-master-03]:


[root@k8s-master-03 ~]# etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.102:2380   --listen-peer-urls http://192.168.232.102:2380   --listen-client-urls http://192.168.232.102:2379,http://127.0.0.1:2379   --advertise-client-urls http://192.168.232.102:2379   --initial-cluster-token etcd-cluster-1   --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380   --initial-cluster-state new
2018-01-08 11:46:56.713062 I | etcdmain: etcd Version: 3.2.9
2018-01-08 11:46:56.713131 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 11:46:56.713137 I | etcdmain: Go Version: go1.8.3
2018-01-08 11:46:56.713142 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 11:46:56.713147 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 11:46:56.713164 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 11:46:56.713501 I | embed: listening for peers on http://192.168.232.102:2380
2018-01-08 11:46:56.713566 I | embed: listening for client requests on 127.0.0.1:2379
2018-01-08 11:46:56.713600 I | embed: listening for client requests on 192.168.232.102:2379
2018-01-08 11:46:56.753553 I | etcdmain: --initial-cluster must include infra0=http://192.168.232.102:2380 given --initial-advertise-peer-urls=http://192.168.232.102:2380
[root@k8s-master-03 ~]# 

[root@k8s-master-01 etcd]# etcd --name infra0 --initial-advertise-peer-urls http://192.168.232.100:2380   --listen-peer-urls http://192.168.232.100:2380   --listen-client-urls http://192.168.232.100:2379,http://127.0.0.1:2379   --advertise-client-urls http://192.168.232.100:2379   --initial-cluster-token etcd-cluster-1   --initial-cluster infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380   --initial-cluster-state new
2018-01-08 11:46:53.877388 I | etcdmain: etcd Version: 3.2.9
2018-01-08 11:46:53.877461 I | etcdmain: Git SHA: f1d7dd8
2018-01-08 11:46:53.877467 I | etcdmain: Go Version: go1.8.3
2018-01-08 11:46:53.877475 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-08 11:46:53.877480 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-01-08 11:46:53.877491 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd
2018-01-08 11:46:53.877549 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-01-08 11:46:53.877619 I | embed: listening for peers on http://192.168.232.100:2380
2018-01-08 11:46:53.877665 I | embed: listening for client requests on 127.0.0.1:2379
2018-01-08 11:46:53.877705 I | embed: listening for client requests on 192.168.232.100:2379
2018-01-08 11:46:53.880290 I | etcdserver: name = infra0
2018-01-08 11:46:53.880300 I | etcdserver: data dir = infra0.etcd
2018-01-08 11:46:53.880305 I | etcdserver: member dir = infra0.etcd/member
2018-01-08 11:46:53.880310 I | etcdserver: heartbeat = 100ms
2018-01-08 11:46:53.880315 I | etcdserver: election = 1000ms
2018-01-08 11:46:53.880319 I | etcdserver: snapshot count = 100000
2018-01-08 11:46:53.880331 I | etcdserver: advertise client URLs = http://192.168.232.100:2379
2018-01-08 11:46:53.880802 I | etcdserver: restarting member 1a0f423a850b33 in cluster ddec615d236f5865 at commit index 3
2018-01-08 11:46:53.880838 I | raft: 1a0f423a850b33 became follower at term 245
2018-01-08 11:46:53.880854 I | raft: newRaft 1a0f423a850b33 [peers: [], term: 245, commit: 3, applied: 0, lastindex: 3, lastterm: 1]
2018-01-08 11:46:53.883679 W | auth: simple token is not cryptographically signed
2018-01-08 11:46:53.884744 I | etcdserver: starting server... [version: 3.2.9, cluster version: to_be_decided]
2018-01-08 11:46:53.885612 I | etcdserver/membership: added member 1a0f423a850b33 [http://192.168.232.100:2380] to cluster ddec615d236f5865
2018-01-08 11:46:53.885702 I | etcdserver/membership: added member 87f5c922a6a67302 [http://192.168.232.101:2380] to cluster ddec615d236f5865
2018-01-08 11:46:53.885722 I | rafthttp: starting peer 87f5c922a6a67302...
2018-01-08 11:46:53.885753 I | rafthttp: started HTTP pipelining with peer 87f5c922a6a67302
2018-01-08 11:46:53.887630 I | rafthttp: started streaming with peer 87f5c922a6a67302 (writer)
2018-01-08 11:46:53.887700 I | rafthttp: started streaming with peer 87f5c922a6a67302 (writer)
2018-01-08 11:46:53.888769 I | rafthttp: started peer 87f5c922a6a67302
2018-01-08 11:46:53.888808 I | rafthttp: added peer 87f5c922a6a67302
2018-01-08 11:46:53.888835 I | rafthttp: started streaming with peer 87f5c922a6a67302 (stream MsgApp v2 reader)
2018-01-08 11:46:53.888902 I | etcdserver/membership: added member c6b41ba06674f9bd [http://192.168.232.102:2380] to cluster ddec615d236f5865
2018-01-08 11:46:53.888918 I | rafthttp: starting peer c6b41ba06674f9bd...
2018-01-08 11:46:53.888932 I | rafthttp: started HTTP pipelining with peer c6b41ba06674f9bd
2018-01-08 11:46:53.889189 I | rafthttp: started streaming with peer 87f5c922a6a67302 (stream Message reader)
2018-01-08 11:46:53.889476 I | rafthttp: started streaming with peer c6b41ba06674f9bd (writer)
2018-01-08 11:46:53.891582 I | rafthttp: started peer c6b41ba06674f9bd
2018-01-08 11:46:53.891609 I | rafthttp: added peer c6b41ba06674f9bd
2018-01-08 11:46:53.891975 I | rafthttp: started streaming with peer c6b41ba06674f9bd (writer)
2018-01-08 11:46:53.891996 I | rafthttp: started streaming with peer c6b41ba06674f9bd (stream MsgApp v2 reader)
2018-01-08 11:46:53.892013 I | rafthttp: started streaming with peer c6b41ba06674f9bd (stream Message reader)
2018-01-08 11:46:54.381147 I | raft: 1a0f423a850b33 is starting a new election at term 245
2018-01-08 11:46:54.381215 I | raft: 1a0f423a850b33 became candidate at term 246
2018-01-08 11:46:54.381243 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 246
2018-01-08 11:46:54.381255 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 246
2018-01-08 11:46:54.381265 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 246
2018-01-08 11:46:55.981149 I | raft: 1a0f423a850b33 is starting a new election at term 246
2018-01-08 11:46:55.981181 I | raft: 1a0f423a850b33 became candidate at term 247
2018-01-08 11:46:55.981192 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 247
2018-01-08 11:46:55.981203 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 247
2018-01-08 11:46:55.981212 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 247
2018-01-08 11:46:57.281119 I | raft: 1a0f423a850b33 is starting a new election at term 247
2018-01-08 11:46:57.281156 I | raft: 1a0f423a850b33 became candidate at term 248
2018-01-08 11:46:57.281168 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 248
2018-01-08 11:46:57.281178 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 248
2018-01-08 11:46:57.281190 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 248
2018-01-08 11:46:58.889273 W | rafthttp: health check for peer 87f5c922a6a67302 could not connect: dial tcp 192.168.232.101:2380: getsockopt: connection refused
2018-01-08 11:46:58.892059 W | rafthttp: health check for peer c6b41ba06674f9bd could not connect: dial tcp 192.168.232.102:2380: getsockopt: connection refused
2018-01-08 11:46:59.181119 I | raft: 1a0f423a850b33 is starting a new election at term 248
2018-01-08 11:46:59.181150 I | raft: 1a0f423a850b33 became candidate at term 249
2018-01-08 11:46:59.181162 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 249
2018-01-08 11:46:59.181174 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 249
2018-01-08 11:46:59.181184 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 249
2018-01-08 11:47:00.381130 I | raft: 1a0f423a850b33 is starting a new election at term 249
2018-01-08 11:47:00.381174 I | raft: 1a0f423a850b33 became candidate at term 250
2018-01-08 11:47:00.381187 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 250
2018-01-08 11:47:00.381199 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 250
2018-01-08 11:47:00.381208 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 250
2018-01-08 11:47:00.885407 E | etcdserver: publish error: etcdserver: request timed out
2018-01-08 11:47:02.281133 I | raft: 1a0f423a850b33 is starting a new election at term 250
2018-01-08 11:47:02.281195 I | raft: 1a0f423a850b33 became candidate at term 251
2018-01-08 11:47:02.281211 I | raft: 1a0f423a850b33 received MsgVoteResp from 1a0f423a850b33 at term 251
2018-01-08 11:47:02.281227 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to 87f5c922a6a67302 at term 251
2018-01-08 11:47:02.281238 I | raft: 1a0f423a850b33 [logterm: 1, index: 3] sent MsgVote request to c6b41ba06674f9bd at term 251

etcdctl member gets connection refused.

Or I end up with

etcd.service main process exited code=exited status=203/exec

After I do

systemctl start etcd

No matter which tutorial I follow, I can't get a working cluster. I tried following Kelsey Hightower’s The Hard Way, which also failed; I followed several different tutorials and debugged things, also with no results. I’ve been fighting with etcd for 5-6 days and nothing helps at all. So far a bare-metal 9-node HA setup with Kubernetes seems impossible.

Either it hangs on /up/ or I get a failure. How do I correctly set up a 3-node etcd cluster?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 19 (11 by maintainers)

Most upvoted comments

Also, the name needs to match the name of the node. I see --name infra0 used twice. I have made these kinds of mistakes before; an old German friend calls it copy and waste 😃. It is important to walk through the configs one by one to make sure they are correct. I think we are close now 😃
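
As a rough sketch of what "one unique name per node" could look like, the script below prints (it does not run) one etcd command per member. The names, IPs, and flags are copied from the commands in the report; the helper name etcd_cmd is made up for illustration. The --auto-tls and --peer-auto-tls flags are left out here on the assumption that plain http URLs are intended, since the logs show the key/cert files being ignored for http URLs.

```shell
#!/bin/sh
# Sketch only: prints (does not run) one etcd command per member.
# Names and IPs are copied from the --initial-cluster value in the report.
CLUSTER="infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380"

# etcd_cmd NAME IP: print the etcd invocation for one member.
# (etcd_cmd is a hypothetical helper, not part of etcd.)
etcd_cmd() {
  printf 'etcd --name %s' "$1"
  printf ' --initial-advertise-peer-urls http://%s:2380' "$2"
  printf ' --listen-peer-urls http://%s:2380' "$2"
  printf ' --listen-client-urls http://%s:2379,http://127.0.0.1:2379' "$2"
  printf ' --advertise-client-urls http://%s:2379' "$2"
  printf ' --initial-cluster-token etcd-cluster-1'
  printf ' --initial-cluster %s' "$CLUSTER"
  printf ' --initial-cluster-state new\n'
}

etcd_cmd infra0 192.168.232.100   # k8s-master-01
etcd_cmd infra1 192.168.232.101   # k8s-master-02
etcd_cmd infra2 192.168.232.102   # k8s-master-03
```

Each printed command differs only in --name and the node's own IP, so the name=peer-URL pair on every machine matches exactly one entry in --initial-cluster.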

Maybe providing data dir would help?

Well, you can see it is writing to the default data-dir, $name.etcd.

2018-01-08 22:08:40.658578 W | etcdmain: no data-dir provided, using default data-dir ./infra0.etcd

Let's nuke those on each machine, as I am not confident what is going on there. If you would like to define a --data-dir, feel free to do so.
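
A minimal sketch of removing the default data dir, assuming etcd is stopped and was started from the directory where ./$name.etcd lives. The function name remove_data_dir and its optional base-directory argument are hypothetical and exist only for this illustration.

```shell
# remove_data_dir NAME [BASE_DIR]
# Deletes BASE_DIR/NAME.etcd, the default data dir etcd reports in its log
# when no --data-dir is given. BASE_DIR defaults to the current directory.
# (remove_data_dir is a hypothetical helper, not part of etcd.)
remove_data_dir() {
  name=$1
  base=${2:-.}
  # Refuse to run without a member name, so we never build a bare ".etcd" path.
  [ -n "$name" ] || { echo "usage: remove_data_dir NAME [BASE_DIR]" >&2; return 1; }
  dir="$base/$name.etcd"
  if [ -d "$dir" ]; then
    rm -rf -- "$dir"
    echo "removed $dir"
  else
    echo "nothing to remove at $dir"
  fi
}
```

On each machine this would be called with the name that node was started with, for example remove_data_dir infra0. Since all three nodes in the report used --name infra0, each of them would have an infra0.etcd directory to clear.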

@aryadrottning, the machine 2 logs show this:

2018-01-08 22:08:53.168900 I | embed: listening for peers on http://localhost:2380
2018-01-08 22:08:53.169114 I | embed: listening for client requests on localhost:2379

These are the default values (http://localhost:2380), meaning the listen URLs were not passed to etcd when it started, unless you used localhost yourself. Thus etcd fell back to them.
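
One quick way to spot this in a saved log is to search for the two default-URL lines quoted above. The sketch below assumes the log text is piped in on stdin; the function name fell_back_to_defaults is made up for illustration.

```shell
# fell_back_to_defaults: read an etcd log on stdin and succeed (exit 0)
# if it contains either of the default localhost listen lines.
# (fell_back_to_defaults is a hypothetical helper, not part of etcd.)
fell_back_to_defaults() {
  grep -q -e 'listening for peers on http://localhost:2380' \
          -e 'listening for client requests on localhost:2379'
}
```

For example, fell_back_to_defaults < etcd.log && echo "listen flags were not applied", where etcd.log is whatever file the output was captured to.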

Let's do this:

  • Nuke all data dirs.
  • Double-check your configs for all nodes carefully.
  • Bootstrap the cluster again. If node N fails again, please post the full logs rather than a snippet.
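
The "double-check your configs" step can be partly mechanized. The sketch below checks the condition behind the "--initial-cluster must include ..." message in the logs: the node's NAME=PEER_URL pair has to appear as one entry of --initial-cluster. The helper name check_member and its message text are hypothetical; the values come from the report.

```shell
#!/bin/sh
CLUSTER="infra0=http://192.168.232.100:2380,infra1=http://192.168.232.101:2380,infra2=http://192.168.232.102:2380"

# check_member NAME PEER_URL INITIAL_CLUSTER
# Succeeds if NAME=PEER_URL is one of the comma-separated entries.
# (check_member is a hypothetical helper, not part of etcd.)
check_member() {
  case ",$3," in
    *",$1=$2,"*) return 0 ;;
    *) echo "no entry $1=$2 in --initial-cluster" >&2; return 1 ;;
  esac
}

# Correct pairing: each node uses its own name.
check_member infra1 http://192.168.232.101:2380 "$CLUSTER" && echo "infra1 ok"

# Pairing from the report on k8s-master-02: name infra0 with the .101 URL.
check_member infra0 http://192.168.232.101:2380 "$CLUSTER" || echo "infra0 on .101 would be rejected"
```

The second call fails for the same reason the second and third nodes exited in the logs: the name infra0 is paired with .100 in --initial-cluster, not with .101 or .102.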

Thank you.