kubernetes: 1.16: etcd client does not parse IPv6 addresses correctly when members are joining
What happened:
Running kubeadm 1.16.1 on CentOS 7 with local etcd, the first kube master (2001:db8:101:53e9::1:17) is up and running. Trying to make the second master (2001:db8:101:53e9::1:2d) join the cluster fails with the following error:
I1006 00:05:57.730235 30231 etcd.go:107] etcd endpoints read from pods: https://[2001:db8:101:53e9::1:17]:2379
I1006 00:05:57.750843 30231 etcd.go:156] etcd endpoints read from etcd: https://[2001:db8:101:53e9::1:17]:2379
I1006 00:05:57.750907 30231 etcd.go:125] update etcd endpoints: https://[2001:db8:101:53e9::1:17]:2379
failed to dial endpoint https://[2001:db8:101:53e9::1:17]:2379 with maintenance client: context deadline exceeded
etcd cluster is not healthy
k8s.io/kubernetes/cmd/kubeadm/app/phases/etcd.CheckLocalEtcdClusterStatus
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/etcd/local.go:87
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runCheckEtcdPhase
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join/checketcd.go:68
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:236
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:424
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:209
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:169
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:830
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:200
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
error execution phase check-etcd
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:237
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:424
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:209
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:169
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:830
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
/workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:200
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
The etcd log on the first kube master shows:
2019-10-06 18:31:14.493853 I | mvcc: finished scheduled compaction at 56939 (took 1.44269ms)
2019-10-06 18:33:03.880905 I | embed: rejected connection from "[2001:db8:101:53e9::1:2d]:60812" (error "remote error: tls: bad certificate", ServerName "[2001")
2019-10-06 18:33:04.891260 I | embed: rejected connection from "[2001:db8:101:53e9::1:2d]:60818" (error "remote error: tls: bad certificate", ServerName "[2001")
2019-10-06 18:33:06.416960 I | embed: rejected connection from "[2001:db8:101:53e9::1:2d]:60820" (error "remote error: tls: bad certificate", ServerName "[2001")
2019-10-06 18:33:08.748958 I | embed: rejected connection from "[2001:db8:101:53e9::1:2d]:60822" (error "remote error: tls: bad certificate", ServerName "[2001")
2019-10-06 18:33:13.451584 I | embed: rejected connection from "[2001:db8:101:53e9::1:2d]:60828" (error "remote error: tls: bad certificate", ServerName "[2001")
2019-10-06 18:33:18.742184 I | embed: rejected connection from "[2001:db8:101:53e9::1:2d]:60834" (error "remote error: tls: bad certificate", ServerName "[2001")
2019-10-06 18:36:14.504374 I | mvcc: store.index: compact 57302
I verified that the certificate used by the second kube master has the right server name (the node's IPv6 address).
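Note the truncated ServerName ("[2001") in the rejected-connection messages: it looks as if the etcd client splits the endpoint on the first ":" instead of handling the bracketed IPv6 host:port form, and then presents the truncated host as the TLS ServerName. A minimal sketch of that behavior (the helper name here is hypothetical, not the actual etcd code):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// naiveHostPort splits on the first ":", which truncates a bracketed
// IPv6 address such as "[2001:db8:101:53e9::1:17]:2379" down to "[2001".
func naiveHostPort(hostPort string) (host, port string) {
	parts := strings.SplitN(hostPort, ":", 2)
	host = parts[0]
	if len(parts) > 1 {
		port = parts[1]
	}
	return host, port
}

func main() {
	ep := "[2001:db8:101:53e9::1:17]:2379"

	h, p := naiveHostPort(ep)
	fmt.Printf("naive split:   host=%q port=%q\n", h, p)
	// naive split:   host="[2001" port="db8:101:53e9::1:17]:2379"

	h2, p2, err := net.SplitHostPort(ep) // understands the bracketed IPv6 form
	if err != nil {
		panic(err)
	}
	fmt.Printf("SplitHostPort: host=%q port=%q\n", h2, p2)
	// SplitHostPort: host="2001:db8:101:53e9::1:17" port="2379"
}
```

If the TLS ServerName is derived from a host parsed the naive way, the client would present "[2001" to etcd, which matches the rejected connections above even though the certificate itself is correct.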
What you expected to happen:
kubeadm join brings the second kube master into the cluster.
How to reproduce it (as minimally and precisely as possible):
kubeadm init on master-1, then kubeadm join on master-2.
Anything else we need to know?:
- I used the same steps to create a 1.15.3 cluster without any problem.
- This is a pure IPv6 environment that does not have any IPv4.
Environment:
- Kubernetes version (use kubectl version):
# kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T17:01:15Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T16:51:36Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: OpenStack
- OS (e.g. cat /etc/os-release):
# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. uname -a):
# uname -a
Linux x1-master-2.x1-host.rkn.ksng.io 3.10.0-1062.1.2.el7.x86_64 #1 SMP Mon Sep 30 14:19:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T16:58:27Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
- Network plugin and version (if this is a network-related bug): calico
- Others: the hosts are IPv6-only; there are no IPv4 interfaces.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 29 (24 by maintainers)
etcd backports:
We will backport the etcd fix to 3.4 and 3.3 once it is merged to master.
oh, nevermind, it’s https://github.com/kubernetes/kubernetes/blob/release-1.16/vendor/github.com/coreos/etcd/clientv3/balancer/resolver/endpoint/endpoint.go#L231-L240
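For illustration only, a hedged sketch of what an IPv6-aware version of that kind of host/port splitting could look like (function and package names are made up; this is not the upstream etcd patch):

```go
package main

import (
	"fmt"
	"net"
)

// splitHostPort is an illustrative, IPv6-aware replacement for a
// "split on the first colon" helper: it relies on net.SplitHostPort,
// which understands the bracketed form "[2001:db8::1]:2379".
func splitHostPort(hostPort string) (host, port string) {
	h, p, err := net.SplitHostPort(hostPort)
	if err != nil {
		// No port present (or not parseable as host:port):
		// treat the whole string as the host.
		return hostPort, ""
	}
	return h, p
}

func main() {
	for _, ep := range []string{
		"[2001:db8:101:53e9::1:17]:2379", // bracketed IPv6 with port
		"10.96.0.1:2379",                 // IPv4 with port
		"etcd-1.example.com",             // hostname without port
	} {
		h, p := splitHostPort(ep)
		fmt.Printf("%-35s -> host=%q port=%q\n", ep, h, p)
	}
}
```

With a split like this, the host handed to TLS would be the full IPv6 address rather than the truncated "[2001" seen in the etcd log.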
Yes, 1.15.3 always worked fine. FYI, we have 5 or 6 clusters running 1.15.3 (both Ubuntu and CentOS, if it matters); a couple of dev/test clusters were redeployed roughly once a week over the past couple of months, and there are more personal deployments.
I tried tagging the k8s.gcr.io/etcd:3.3.10 image from 1.15.3 as the new version and the behavior was the same, though I tested this only once.