kubernetes: kube-apiserver 1.13.x refuses to work when the first etcd server is not available.

How to reproduce the problem: set up a new demo cluster with kubeadm 1.13.1. Create the default configuration with kubeadm config print init-defaults, then initialize the cluster as usual with kubeadm init.

Change the --etcd-servers list in the kube-apiserver manifest to --etcd-servers=https://127.0.0.2:2379,https://127.0.0.1:2379, so that the first etcd endpoint is unavailable (“connection refused”).

The kube-apiserver is then no longer able to connect to etcd.

Last message: Unable to create storage backend: config (&{ /registry [https://127.0.0.2:2379 https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt true 0xc000381dd0 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.2:2379: connect: connection refused)

kube-apiserver does not start.
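
For anyone who wants to exercise the same etcd v3 client path outside the API server, here is a minimal sketch. The endpoint list and certificate paths are taken from the log message above; everything else (including the go.etcd.io import paths of the 3.4-era layout) is an assumption. Whether the read succeeds via the second endpoint tells you whether your client version fails over correctly.

// Minimal sketch: build an etcd v3 client roughly the way the API server's
// storage backend does, with the first endpoint unreachable, and issue one read.
package main

import (
    "context"
    "log"
    "time"

    "go.etcd.io/etcd/clientv3"
    "go.etcd.io/etcd/pkg/transport"
)

func main() {
    // Client certificate and CA paths as they appear in the apiserver log above.
    tlsInfo := transport.TLSInfo{
        CertFile:      "/etc/kubernetes/pki/apiserver-etcd-client.crt",
        KeyFile:       "/etc/kubernetes/pki/apiserver-etcd-client.key",
        TrustedCAFile: "/etc/kubernetes/pki/etcd/ca.crt",
    }
    tlsConfig, err := tlsInfo.ClientConfig()
    if err != nil {
        log.Fatal(err)
    }

    cli, err := clientv3.New(clientv3.Config{
        // The first endpoint refuses connections; the second one is healthy.
        Endpoints:   []string{"https://127.0.0.2:2379", "https://127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
        TLS:         tlsConfig,
    })
    if err != nil {
        log.Fatalf("client construction failed: %v", err)
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if _, err := cli.Get(ctx, "/registry"); err != nil {
        log.Fatalf("read failed: %v", err)
    }
    log.Println("read succeeded via the healthy endpoint")
}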

If I upgrade etcd to version 3.3.10, it reports the error: remote error: tls: bad certificate", ServerName ""

Environment:

  • Kubernetes version 1.13.1
  • kubeadm in Vagrant box

I also experience this bug in an environment with a real etcd cluster.

/kind bug

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 33
  • Comments: 68 (42 by maintainers)

Most upvoted comments

I was able to reproduce this issue with the repro steps provided by @Cytrian. I also reproduced it with a real etcd cluster.

As @JishanXing previously mentioned, the problem is caused by a bug in the etcd v3 client library (or perhaps the grpc library). The vault project is also running into this: https://github.com/hashicorp/vault/issues/4349

The problem seems to be that the etcd library uses the first node’s address as the ServerName for TLS. This means that all attempts to connect to any server other than the first will fail with a certificate validation error (i.e. cert has ${nameOfNode2} in SANs, but the client is expecting ${nameOfNode1}).
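
As a concrete illustration of that failure mode (a sketch only: the host names are placeholders, and the CA and client-certificate paths are assumed from a kubeadm layout), pinning the TLS ServerName to the first member's host and then dialing the second member reproduces the validation error unless the second member's certificate also carries the first member's name:

// Sketch: the ServerName is derived from the FIRST endpoint, but the
// connection goes to the SECOND endpoint, so the second member's certificate
// is verified against the first member's name.
package main

import (
    "crypto/tls"
    "crypto/x509"
    "io/ioutil"
    "log"
)

func main() {
    caPEM, err := ioutil.ReadFile("/etc/kubernetes/pki/etcd/ca.crt")
    if err != nil {
        log.Fatal(err)
    }
    pool := x509.NewCertPool()
    if !pool.AppendCertsFromPEM(caPEM) {
        log.Fatal("failed to parse CA certificate")
    }

    clientCert, err := tls.LoadX509KeyPair(
        "/etc/kubernetes/pki/apiserver-etcd-client.crt",
        "/etc/kubernetes/pki/apiserver-etcd-client.key")
    if err != nil {
        log.Fatal(err)
    }

    cfg := &tls.Config{
        RootCAs:      pool,
        Certificates: []tls.Certificate{clientCert},
        ServerName:   "etcd-node-1.example.com", // name of the FIRST member
    }

    // Dialing the second member with the first member's ServerName fails with
    // an x509 error like "certificate is valid for etcd-node-2..., not
    // etcd-node-1..." unless node 2's certificate also lists node 1 in its SANs.
    conn, err := tls.Dial("tcp", "etcd-node-2.example.com:2379", cfg)
    if err != nil {
        log.Fatalf("handshake failed: %v", err)
    }
    defer conn.Close()
    log.Println("handshake succeeded; node 2's certificate covers node 1's name")
}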

An important thing to highlight is that when the first etcd server goes down, it also takes the Kubernetes API servers down, because they fail to connect to the remaining etcd servers.

With that said, this all depends on what your etcd server certificates look like:

  • If you follow the kubeadm instructions to stand up a 3 node etcd cluster, you get a set of certificates that include the first node’s name and IP in the SANs (because all certs are generated on the first etcd node). Thus, you should not run into this issue.
  • If you have used another process to generate certificates for etcd, and the certs do not include the first node’s name and IP in the SANs, you will most likely run into this issue when the first etcd node goes down (a quick way to inspect a certificate’s SANs is sketched below).
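
If you are not sure which of the two cases applies, a quick check is to parse a member's serving certificate and look at its SANs. A sketch, assuming the kubeadm path /etc/kubernetes/pki/etcd/server.crt (adjust for your setup):

// Print the SANs of an etcd serving certificate to see whether the first
// member's name/IP is included.
package main

import (
    "crypto/x509"
    "encoding/pem"
    "fmt"
    "io/ioutil"
    "log"
)

func main() {
    data, err := ioutil.ReadFile("/etc/kubernetes/pki/etcd/server.crt")
    if err != nil {
        log.Fatal(err)
    }
    block, _ := pem.Decode(data)
    if block == nil {
        log.Fatal("no PEM block found")
    }
    cert, err := x509.ParseCertificate(block.Bytes)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("DNS SANs:", cert.DNSNames)
    fmt.Println("IP SANs: ", cert.IPAddresses)
}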

To reproduce the issue with a real etcd cluster:

  1. Create a 3 node etcd cluster with TLS enabled. Each certificate should only contain the name/IP of the node that will be serving it.
  2. Start an API server that points to the etcd cluster.
  3. Stop the first etcd node.
  4. The API server crashes and fails to come back up (a sketch for checking the remaining members directly follows these steps).
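
To confirm that the remaining members are healthy while the API server is down (i.e. that the problem is on the client side), each member can be queried with its own single-endpoint client, which sidesteps the shared-ServerName problem. A sketch; the endpoints and PKI paths are placeholders:

// Query each etcd member with a single-endpoint client and print its status,
// to show the cluster itself still has quorum while the API server cannot
// reach it.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "go.etcd.io/etcd/clientv3"
    "go.etcd.io/etcd/pkg/transport"
)

func main() {
    endpoints := []string{
        "https://etcd1.example.com:2379", // the member that was stopped
        "https://etcd2.example.com:2379",
        "https://etcd3.example.com:2379",
    }

    tlsInfo := transport.TLSInfo{
        CertFile:      "/etc/kubernetes/pki/apiserver-etcd-client.crt",
        KeyFile:       "/etc/kubernetes/pki/apiserver-etcd-client.key",
        TrustedCAFile: "/etc/kubernetes/pki/etcd/ca.crt",
    }
    tlsConfig, err := tlsInfo.ClientConfig()
    if err != nil {
        log.Fatal(err)
    }

    for _, ep := range endpoints {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{ep}, // one endpoint per client
            DialTimeout: 3 * time.Second,
            TLS:         tlsConfig,
        })
        if err != nil {
            fmt.Printf("%s: client error: %v\n", ep, err)
            continue
        }
        ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
        status, err := cli.Status(ctx, ep)
        cancel()
        cli.Close()
        if err != nil {
            fmt.Printf("%s: unreachable: %v\n", ep, err)
            continue
        }
        fmt.Printf("%s: healthy, version %s, leader %x\n", ep, status.Version, status.Leader)
    }
}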

Versions:

  • kubeadm version: v1.13.2
  • kubernetes api server version: v1.13.2
  • etcd image: k8s.gcr.io/etcd:3.2.24

API server crash log: https://gist.github.com/alexbrand/ba86f506e4278ed2ada4504ab44b525b

I was unable to reproduce this issue with API server v1.12.5 (n.b. this was a somewhat unscientific test: I only updated the image field of the API server static pod produced by kubeadm v1.13.2).

We have 3 masters and 3 etcd servers; a workaround is to change the order of the etcd servers on each master. master0:

--etcd-servers=etcd0,etcd1,etcd2

master1:

--etcd-servers=etcd1,etcd0,etcd2

master2:

--etcd-servers=etcd2,etcd0,etcd1

@timothysc I just came back from a trip. I will start working on this this week and post updates here.

@dims @jpbetz https://github.com/etcd-io/etcd/releases/tag/v3.3.14-beta.0 has been released with all the fixes. Please try. Once tests look good in the next few days, I will release v3.3.14.

Update: https://github.com/etcd-io/etcd/releases/tag/v3.3.15 has been released (superseding the earlier v3.3.14 tag).

@liggitt so that leaves people on 1.13-1.15 without proper HA? I think this issue deserves to be fixed in the three supported releases of Kubernetes. The hotfix mentioned here looks simple enough to be added, but you are saying the proper fix will require much more. So this is all somewhat obscure and confusing to the community, IMO.

Not that I am complaining, don’t get me wrong; it’s just that I think everyone would welcome some clarity on this issue. Maybe document it somewhere and provide some workarounds for the people who are still on v1.13-v1.15? Because right now, bring down the first etcd member and, oops, the API is not working and the cluster is not working.

@timothysc I will look into this.

@dims I just built a custom 1.14 image from the release-1.14 branch, patched that credentials.go file, and it’s so much better now when I bring the first etcd node down. Now I am confused: if the fix is so simple, why does it have to wait until 1.16?
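
For context, the gist of the change discussed here is to derive the TLS ServerName from the endpoint actually being dialed instead of reusing the first endpoint’s host for every connection. The following is a conceptual sketch of that per-endpoint derivation only, not the actual grpc-go or etcd patch; all names in it are hypothetical.

// Conceptual illustration only: map each endpoint URL to the ServerName that
// certificate verification should use for that endpoint.
package main

import (
    "fmt"
    "log"
    "net"
    "net/url"
)

// serverNameFor returns the host portion of an endpoint URL.
func serverNameFor(endpoint string) (string, error) {
    u, err := url.Parse(endpoint)
    if err != nil {
        return "", err
    }
    host, _, err := net.SplitHostPort(u.Host)
    if err != nil {
        // No port in the URL; use the host as-is.
        return u.Host, nil
    }
    return host, nil
}

func main() {
    for _, ep := range []string{
        "https://etcd1.example.com:2379",
        "https://etcd2.example.com:2379",
        "https://172.17.8.202:2379",
    } {
        name, err := serverNameFor(ep)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%s -> ServerName %q\n", ep, name)
    }
}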

Update: https://github.com/grpc/grpc-go/releases/tag/v1.23.0 is out. We are bumping gRPC in the etcd master branch (https://github.com/etcd-io/etcd/pull/11029), in addition to the Go runtime upgrade (https://groups.google.com/forum/#!topic/golang-announce/65QixT3tcmg). Once tests look good, we will start working on 3.3 backports.

@igcherkaev We are planning to backport the fix to etcd 3.3 after the etcd 3.4 release. Then Kubernetes can pick up the latest etcd 3.3.

Tried with Kubernetes 1.15.3 and 1.16.2, but it’s not working with either. This is not fixed even for IP addresses:

W1107 12:48:06.316691       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://172.17.8.202:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 10.0.2.15, 127.0.0.1, ::1, 172.17.8.202, not 172.17.8.201". Reconnecting...
W1107 12:48:06.328186       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://172.17.8.203:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 10.0.2.15, 127.0.0.1, ::1, 172.17.8.203, not 172.17.8.201". Reconnecting...
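
To double-check what each member is actually serving (a diagnostic sketch: the addresses come from the log above, the client-certificate paths are assumptions), one can complete a handshake with verification disabled and print the SANs of the presented certificate:

// Connect to each etcd member, skip verification only so the handshake
// completes, and print the SANs of the certificate the member presents.
// Do not use InsecureSkipVerify in real clients.
package main

import (
    "crypto/tls"
    "fmt"
    "log"
)

func main() {
    clientCert, err := tls.LoadX509KeyPair(
        "/etc/kubernetes/pki/apiserver-etcd-client.crt",
        "/etc/kubernetes/pki/apiserver-etcd-client.key")
    if err != nil {
        log.Fatal(err)
    }

    for _, addr := range []string{"172.17.8.202:2379", "172.17.8.203:2379"} {
        conn, err := tls.Dial("tcp", addr, &tls.Config{
            Certificates:       []tls.Certificate{clientCert},
            InsecureSkipVerify: true, // inspection only
        })
        if err != nil {
            log.Printf("%s: %v", addr, err)
            continue
        }
        peers := conn.ConnectionState().PeerCertificates
        if len(peers) > 0 {
            fmt.Printf("%s presents DNS SANs %v, IP SANs %v\n", addr, peers[0].DNSNames, peers[0].IPAddresses)
        }
        conn.Close()
    }
}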

The issue is still there with Kubernetes 1.14.6 and etcd 3.3.15. Are any changes to Kubernetes libraries or code needed to tackle this issue?

August 13, 2019 is the earliest day the fix is going to land in 3.3.

Yes

@gyuho Can we please wrap up the backport (to etcd 3.3) within the next week or two? Please see the timeline for 1.16 (https://github.com/kubernetes/sig-release/tree/master/releases/release-1.16#timeline). We need sufficient soak time for this change in k/k.

@liggitt @igcherkaev I will work on the documentation.

Is there a hotfix for v1.15?

Just discussed with the gRPC team and @jingyih. We will rework this in the next few weeks.

I think the issue is more about how the gRPC balancer does failover with credentials.

I’ve shared a workaround to fix this issue in upstream gRPC https://github.com/grpc/grpc-go/pull/2650.

I am waiting for their feedback. /cc @xiang90 @jpbetz

No pod manifest is involved here, just a group of etcd servers and a kube-apiserver. The issue appeared when we rebooted the first etcd node.