kubeadm: Kubeadm fails to bring up a HA cluster due to EOF error when uploading configmap
What keywords did you search in kubeadm issues before filing this one?
- EOF uploading config
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version : kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:02:01Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Kubernetes version : Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Virtualbox VM
- OS (e.g. from /etc/os-release): Ubuntu 18.04.1 LTS
- Kernel (e.g. uname -a): Linux controller-1 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Other: I have three VMs running that are connected via a host network with VirtualBox (10.10.0.11, 10.10.0.12 and 10.10.0.13). There is a Docker container running on my host that binds to the gateway address of the host network (10.10.0.1) to provide a control plane endpoint that the controller nodes can use (a sketch of how such a container might be started follows this list). This works flawlessly with the 1.12 version of Kubernetes (also a kubeadm install).
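For reference, a minimal sketch of how such a load balancer container might be started; the image tag, config file name and published ports here are assumptions, not taken from the report:
# run HAProxy on the host, bound to the host-network gateway address
docker run -d --name apiserver-lb \
  -v "$PWD/haproxy.cfg":/usr/local/etc/haproxy/haproxy.cfg:ro \
  -p 10.10.0.1:6443:6443 \
  -p 10.10.0.1:9000:9000 \
  haproxy:1.8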
What happened?
I’m trying to set up an HA cluster with three control plane nodes. I can successfully bootstrap the first controller, but when I try to join the second controller it fails. After writing the etcd pod manifest it tries to write the new kubeadm-config (I guess with the updated controller API endpoints), but it fails with:
error uploading configuration: Get https://10.10.0.1:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: unexpected EOF
I’m using an HAProxy load balancer in front of the three (to-be) API server nodes. HAProxy is querying the health endpoint of the API server and getting successful responses. Before joining the second controller I can successfully curl the endpoint with:
watch -n0.5 curl -k https://10.10.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info
When the second controller joins, the above curl starts failing with an EOF and only starts working again about 20-30 seconds later. In the meantime the join command tries to upload the new config and crashes.
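To put a number on that window, a loop like the following (my own, not part of the original report) prints a timestamped HTTP status code every half second; 000 means the request failed or timed out:
while true; do
  code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 2 \
    https://10.10.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info)
  echo "$(date +%T) ${code}"
  sleep 0.5
done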
What you expected to happen?
I would have expected the config upload to succeed, either because kubeadm waits for a healthy control plane or because the API server never becomes unavailable in the first place.
How to reproduce it (as minimally and precisely as possible)?
I’m setting this up in a Vagrant environment, but I guess it’s no different from what is described on the https://kubernetes.io/docs/setup/independent/high-availability/ page. Here is my kubeadm config for the first controller:
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
bootstrapTokens:
- ttl: 1s
nodeRegistration:
  name: controller-1
  kubeletExtraArgs:
    node-ip: 10.10.0.11
    hostname-override: controller-1
localAPIEndpoint:
  advertiseAddress: 10.10.0.11
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: 1.13.0
clusterName: local
useHyperKubeImage: true
apiServer:
  certSANs:
  - "10.10.0.1"
  extraArgs:
    oidc-ca-file: /etc/kubernetes/pki/front-proxy-ca.crt
    oidc-issuer-url: https://keycloak.k8s.local/auth/realms/Kubernetes
    oidc-client-id: kubernetes
    oidc-username-claim: preferred_username
    oidc-username-prefix: user-
    oidc-groups-claim: groups
    oidc-groups-prefix: group-
    advertise-address: 10.10.0.11
    etcd-servers: "https://10.10.0.11:2379,https://10.10.0.12:2379,https://10.10.0.13:2379"
controlPlaneEndpoint: "10.10.0.1:6443"
networking:
  podSubnet: "10.200.0.0/16"
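For completeness, the bootstrap and join commands would look roughly like this; the config file name, token and hash are placeholders, and the exact invocation is an assumption, since the report does not show it:
# on controller-1
kubeadm init --config kubeadm-controller-1.yaml
# on controller-2 and controller-3, after copying the shared certificates over
kubeadm join 10.10.0.1:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --experimental-control-plane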
Anything else we need to know?
These are some of the api server logs when the etcd join happens:
E1213 19:03:31.444542 1 status.go:64] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I1213 19:03:31.444774 1 trace.go:76] Trace[753985076]: "Create /api/v1/namespaces/kube-system/pods" (started: 2018-12-13 19:03:24.441828372 +0000 UTC m=+114.693461957) (total time: 7.002869639s):
Trace[753985076]: [7.002869639s] [7.002606089s] END
I1213 19:03:33.109321 1 trace.go:76] Trace[1420167129]: "GuaranteedUpdate etcd3: *core.Node" (started: 2018-12-13 19:03:31.709712754 +0000 UTC m=+121.961346387) (total time: 1.399576303s):
Trace[1420167129]: [1.399371901s] [1.398742251s] Transaction committed
I1213 19:03:33.109473 1 trace.go:76] Trace[769327697]: "Patch /api/v1/nodes/controller-1/status" (started: 2018-12-13 19:03:31.709618011 +0000 UTC m=+121.961251600) (total time: 1.399840945s):
Trace[769327697]: [1.399731089s] [1.39927972s] Object stored in database
I1213 19:03:33.112016 1 trace.go:76] Trace[2022830437]: "Create /api/v1/namespaces/default/events" (started: 2018-12-13 19:03:26.478524582 +0000 UTC m=+116.730158162) (total time: 6.633469874s):
Trace[2022830437]: [6.633398498s] [6.633329302s] Object stored in database
I1213 19:03:33.112605 1 trace.go:76] Trace[1385314816]: "Create /api/v1/namespaces/kube-system/events" (started: 2018-12-13 19:03:28.97488789 +0000 UTC m=+119.226521475) (total time: 4.137566155s):
Trace[1385314816]: [4.137467107s] [4.137345776s] Object stored in database
E1213 19:03:35.352793 1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1213 19:03:35.353023 1 trace.go:76] Trace[1661503768]: "Get /api/v1/namespaces/kube-system/endpoints/kube-controller-manager" (started: 2018-12-13 19:03:25.365882522 +0000 UTC m=+115.617516103) (total time: 9.987128454s):
Trace[1661503768]: [9.987128454s] [9.987104356s] END
E1213 19:03:35.886799 1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1213 19:03:35.887162 1 trace.go:76] Trace[853992113]: "Get /api/v1/namespaces/kube-system/endpoints/kube-controller-manager" (started: 2018-12-13 19:03:25.887702792 +0000 UTC m=+116.139336376) (total time: 9.999444244s):
Trace[853992113]: [9.999444244s] [9.999421457s] END
E1213 19:03:35.932627 1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E1213 19:03:35.933601 1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
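The logs above show etcd requests timing out while the new member joins. One way (not from the report; certificate paths assume the standard kubeadm layout and etcdctl has to be available, e.g. inside the etcd static pod) to watch etcd health during the join is:
ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.10.0.11:2379,https://10.10.0.12:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health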
This is my HAProxy config:
defaults
  timeout connect 5000ms
  timeout check 5000ms
  timeout server 30000ms
  timeout client 30000ms

global
  tune.ssl.default-dh-param 2048

listen stats
  bind :9000
  mode http
  stats enable
  stats hide-version
  stats realm Haproxy\ Statistics
  stats uri /stats

listen apiserver
  bind :6443
  mode tcp
  balance roundrobin
  option httpchk GET /healthz
  http-check expect string ok
  server apiserver1 10.10.0.11:6443 check check-ssl verify none
  server apiserver2 10.10.0.12:6443 check check-ssl verify none
  server apiserver3 10.10.0.13:6443 check check-ssl verify none

listen ingress
  bind :80
  mode http
  balance roundrobin
  server worker1 10.10.0.21:30080 check
  server worker2 10.10.0.22:30080 check
  server worker3 10.10.0.23:30080 check

listen ingress-443
  bind :443 ssl crt /usr/local/etc/haproxy/local-ssl.pem
  mode http
  balance roundrobin
  server worker1 10.10.0.21:30080 check
  server worker2 10.10.0.22:30080 check
  server worker3 10.10.0.23:30080 check
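The httpchk above is roughly equivalent to the following request against each backend; the API server's /healthz endpoint just returns the string ok:
curl -ks https://10.10.0.11:6443/healthz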
Commits related to this issue
- Reproduce https://github.com/kubernetes/kubeadm/issues/1321 original issue — committed to ereslibre/kubernetes-cluster-vagrant by ereslibre 5 years ago
@ereslibre and others, just wanted to say a big thank you for investigating this! It’s great to see such commitment to making kubeadm a great tool to use.
I will have a look at it tonight. The race condition fixed by https://github.com/kubernetes/kubernetes/pull/72030 was slightly different, and I never saw this one, but I’m happy to try to reproduce the issue and find the root cause. I will report back.
If you have grown your cluster to 3 masters, etcd was also grown to 3. As per the etcd admin guide, in a cluster of 3 the fault tolerance is 1: if you shut down 2 out of 3 etcd instances, your etcd cluster will be unavailable.
Please, let’s keep this issue from now on for the certificate issue, since the reporter provided a repository that contained this problem and we are keeping this issue open because of that. This issue is split into 3 different cases.
For any related issue please refer to the explicit issues linked above; otherwise please open a new bug report, since this one is already mixing very different things. Thank you!
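Relating to the etcd sizing point above, the actual member count (and therefore the fault tolerance) can be checked with something like the following; the flags and certificate paths are assumptions based on the standard kubeadm layout, not taken from this thread:
ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.10.0.11:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  member list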
My problem was not related to k8s. My nginx-lb config had a very short timeout.
I changed my proxy_timeout from 3s to 24h.
Here is my config:
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
  worker_connections 1024;
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
}

stream {
  upstream apiserver {
    #server IP1:6443 weight=5 max_fails=9 fail_timeout=30s;
    server IP2:6443 weight=5 max_fails=9 fail_timeout=30s;
    #server IP3:6443 weight=5 max_fails=9 fail_timeout=30s;
  }
}
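A quick way (my own suggestion, not something from this thread; using the reporter's load balancer address as an example) to see how long an idle connection through the load balancer survives is to open one and wait; when the connection is cut, the command returns and the elapsed time gives a lower bound on the effective timeout:
# do not type anything; just wait for the connection to be dropped
time openssl s_client -connect 10.10.0.1:6443 -quiet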
I can see something that is probably related. When joining the second master to the cluster, the https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config endpoint takes far longer to answer than on the third.
Master 2 (~14 seconds):
Master 3 (~40 milliseconds):
So I can confirm that the first control plane join takes a bit longer; it seems we have yet another cause for this issue, and this would match @masantiago's and @fmehrdad's descriptions.
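For anyone wanting to reproduce that measurement outside of kubeadm, something along these lines (my own command, relying on the admin kubeconfig that kubeadm writes) times the same configmap read:
time kubectl --kubeconfig /etc/kubernetes/admin.conf \
  -n kube-system get configmap kubeadm-config -o name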
I did the kubeadm join with -v10 in order to find which request was blocking. I'll keep digging to find the root cause of this, but we have 3 different issues causing the same visible problem; maybe it's time to split this issue 😃
So, two things after discussing with @fabriziopandini:
1. Copying the whole /etc/kubernetes/pki from one machine to another will lead to this problem. During kubeadm join we are only checking if the certificates are present, not whether the SANs match what we expect. @fabriziopandini is +1 to not only check for the presence of the certificate, but to also check its SANs, and if they don't match what we expect, we regenerate them. Priority #0 is etcd, then #1 is apiserver.
2. The lack of image pre-pulling is addressed in issue https://github.com/kubernetes/kubeadm/issues/1341 with a PR in the works here: https://github.com/kubernetes/kubernetes/pull/72870.
As for 1., despite the documentation naming the explicit certificates that need copying (https://kubernetes.io/docs/setup/independent/high-availability/#steps-for-the-rest-of-the-control-plane-nodes), I think we can expect more people to try to copy /etc/kubernetes/pki directly between machines, basically because it's handier. If we address the issue of SAN checking when doing a kubeadm join, this wouldn't be a problem, because the certificates that don't match what we expect would simply be recreated.
I see this as a temporary solution until we have completely addressed the automatic transmission of secrets between masters when creating an HA cluster with kubeadm. Until we have a proper solution, this would make it easier to copy things around between machines.
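For anyone who wants to check by hand whether copied certificates carry the expected SANs (the check being discussed above), something like this works; the paths are the standard kubeadm locations:
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text | grep -A1 'Subject Alternative Name'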
I’m hitting the same issue regardless of pre-pulling the images. The first master bootstraps successfully, whereas the second master fails with unexpected EOF when uploading some configuration. The weirdest part is that when I try adding another master node, it bootstraps successfully!
Apparently, the second master isn't a master, but the third master has no trouble. The images were pre-pulled on all machines, so I guess pre-pulling doesn't affect anything.
My kubeadm-config.yaml was the same as the one in the docs (the only difference is the pod subnet, because I was using Flannel).
@iverberk Could you confirm this by adding another master after the second one fails?
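As a side note (my own suggestion, not from the thread), whether a joined node actually registered as a control plane node can be checked via the master role label that kubeadm applies:
kubectl get nodes -l node-role.kubernetes.io/master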