kubeadm: Kubeadm fails to bring up an HA cluster due to an EOF error when uploading a configmap

What keywords did you search in kubeadm issues before filing this one?

  • EOF uploading config

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version : kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:02:01Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version : Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: VirtualBox VM
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.1 LTS
  • Kernel (e.g. uname -a): Linux controller-1 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Other: I have three VMs running that are connected via a VirtualBox host network (10.10.0.11, 10.10.0.12 and 10.10.0.13). There is a Docker container running on my host that binds to the gateway address of the host network (10.10.0.1) to provide a control plane endpoint that the controller nodes can use. This works flawlessly with Kubernetes 1.12 (also installed with kubeadm).
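
The load balancer container is started roughly like this (a sketch; the container name, image tag and mount path are assumptions, but the config path matches the one referenced in the HAProxy config further down):

# Sketch: official haproxy image, bound only to the VirtualBox gateway address
# so the controllers can reach the control plane endpoint at 10.10.0.1:6443.
docker run -d --name apiserver-lb \
  -p 10.10.0.1:6443:6443 \
  -p 10.10.0.1:9000:9000 \
  -v "$PWD/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" \
  haproxy:1.8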

What happened?

I’m trying to set up an HA cluster with three control plane nodes. I can successfully bootstrap the first controller, but joining the second controller fails. After writing the etcd pod manifest it tries to upload the new kubeadm-config (I assume with the updated controller API endpoints), but that fails with:

error uploading configuration: Get https://10.10.0.1:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: unexpected EOF

I’m using an HAProxy load balancer in front of the three (to-be) API server nodes. HAProxy queries the health endpoint of the API server and gets successful responses. Before joining the second controller I can successfully curl the endpoint with:

watch -n0.5 curl -k https://10.10.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info

When the second controller joins, the above curl starts failing with an EOF and only recovers about 20-30 seconds later. In the meantime the join command tries to upload the new config and crashes.
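
One way to narrow down where the EOF comes from (a sketch using the addresses above) is to compare the load-balanced endpoint with the first API server directly while the join runs:

# Status code via the load balancer vs. the first controller directly;
# if only the load-balanced address drops, HAProxy is closing the connection.
curl -sk -o /dev/null -w 'lb: %{http_code}\n'     https://10.10.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info
curl -sk -o /dev/null -w 'direct: %{http_code}\n' https://10.10.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info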

What you expected to happen?

I would have expected the config upload to succeed, either because kubeadm waits for a healthy control plane or because the API server never becomes unavailable in the first place.
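
By waiting for a healthy control plane I mean, roughly, something like the following shell sketch (kubeadm would need to do the equivalent internally between writing the etcd manifest and uploading the config; the endpoint address is from my setup):

# Block until the control plane endpoint answers /healthz before continuing.
until curl -ksf https://10.10.0.1:6443/healthz >/dev/null; do
  echo "control plane endpoint not healthy yet, retrying..."
  sleep 2
done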

How to reproduce it (as minimally and precisely as possible)?

I’m setting this up in a Vagrant environment, but I guess it’s no different from what is described on the https://kubernetes.io/docs/setup/independent/high-availability/ page. Here is my kubeadm config for the first controller:

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
bootstrapTokens:
- ttl: 1s
nodeRegistration:
  name: controller-1
  kubeletExtraArgs:
    node-ip: 10.10.0.11
    hostname-override: controller-1
localAPIEndpoint:
  advertiseAddress: 10.10.0.11
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: 1.13.0
clusterName: local
useHyperKubeImage: true
apiServer:
  certSANs:
  - "10.10.0.1"
  extraArgs:
    oidc-ca-file: /etc/kubernetes/pki/front-proxy-ca.crt
    oidc-issuer-url: https://keycloak.k8s.local/auth/realms/Kubernetes
    oidc-client-id: kubernetes
    oidc-username-claim: preferred_username
    oidc-username-prefix: user-
    oidc-groups-claim: groups
    oidc-groups-prefix: group-
    advertise-address: 10.10.0.11
    etcd-servers: "https://10.10.0.11:2379,https://10.10.0.12:2379,https://10.10.0.13:2379"
controlPlaneEndpoint: "10.10.0.1:6443"
networking:
  podSubnet: "10.200.0.0/16"

Anything else we need to know?

These are some of the API server logs from when the etcd join happens:

E1213 19:03:31.444542       1 status.go:64] apiserver received an error that is not an metav1.Status: rpctypes.EtcdError{code:0xe, desc:"etcdserver: request timed out"}
I1213 19:03:31.444774       1 trace.go:76] Trace[753985076]: "Create /api/v1/namespaces/kube-system/pods" (started: 2018-12-13 19:03:24.441828372 +0000 UTC m=+114.693461957) (total time: 7.002869639s):
Trace[753985076]: [7.002869639s] [7.002606089s] END
I1213 19:03:33.109321       1 trace.go:76] Trace[1420167129]: "GuaranteedUpdate etcd3: *core.Node" (started: 2018-12-13 19:03:31.709712754 +0000 UTC m=+121.961346387) (total time: 1.399576303s):
Trace[1420167129]: [1.399371901s] [1.398742251s] Transaction committed
I1213 19:03:33.109473       1 trace.go:76] Trace[769327697]: "Patch /api/v1/nodes/controller-1/status" (started: 2018-12-13 19:03:31.709618011 +0000 UTC m=+121.961251600) (total time: 1.399840945s):
Trace[769327697]: [1.399731089s] [1.39927972s] Object stored in database
I1213 19:03:33.112016       1 trace.go:76] Trace[2022830437]: "Create /api/v1/namespaces/default/events" (started: 2018-12-13 19:03:26.478524582 +0000 UTC m=+116.730158162) (total time: 6.633469874s):
Trace[2022830437]: [6.633398498s] [6.633329302s] Object stored in database
I1213 19:03:33.112605       1 trace.go:76] Trace[1385314816]: "Create /api/v1/namespaces/kube-system/events" (started: 2018-12-13 19:03:28.97488789 +0000 UTC m=+119.226521475) (total time: 4.137566155s):
Trace[1385314816]: [4.137467107s] [4.137345776s] Object stored in database
E1213 19:03:35.352793       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1213 19:03:35.353023       1 trace.go:76] Trace[1661503768]: "Get /api/v1/namespaces/kube-system/endpoints/kube-controller-manager" (started: 2018-12-13 19:03:25.365882522 +0000 UTC m=+115.617516103) (total time: 9.987128454s):
Trace[1661503768]: [9.987128454s] [9.987104356s] END
E1213 19:03:35.886799       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
I1213 19:03:35.887162       1 trace.go:76] Trace[853992113]: "Get /api/v1/namespaces/kube-system/endpoints/kube-controller-manager" (started: 2018-12-13 19:03:25.887702792 +0000 UTC m=+116.139336376) (total time: 9.999444244s):
Trace[853992113]: [9.999444244s] [9.999421457s] END
E1213 19:03:35.932627       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
E1213 19:03:35.933601       1 status.go:64] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}

This is my HAProxy config:

defaults
  timeout connect 5000ms
  timeout check 5000ms
  timeout server 30000ms
  timeout client 30000

global
  tune.ssl.default-dh-param 2048

listen stats
  bind :9000
  mode http
  stats enable
  stats hide-version
  stats realm Haproxy\ Statistics
  stats uri /stats

listen apiserver
  bind :6443
  mode tcp
  balance roundrobin
  option httpchk GET /healthz
  http-check expect string ok

  server apiserver1 10.10.0.11:6443 check check-ssl verify none
  server apiserver2 10.10.0.12:6443 check check-ssl verify none
  server apiserver3 10.10.0.13:6443 check check-ssl verify none

listen ingress
  bind :80
  mode http
  balance roundrobin

  server worker1 10.10.0.21:30080 check
  server worker2 10.10.0.22:30080 check
  server worker3 10.10.0.23:30080 check

listen ingress-443
  bind :443 ssl crt /usr/local/etc/haproxy/local-ssl.pem
  mode http
  balance roundrobin

  server worker1 10.10.0.21:30080 check
  server worker2 10.10.0.21:30080 check
  server worker3 10.10.0.23:30080 check
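
The health check HAProxy performs against the backends can be reproduced by hand like this (a sketch; each API server should print a plain "ok"):

# Same probe as `option httpchk GET /healthz` with `check-ssl verify none`.
for ip in 10.10.0.11 10.10.0.12 10.10.0.13; do
  printf '%s: ' "$ip"
  curl -sk "https://$ip:6443/healthz"
  echo
done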

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 38 (18 by maintainers)

Most upvoted comments

@ereslibre and others, just wanted to say a big thank you for investigating this! It’s great to see such commitment to making kubeadm a great tool to use.

@ereslibre could you kindly check the EOF problem again on your Vagrant setup?

I will have a look at it tonight. The race condition fixed by https://github.com/kubernetes/kubernetes/pull/72030 was slightly different and I never saw this one, but I’m happy to try to reproduce the issue and find the root cause. I will report back.

It seems that the HA of either etcd or the control plane is not working properly.

If you have grown your cluster to 3 masters, etcd was also grown to 3 members. As per the etcd admin guide, a cluster of 3 has a fault tolerance of 1. If you shut down 2 out of 3 etcd instances, your etcd cluster will be unavailable.
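
As a quick way to check that from a control plane node (a sketch, assuming a stacked etcd managed by kubeadm with the usual certificates under /etc/kubernetes/pki/etcd and the endpoints from the report above):

# With 3 members the cluster tolerates exactly 1 failure; if 2 endpoints are
# unhealthy there is no quorum and the API servers will stall like this.
export ETCDCTL_API=3
FLAGS="--endpoints=https://10.10.0.11:2379,https://10.10.0.12:2379,https://10.10.0.13:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key"
etcdctl $FLAGS member list       # who is part of the cluster
etcdctl $FLAGS endpoint health   # which endpoints actually answer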

Please, from now on let’s keep this issue for the certificate problem, since the reporter provided a repository that contained this problem and we are keeping this issue open because of that. This issue splits into 3 different cases:

  1. Bad etcd certificates: this issue.
  2. Not pre-pulling images when joining a control plane: https://github.com/kubernetes/kubeadm/issues/1341 (fix already merged in master)
  3. Not explicitly waiting for etcd to be healthy when we grow the cluster: https://github.com/kubernetes/kubeadm/issues/1353

For anything related, please refer to the explicit issues linked above; otherwise, please open a new bug report, since this one is already mixing very different things. Thank you!

My problem was not related to k8s. My nginx-lb config had a very short timeout.

I changed my proxy_timeout from 3s to 24h.

Here is my config:

user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

stream {
    upstream apiserver {
        #server IP1:6443 weight=5 max_fails=9 fail_timeout=30s;
        server IP2:6443 weight=5 max_fails=9 fail_timeout=30s;
        #server IP3:6443 weight=5 max_fails=9 fail_timeout=30s;
    }

    server {
        listen 16443;
        proxy_connect_timeout 1s;
        proxy_timeout 24h;
        proxy_pass apiserver;
    }

    log_format proxy '$remote_addr [$time_local] '
                     '$protocol $status $bytes_sent $bytes_received '
                     '$session_time "$upstream_addr" '
                     '"$upstream_bytes_sent" "$upstream_bytes_received" "$upstream_connect_time"';
    access_log /var/log/nginx/access.log proxy;
}

I can see something that is probably related. When joining the second master to the cluster, the https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config endpoint takes far longer to answer than when joining the third.

Master 2 (~14 seconds):

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0115 12:52:38.633739    2027 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.14.0 (linux/amd64) kubernetes/1b28775" 'https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config'
I0115 12:52:52.960533    2027 round_trippers.go:438] GET https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 200 OK in 14325 milliseconds
I0115 12:52:52.960568    2027 round_trippers.go:444] Response Headers:
I0115 12:52:52.960576    2027 round_trippers.go:447]     Content-Type: application/json
I0115 12:52:52.960593    2027 round_trippers.go:447]     Content-Length: 1149
I0115 12:52:52.960604    2027 round_trippers.go:447]     Date: Tue, 15 Jan 2019 12:52:54 GMT

Master 3 (~40 milliseconds):

[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0115 12:54:02.519284    2118 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.14.0 (linux/amd64) kubernetes/1b28775" 'https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config'
I0115 12:54:02.557663    2118 round_trippers.go:438] GET https://172.28.128.25:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 200 OK in 38 milliseconds
I0115 12:54:02.557732    2118 round_trippers.go:444] Response Headers:
I0115 12:54:02.557741    2118 round_trippers.go:447]     Content-Type: application/json
I0115 12:54:02.557747    2118 round_trippers.go:447]     Content-Length: 1218
I0115 12:54:02.557753    2118 round_trippers.go:447]     Date: Tue, 15 Jan 2019 12:54:02 GMT

So I can confirm that the first control plane join takes a bit longer. It seems we have yet another cause for this issue, and this would match @masantiago’s and @fmehrdad’s descriptions.

I did the kubeadm join with -v10 in order to find which request was blocking. I’ll keep digging for the root cause, but we have 3 different issues causing the same visible problem; maybe it’s time to split this issue 😃
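
For reference, the same request can be timed without -v10 from any node that has a valid admin.conf (a sketch, roughly equivalent to the GET traced above):

# Time the GET of the kubeadm-config ConfigMap through the control plane endpoint.
time kubectl --kubeconfig /etc/kubernetes/admin.conf \
  -n kube-system get configmap kubeadm-config -o yaml > /dev/null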

So, two things after discussing with @fabriziopandini:

  1. Copying the whole /etc/kubernetes/pki from one machine to another will lead to this problem.

    • When we do a kubeadm join we only check whether the certificates are present, not whether their SANs match what we expect. @fabriziopandini is +1 on not only checking for the presence of a certificate but also checking its SANs, and regenerating it if it doesn’t match what we expect. Priority #0 is etcd, then #1 is the apiserver.
  2. The lack of image pre-pulling is addressed in issue https://github.com/kubernetes/kubeadm/issues/1341 with a PR in the works here: https://github.com/kubernetes/kubernetes/pull/72870.

As for 1., although we name the explicit certificates that need copying in the documentation (https://kubernetes.io/docs/setup/independent/high-availability/#steps-for-the-rest-of-the-control-plane-nodes), I think we can expect more people to copy /etc/kubernetes/pki directly between machines, basically because it’s handier. If we address the SAN checking when doing a kubeadm join this wouldn’t be a problem, because certificates that don’t match what we expect would simply be recreated.
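
For reference, whether a copied certificate actually carries the SANs for the local node can be checked by hand with openssl (a sketch; the etcd server certificate is the priority #0 case mentioned above):

# Print the SANs of the local etcd server certificate; after copying
# /etc/kubernetes/pki wholesale from another node, these will still list
# the original node's name and IPs rather than the local ones.
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text \
  | grep -A1 'Subject Alternative Name'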

I see this as a temporary solution until we have completely addressed the automatic transmission of secrets between masters when creating an HA cluster with kubeadm. Until we have a proper solution this would make it easier to copy things around between machines.

I’m hitting the same issue regardless of pre-pulling the images. The first master bootstraps successfully, whereas the second master fails with an unexpected EOF when uploading some configuration. The weirdest part is that when I try adding another master node, it bootstraps successfully!

waffles@kube-master-1:~$ kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
kube-master-1   Ready    master   28m   v1.13.1
kube-master-2   Ready    <none>   18m   v1.13.1
kube-master-3   Ready    master   86s   v1.13.1
kube-worker-1   Ready    <none>   26m   v1.13.1
kube-worker-2   Ready    <none>   25m   v1.13.1
kube-worker-3   Ready    <none>   25m   v1.13.1

Apparently the second master isn’t actually a master, but the third master has no trouble. The images were pre-pulled on all machines, so I guess pre-pulling doesn’t affect anything.
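
A quick way to see what the second node is missing would be to compare its role label and static pod manifests with a working master (a sketch; node names as in the output above):

# Does the node carry the master role label that kubeadm sets on control plane nodes?
kubectl get node kube-master-2 --show-labels | grep node-role

# Are the control plane static pod manifests present on the node itself?
ssh kube-master-2 'ls /etc/kubernetes/manifests'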

My kubeadm-config.yaml was the same as the one in the docs (the only difference is the pod subnet, because I was using Flannel):

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
networking:
  podSubnet: 10.244.0.0/16
apiServer:
  certSANs:
  - "PUBLIC_IP"
controlPlaneEndpoint: "PUBLIC_IP:PORT"

@iverberk Could you confirm this by adding another master after the second one fails?