rke: Kubernetes 1.11.1 nodes occasionally do not register internal IP address

Rancher versions:
rancher/server or rancher/rancher: 2.0.7
rancher/agent or rancher/rancher-agent: 2.0.6

Docker version: (docker version, docker info preferred)

Server:
 Engine:
  Version:       18.03.1-ce
  API version:   1.37 (minimum version 1.12)
  Go version:    go1.9.6
  Git commit:    9ee9f40
  Built:         Thu Apr 26 04:27:49 2018
  OS/Arch:       linux/amd64
  Experimental:  false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1800.6.0
VERSION_ID=1800.6.0
BUILD_ID=2018-08-04-0323
PRETTY_NAME="Container Linux by CoreOS 1800.6.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) OpenStack

Steps to Reproduce: Deploy a Kubernetes 1.11.1 cluster with RKE using the rke_config.yml below.

rke config:

addon_job_timeout: 30
authentication:
  strategy: "x509"

ignore_docker_version: true

cloud_provider:
  name: openstack
  openstackCloudProvider:
    global:
      username: {{ openstack_username }}
      password: {{ openstack_password }}
      auth-url: {{ openstack_auth_url }}
      tenant-id: {{ openstack_tenant_id }}
      domain-id: {{ openstack_domain_id }}
    block_storage:
      ignore-volume-az: false

ingress:
  provider: "none"

kubernetes_version: 1.11.1

network: 
  plugin: "canal"

services: 
  etcd: 
    extra_args: 
      heartbeat-interval: 500
      election-timeout: 5000
    snapshot: false
  kubelet:
    extra_args:
      authentication-token-webhook: true
  kube_api: 
    pod_security_policy: false
    extra_args:
      requestheader-client-ca-file: "/etc/kubernetes/ssl/kube-ca.pem"
      requestheader-extra-headers-prefix: "X-Remote-Extra-"
      requestheader-group-headers: "X-Remote-Group"
      requestheader-username-headers: "X-Remote-User"
      proxy-client-cert-file: "/etc/kubernetes/ssl/kube-proxy.pem"
      proxy-client-key-file: "/etc/kubernetes/ssl/kube-proxy-key.pem"

ssh_agent_auth: false
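Bringing the cluster up from this file is a single rke invocation (a sketch; it assumes the rke binary is installed and that the file also contains the nodes: section, which is omitted from the excerpt above):

```shell
# Sketch: deploy the cluster described by the config above. Assumes the
# rke binary is on the PATH and rke_config.yml includes a nodes: section.
rke up --config rke_config.yml
```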

Results

~ $ kubectl get nodes -o wide
NAME                                     STATUS    ROLES               AGE       VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                        KERNEL-VERSION      CONTAINER-RUNTIME
k8s-corp-prod-0-master-us-corp-kc-8a-0   Ready     controlplane,etcd   3d        v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-master-us-corp-kc-8b-1   Ready     controlplane,etcd   3d        v1.11.1   10.144.6.137    <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-master-us-corp-kc-8c-2   Ready     controlplane,etcd   3d        v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8a-0   Ready     worker              3d        v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8a-1   Ready     worker              3d        v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8a-2   Ready     worker              3d        v1.11.1   10.144.2.141    <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8b-0   Ready     worker              3d        v1.11.1   10.144.6.142    <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8b-1   Ready     worker              3d        v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8b-2   Ready     worker              3d        v1.11.1   10.144.6.145    <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8c-0   Ready     worker              3d        v1.11.1   10.144.10.137   <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8c-1   Ready     worker              3d        v1.11.1   <none>          <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
k8s-corp-prod-0-worker-us-corp-kc-8c-2   Ready     worker              3d        v1.11.1   10.144.10.148   <none>        Container Linux by CoreOS 1800.6.0 (Rhyolite)   4.14.59-coreos-r2   docker://18.3.1
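The affected nodes can be picked out without scanning the wide output by hand. A sketch using a JSONPath template plus awk (the template is an assumption for illustration, not taken from the report):

```shell
# Print "<name><TAB><InternalIP>" per node; nodes that never registered an
# InternalIP produce an empty second field, which the awk filter selects.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' \
  | awk -F'\t' '$2 == "" {print $1}'
```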

A restart of the kubelet container on the affected nodes resolves this issue.
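The workaround amounts to one command on the affected host (assuming the RKE-managed kubelet runs as a Docker container named kubelet, as in a default RKE deployment):

```shell
# On the affected node: restart the RKE-managed kubelet container so it
# re-queries the cloud provider and registers its InternalIP.
docker restart kubelet
```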

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 16 (6 by maintainers)

Most upvoted comments

This problem is reported in Kubernetes: https://github.com/kubernetes/kubernetes/issues/68270. The kubelet fails to get the node's address from the cloud provider and therefore fails to update the node status. I will keep this issue open until the issue in k8s is resolved. I can see the following logs in @twittyc's gist:

cat kubelet-log.json | grep -v "Volume not attached" | grep "node status"
{"log":"E0813 20:38:56.118941   18989 kubelet_node_status.go:391] Error updating node status, will retry: error getting node \"k8s-corp-prod-0-worker-us-corp-kc-8b-1\": Get https://127.0.0.1:6443/api/v1/nodes/k8s-corp-prod-0-worker-us-corp-kc-8b-1?resourceVersion=0\u0026timeout=10s: unexpected EOF\n","stream":"stderr","time":"2018-08-13T20:38:56.121316062Z"}
{"log":"E0813 23:50:34.361091   18989 kubelet_node_status.go:391] Error updating node status, will retry: error getting node \"k8s-corp-prod-0-worker-us-corp-kc-8b-1\": Get https://127.0.0.1:6443/api/v1/nodes/k8s-corp-prod-0-worker-us-corp-kc-8b-1?resourceVersion=0\u0026timeout=10s: unexpected EOF\n","stream":"stderr","time":"2018-08-13T23:50:34.361316592Z"}
{"log":"W0814 01:13:16.823637   18989 kubelet_node_status.go:1114] Failed to set some node status fields: failed to get node address from cloud provider: Timeout after 10s\n","stream":"stderr","time":"2018-08-14T01:13:16.823960043Z"}
{"log":"E0814 07:01:54.730030   18989 kubelet_node_status.go:391] Error updating node status, will retry: error getting node \"k8s-corp-prod-0-worker-us-corp-kc-8b-1\": Get https://127.0.0.1:6443/api/v1/nodes/k8s-corp-prod-0-worker-us-corp-kc-8b-1?resourceVersion=0\u0026timeout=10s: unexpected EOF\n","stream":"stderr","time":"2018-08-14T07:01:54.731184923Z"}

I think it’s more an issue of Kubernetes and the OpenStack cloud provider, possibly compounded by an unstable OpenStack API endpoint. Please also refer to my issue, which contains some log excerpts: https://github.com/kubernetes/cloud-provider-openstack/issues/280
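Since the kubelet logs above show a 10-second timeout fetching the node address from the cloud provider, one way to narrow this down is to probe the OpenStack metadata service from an affected node. A sketch (169.254.169.254 is the standard metadata address; the --max-time value mirrors the kubelet timeout in the logs, and a slow or failing response here points at the endpoint rather than the kubelet):

```shell
# Query the OpenStack metadata service with the same 10s budget that the
# kubelet logs report timing out on.
curl -sS --max-time 10 http://169.254.169.254/openstack/latest/meta_data.json
```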