rancher: GlusterFS PV failed after kubelet restart

Hello,

we have created a cluster with Rancher (details below) and installed a GlusterFS cluster with Heketi as described here (https://github.com/heketi/heketi/wiki/Kubernetes-Integration/832a65e365b4644a1c64ac47601893a3fdb52daf):

  • Three GlusterFS servers inside the same K8s cluster
  • Heketi as provisioner

When we upgraded Rancher from 2.1.3 to 2.1.8, all kubelet processes were restarted. After the restart, pods began to fail because they could no longer read/write to their existing Gluster volume mounts.

It looks like restarting kubelet terminates all existing mounts. That is tolerable when only a single kubelet on one GlusterFS node restarts, but when the process is restarted on all three GlusterFS nodes within a short period of time (which happens during a Rancher update or cluster upgrade), all clients with a PVC fail.

We could reproduce the issue by adding an additional kubelet parameter to our cluster.yaml, which also forces a kubelet restart on all nodes.

The kubelet logs show errors like this:

E0402 06:16:17 32067 kubelet.go:1616] Unable to mount volumes for pod "XXXXXX(c3751cf0-51c9-11e9-9d82-005056897a94)": timeout expired waiting for volumes to attach or mount for pod "XXX"/"XXXX". list of unmounted volumes=[volume]. list of unattached volumes=[volume default-token-qlvbv]; skipping pod
E0402 06:08:09.838966 32067 nestedpendingoperations.go:267] Operation for "kubernetes.io/glusterfs/c3751cf0-51c9-11e9-9d82-005056897a94-pvc-e62d65b3-ed76-11e8-98e8-00508-005056897a94: transport endpoint is not connected"

What kind of request is this (question/bug/enhancement/feature request): question/bug

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): Rancher v2.1.8, UI v2.1.21
  • Installation option (single install/HA): single

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Custom (v1.11.3-rancher1-1)
  • Network Provider: Canal
  • Machine type (cloud/VM/metal) and specifications (CPU/memory): VM (CentOS 7.5.1804 3.10.0-862.14.4.el7.x86_64)
  • Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version (use docker version):
Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:15:20 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:17:54 2018
  OS/Arch:      linux/amd64
  Experimental: false

Cluster Yaml:

addon_job_timeout: 30
authentication: 
  strategy: "x509"
bastion_host: 
  ssh_agent_auth: false
ignore_docker_version: true
# 
#   # Currently only nginx ingress provider is supported.
#   # To disable ingress controller, set `provider: none`
#   # To enable ingress on specific nodes, use the node_selector, eg:
#      provider: nginx
#      node_selector:
#        app: ingress
# 
ingress: 
  provider: "none"
kubernetes_version: "v1.11.3-rancher1-1"
monitoring: 
  provider: "metrics-server"
# 
#   # If you are using calico on AWS
# 
#      network:
#        plugin: calico
#        calico_network_provider:
#          cloud_provider: aws
# 
#   # To specify flannel interface
# 
#      network:
#        plugin: flannel
#        flannel_network_provider:
#          iface: eth1
# 
#   # To specify flannel interface for canal plugin
# 
#      network:
#        plugin: canal
#        canal_network_provider:
#          iface: eth1
# 
network: 
  options: 
    flannel_backend_type: "vxlan"
  plugin: "canal"
# 
#      services:
#        kube_api:
#          service_cluster_ip_range: 10.43.0.0/16
#        kube_controller:
#          cluster_cidr: 10.42.0.0/16
#          service_cluster_ip_range: 10.43.0.0/16
#        kubelet:
#          cluster_domain: cluster.local
#          cluster_dns_server: 10.43.0.10
# 
services: 
  etcd: 
    creation: "12h"
    extra_args: 
      election-timeout: "5000"
      heartbeat-interval: "500"
    retention: "72h"
    snapshot: false
  kube-api: 
    pod_security_policy: false
    service_node_port_range: "30000-32767"
  kubelet: 
    fail_swap_on: false
ssh_agent_auth: false

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 23 (3 by maintainers)

Most upvoted comments

When the kubelet is terminated or restarted, the FUSE processes sharing the same cgroup as the kubelet container are killed as well, wreaking havoc on the mounted volumes:

Transport endpoint is not connected.

On systemd distros, Kubernetes works around this by forking the mount process into its own cgroup (a transient systemd scope) via systemd-run: https://github.com/kubernetes/kubernetes/blob/release-1.14/pkg/util/mount/mount_linux.go#L108-L135
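
Roughly, the wrapped mount invocation looks like this (a sketch based on the linked code; the exact description text and flags may differ between Kubernetes versions, and the paths/names are placeholders):

systemd-run --description="Kubernetes transient mount for /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~glusterfs/<pvc>" \
  --scope -- mount -t glusterfs <gluster-server>:/<volume> /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~glusterfs/<pvc>

Because the mount helper then runs in its own transient scope, the FUSE process survives a kubelet restart.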

But since the systemd-run binary is not present in the containerised kubelet's PATH, this mechanism is skipped, as indicated by the following log entry in the kubelet container:

mount_linux.go:165] Detected OS without systemd
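
You can check whether your kubelet hits this fallback with something like the following (assuming the RKE-managed kubelet container is named kubelet, which is the default):

docker logs kubelet 2>&1 | grep -i "without systemd"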

Adding an extra bind mount to the kubelet configuration in the cluster YAML should fix the problem, e.g.:

services:
  kubelet:
    extra_binds:
      - "/usr/bin/systemd-run:/usr/bin/systemd-run"

https://rancher.com/docs/rancher/v2.x/en/cluster-admin/editing-clusters/#editing-cluster-as-yaml
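
Once the kubelet containers have been recreated with the extra bind, you can verify that the binary is actually visible inside the container (again assuming the default container name kubelet):

docker exec kubelet ls -l /usr/bin/systemd-run

New Gluster mounts should then be wrapped in transient scopes and no longer die with the kubelet.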

This has not been resolved. Why close it? @deniseschannon

@adampl I had to configure the extra bind. I removed the bind and upgraded to 1.15.5 but the issue resurfaced once I restarted a kubelet. I then added the bind again and no longer experienced the issue after a kubelet restart.

As far as I can see, the bind + 1.15.5 fixed the issue. I’m still waiting for the go-ahead to upgrade our other clusters and see whether the fix sticks.