rancher: GlusterFS PV failed after kubelet restart
Hello,
we have created a cluster with Rancher (details below) and installed a GlusterFS cluster with Heketi as described here: https://github.com/heketi/heketi/wiki/Kubernetes-Integration/832a65e365b4644a1c64ac47601893a3fdb52daf
- Three GlusterFS servers inside the same Kubernetes cluster
- Heketi as provisioner (a rough sketch of the StorageClass is below)
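For context, the provisioning setup looks roughly like this (the StorageClass name and resturl are illustrative placeholders, not our exact values): a StorageClass backed by Heketi, so each PVC gets a dynamically provisioned GlusterFS volume that pods then FUSE-mount via the kubelet.
```yaml
# Illustrative only - name and resturl are placeholders for our actual values.
# Heketi's REST API provisions a GlusterFS volume per PVC; the kubelet then
# fuse-mounts that volume into the consuming pods.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-heketi
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.default.svc.cluster.local:8080"
  restauthenabled: "false"
```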
When upgrading Rancher from 2.1.3 to 2.1.8, all kubelet processes were restarted. After the restart, pods began to fail because they could no longer read from or write to their existing Gluster volume mounts.
It looks like a kubelet restart terminates all existing mounts. That is tolerable when a single kubelet restarts on one GlusterFS node, but when the kubelet is restarted on all three GlusterFS nodes within a short period of time (which happens on a Rancher update or cluster upgrade), all clients with a PVC fail.
We could reproduce the issue by adding an additional kubelet parameter to our cluster.yaml, which also forces a kubelet restart on all nodes (see the sketch below).
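For example, a change like the following in cluster.yaml is enough to trigger the rolling kubelet restart (the concrete flag is only illustrative, not the parameter we actually added):
```yaml
# Illustrative only: any change under services.kubelet (here an example
# extra_args entry) makes Rancher/RKE recreate the kubelet container on
# every node, which is what reproduces the lost GlusterFS mounts.
services:
  kubelet:
    fail_swap_on: false
    extra_args:
      max-pods: "150"   # example flag; the restart it causes is what matters
```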
The kubelet log shows errors like these:
32067 kubelet.go:1616] Unable to mount volumes for pod "XXXXXX(c3751cf0-51c9-11e9-9d82-005056897a94)": timeout expired waiting for volumes to attach or mount for pod "XXX"/"XXXX". list of unmounted volumes=[volume]. list of unattached volumes=[volume default-token-qlvbv]; skipping pod
E0402 06:08:09.838966 32067 nestedpendingoperations.go:267] Operation for "kubernetes.io/glusterfs/c3751cf0-51c9-11e9-9d82-005056897a94-pvc-e62d65b3-ed76-11e8-98e8-00508-005056897a94: transport endpoint is not connected"
What kind of request is this (question/bug/enhancement/feature request): question/bug
Environment information
- Rancher version (`rancher/rancher`/`rancher/server` image tag or shown bottom left in the UI): Rancher: v2.1.8, UI: v2.1.21
- Installation option (single install/HA): single
Cluster information
- Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Provider Custom v1.11.3-rancher1-1
- Network Provider: Canal
- Machine type (cloud/VM/metal) and specifications (CPU/memory): VM (CentOS 7.5.1804 3.10.0-862.14.4.el7.x86_64)
- Kubernetes version (use `kubectl version`):
  Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
- Docker version (use `docker version`):
  Client:
    Version:      17.12.1-ce
    API version:  1.35
    Go version:   go1.9.4
    Git commit:   7390fc6
    Built:        Tue Feb 27 22:15:20 2018
    OS/Arch:      linux/amd64
  Server:
    Engine:
      Version:       17.12.1-ce
      API version:   1.35 (minimum version 1.12)
      Go version:    go1.9.4
      Git commit:    7390fc6
      Built:         Tue Feb 27 22:17:54 2018
      OS/Arch:       linux/amd64
      Experimental:  false
Cluster Yaml:
addon_job_timeout: 30
authentication:
  strategy: "x509"
bastion_host:
  ssh_agent_auth: false
ignore_docker_version: true
#
# # Currently only nginx ingress provider is supported.
# # To disable ingress controller, set `provider: none`
# # To enable ingress on specific nodes, use the node_selector, eg:
#   provider: nginx
#   node_selector:
#     app: ingress
#
ingress:
  provider: "none"
kubernetes_version: "v1.11.3-rancher1-1"
monitoring:
  provider: "metrics-server"
#
# # If you are using calico on AWS
#
# network:
#   plugin: calico
#   calico_network_provider:
#     cloud_provider: aws
#
# # To specify flannel interface
#
# network:
#   plugin: flannel
#   flannel_network_provider:
#     iface: eth1
#
# # To specify flannel interface for canal plugin
#
# network:
#   plugin: canal
#   canal_network_provider:
#     iface: eth1
#
network:
  options:
    flannel_backend_type: "vxlan"
  plugin: "canal"
#
# services:
#   kube_api:
#     service_cluster_ip_range: 10.43.0.0/16
#   kube_controller:
#     cluster_cidr: 10.42.0.0/16
#     service_cluster_ip_range: 10.43.0.0/16
#   kubelet:
#     cluster_domain: cluster.local
#     cluster_dns_server: 10.43.0.10
#
services:
  etcd:
    creation: "12h"
    extra_args:
      election-timeout: "5000"
      heartbeat-interval: "500"
    retention: "72h"
    snapshot: false
  kube-api:
    pod_security_policy: false
    service_node_port_range: "30000-32767"
  kubelet:
    fail_swap_on: false
ssh_agent_auth: false
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 23 (3 by maintainers)
When the Kubelet is terminated/restarted, the FUSE processes sharing the same cgroup as the container are killed, causing havoc for the mounted volumes.
On systemd distros, Kubernetes works around this by forking the mount process into its own cgroup using `systemd-run`: https://github.com/kubernetes/kubernetes/blob/release-1.14/pkg/util/mount/mount_linux.go#L108-L135
But since that binary is not present in the containerised Kubelet's PATH, this mechanism gets skipped, which a corresponding log entry in the Kubelet container indicates.
Adding an extra bind mount to the Kubelet configuration in the cluster YAML should fix the problem, e.g. along the lines of the sketch below. See https://rancher.com/docs/rancher/v2.x/en/cluster-admin/editing-clusters/#editing-cluster-as-yaml for editing the cluster as YAML.
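A minimal sketch of such a bind mount, assuming what the Kubelet needs is the host's systemd-run binary plus the systemd runtime directory (the exact paths below are assumptions and may differ per distro/setup):
```yaml
# Sketch only - paths are assumptions, adjust for your distro. The idea is
# to expose the host's systemd-run and the systemd runtime directory inside
# the kubelet container, so FUSE mount helpers are started in their own
# transient systemd scope instead of inside the kubelet's cgroup.
services:
  kubelet:
    extra_binds:
      - "/usr/bin/systemd-run:/usr/bin/systemd-run"
      - "/run/systemd:/run/systemd"
```
Note that changing extra_binds recreates the kubelet containers one more time; after that, GlusterFS mounts created by the kubelet should survive further kubelet restarts.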
This has not been resolved. Why close it? @deniseschannon
@adampl I had to configure the extra bind. I removed the bind and upgraded to 1.15.5 but the issue resurfaced once I restarted a kubelet. I then added the bind again and no longer experienced the issue after a kubelet restart.
As far as I can see, the bind plus 1.15.5 fixed the issue. I’m still waiting for the go-ahead to upgrade our other clusters and see whether this sticks.