longhorn: [BUG] MountVolume.SetUp failed for volume ~ rpc error: code = Internal desc = exit status 1 due to multipathd on the node

Environment:

  • Rancher 2.3.6
  • Longhorn 0.8.0
  • Flatcar Linux (CoreOS)
  • Kubernetes 1.16.8

Hi,

I have been able to get Longhorn running, and I can see that the volume has been successfully created on the underlying storage.


The pod with the volume is not starting.


longhorn-manager

time="2020-04-15T17:12:18Z" level=debug msg="Replica controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-1535f3bf"
time="2020-04-15T17:12:18Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-1535f3bf state, old state , new state stopped"
time="2020-04-15T17:12:18Z" level=debug msg="Replica controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-06f4526e"
time="2020-04-15T17:12:18Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-06f4526e state, old state , new state stopped"
time="2020-04-15T17:12:19Z" level=debug msg="Replica controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985"
time="2020-04-15T17:12:25Z" level=debug msg="Volume controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256"
time="2020-04-15T17:12:25Z" level=debug msg="Prepare to create instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985"
time="2020-04-15T17:12:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985\", UID:\"2fcd4749-4cc4-46f6-9964-2e7d9a0ccaf5\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"2864440\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985"
time="2020-04-15T17:12:25Z" level=debug msg="Requeue longhorn-system/pvc-d061512e-870a-4ece-bd45-2f04672d5256 due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-06f4526e\": the object has been modified; please apply your changes to the latest version and try again"
time="2020-04-15T17:12:26Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985 starts running, IP 10.42.4.71"
time="2020-04-15T17:12:26Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985 starts running, Port 10000"
time="2020-04-15T17:12:26Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-r-b923d985 state, old state stopped, new state running"
time="2020-04-15T17:12:26Z" level=debug msg="Engine controller us2-k8swkr02 picked up pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87"
time="2020-04-15T17:12:26Z" level=debug msg="Instance process pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 had been created, need to wait for instance manager update"
time="2020-04-15T17:12:26Z" level=debug msg="Instance process pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 had been created, need to wait for instance manager update"
time="2020-04-15T17:12:27Z" level=debug msg="Instance process pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 had been created, need to wait for instance manager update"
time="2020-04-15T17:12:27Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 state, old state stopped, new state starting"
time="2020-04-15T17:12:30Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 starts running, IP 10.42.4.70"
time="2020-04-15T17:12:30Z" level=debug msg="Instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 starts running, Port 10000"
time="2020-04-15T17:12:30Z" level=debug msg="Instance handler updated instance pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87 state, old state starting, new state running"
time="2020-04-15T17:12:30Z" level=debug msg="Start monitoring pvc-d061512e-870a-4ece-bd45-2f04672d5256-e-bf525e87"
time="2020-04-15T17:12:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\", UID:\"fecc104d-8ec6-48c4-8af0-0ab25ec1ba90\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"2864445\", FieldPath:\"\"}): type: 'Normal' reason: 'Attached' volume pvc-d061512e-870a-4ece-bd45-2f04672d5256 has been attached to us2-k8swkr02"

longhorn-driver

time="2020-04-15T13:40:25Z" level=debug msg="Deploying CSI driver"
time="2020-04-15T13:40:25Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Pending"
time="2020-04-15T13:40:26Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Pending"
time="2020-04-15T13:40:27Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Running"
time="2020-04-15T13:40:28Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Running"
time="2020-04-15T13:40:29Z" level=info msg="Proc found: kubelet"
time="2020-04-15T13:40:29Z" level=info msg="Try to find arg [--root-dir] in cmdline: [kubelet --cluster-dns=10.43.0.10 --hostname-override=us2-k8swkr01 --pod-infra-container-image=rancher/pause:3.1 --make-iptables-util-chains=true --cni-conf-dir=/etc/cni/net.d --authentication-token-webhook=true --resolv-conf=/etc/resolv.conf --cloud-provider= --cgroups-per-qos=True --v=2 --read-only-port=10255 --cni-bin-dir=/opt/cni/bin --root-dir=/var/lib/kubelet --kubeconfig=/etc/kubernetes/ssl/kubecfg-kube-node.yaml --authorization-mode=Webhook --address=0.0.0.0 --volume-plugin-dir=/var/lib/kubelet/volumeplugins --network-plugin=cni --client-ca-file=/etc/kubernetes/ssl/kube-ca.pem --fail-swap-on=false --streaming-connection-idle-timeout=30m --event-qps=0 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --anonymous-auth=false --cluster-domain=cluster.local --cgroup-driver=cgroupfs --resolv-conf=/run/systemd/resolve/resolv.conf ]"
time="2020-04-15T13:40:29Z" level=info msg="Detected root dir path: /var/lib/kubelet"
time="2020-04-15T13:40:29Z" level=info msg="Upgrading Longhorn related components for CSI v1.1.0"
time="2020-04-15T13:40:29Z" level=debug msg="Creating CSI Driver driver.longhorn.io"
time="2020-04-15T13:40:29Z" level=debug msg="Created CSI Driver driver.longhorn.io"
time="2020-04-15T13:40:29Z" level=debug msg="Creating service csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Created service csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Creating deployment csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Created deployment csi-attacher"
time="2020-04-15T13:40:29Z" level=debug msg="Creating service csi-provisioner"
time="2020-04-15T13:40:29Z" level=debug msg="Created service csi-provisioner"
time="2020-04-15T13:40:29Z" level=debug msg="Creating deployment csi-provisioner"
time="2020-04-15T13:40:29Z" level=debug msg="Created deployment csi-provisioner"
time="2020-04-15T13:40:30Z" level=debug msg="Creating service csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Created service csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Creating deployment csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Created deployment csi-resizer"
time="2020-04-15T13:40:30Z" level=debug msg="Creating daemon set longhorn-csi-plugin"
time="2020-04-15T13:40:30Z" level=debug msg="Created daemon set longhorn-csi-plugin"
time="2020-04-15T13:40:30Z" level=debug msg="Creating service compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="Created service compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="Creating deployment compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="Created deployment compatible-csi-attacher"
time="2020-04-15T13:40:30Z" level=debug msg="CSI deployment done"

compatibility-node-driver-registrar

I0415 13:40:32.718021 1 main.go:110] Version: v1.2.0-0-g6ef000ae
I0415 13:40:32.718114 1 main.go:120] Attempting to open a gRPC connection with: "/var/lib/kubelet/plugins/io.rancher.longhorn/csi.sock"
I0415 13:40:32.718146 1 connection.go:151] Connecting to unix:///var/lib/kubelet/plugins/io.rancher.longhorn/csi.sock
I0415 13:40:33.720229 1 main.go:127] Calling CSI driver to discover driver name
I0415 13:40:33.720281 1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0415 13:40:33.720297 1 connection.go:181] GRPC request: {}
I0415 13:40:33.725650 1 connection.go:183] GRPC response: {"name":"io.rancher.longhorn","vendor_version":"v0.8.0"}
I0415 13:40:33.726588 1 connection.go:184] GRPC error: <nil>
I0415 13:40:33.726608 1 main.go:137] CSI driver name: "io.rancher.longhorn"
I0415 13:40:33.726763 1 node_register.go:58] Starting Registration Server at: /registration/io.rancher.longhorn-reg.sock
I0415 13:40:33.727143 1 node_register.go:67] Registration Server started at: /registration/io.rancher.longhorn-reg.sock
I0415 13:41:22.090870 1 main.go:77] Received GetInfo call: &InfoRequest{}
I0415 13:41:22.706991 1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

longhorn-csi-plugin

E0416 08:47:25.148934 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256" failed: type:("ext4") target:("/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount") options:(["defaults"])error:(exit status 1)
time="2020-04-16T08:47:25Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"
time="2020-04-16T08:49:27Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:49:27Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:49:27Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:49:27Z" level=info msg="GRPC call: /csi.v1.Node/NodePublishVolume"
time="2020-04-16T08:49:27Z" level=info msg="GRPC request: {\"target_path\":\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"baseImage\":\"\",\"fromBackup\":\"\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1586958032802-8081-driver.longhorn.io\"},\"volume_id\":\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\"}"
time="2020-04-16T08:49:27Z" level=info msg="NodeServer NodePublishVolume req: volume_id:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\" target_path:\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"baseImage\" value:\"\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1586958032802-8081-driver.longhorn.io\" > "
E0416 08:49:27.567704 1 mount_linux.go:143] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount
Output: mount: /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount: /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 already mounted or mount point busy.
E0416 08:49:27.576477 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256" failed: type:("ext4") target:("/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount") options:(["defaults"])error:(exit status 1)
time="2020-04-16T08:49:27Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"
time="2020-04-16T08:51:29Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:51:29Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC call: /csi.v1.Node/NodeGetCapabilities"
time="2020-04-16T08:51:29Z" level=info msg="GRPC request: {}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC response: {\"capabilities\":[{\"Type\":{\"Rpc\":{\"type\":3}}}]}"
time="2020-04-16T08:51:29Z" level=info msg="GRPC call: /csi.v1.Node/NodePublishVolume"
time="2020-04-16T08:51:29Z" level=info msg="GRPC request: {\"target_path\":\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"baseImage\":\"\",\"fromBackup\":\"\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1586958032802-8081-driver.longhorn.io\"},\"volume_id\":\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\"}"
time="2020-04-16T08:51:29Z" level=info msg="NodeServer NodePublishVolume req: volume_id:\"pvc-d061512e-870a-4ece-bd45-2f04672d5256\" target_path:\"/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"baseImage\" value:\"\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1586958032802-8081-driver.longhorn.io\" > "
E0416 08:51:30.015597 1 mount_linux.go:143] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount
Output: mount: /var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount: /dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256 already mounted or mount point busy.
E0416 08:51:30.024131 1 mount_linux.go:487] format of disk "/dev/longhorn/pvc-d061512e-870a-4ece-bd45-2f04672d5256" failed: type:("ext4") target:("/var/lib/kubelet/pods/cf0a0b5b-106e-4793-a74a-28bfae21be1a/volumes/kubernetes.io~csi/pvc-d061512e-870a-4ece-bd45-2f04672d5256/mount") options:(["defaults"])error:(exit status 1)
time="2020-04-16T08:51:30Z" level=error msg="GRPC error: rpc error: code = Internal desc = exit status 1"

Expected behavior: The pod should mount the volume and start up.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 38 (14 by maintainers)

Most upvoted comments

To prevent the multipath daemon from adding additional block devices on top of the ones created by Longhorn:

First, check the devices created by Longhorn using lsblk:

root@localhost:~# lsblk 
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0 79.5G  0 disk /
sdb    8:16   0  512M  0 disk [SWAP]
sdc    8:32   0    1G  0 disk /var/lib/kubelet/pods/c2c2b848-1f40-4727-8a52-03a74f9c76b9/volumes/kubernetes.io~csi/pvc-859bc3c9-faa8-4f54-85e4-b12935b5ae3c/mount
sdd    8:48   0    1G  0 disk /var/lib/kubelet/pods/063a181a-66ac-4644-8268-9215305f9b73/volumes/kubernetes.io~csi/pvc-837eb6ac-45fe-4de7-9c97-8d371eb02190/mount
sde    8:64   0    1G  0 disk /var/lib/kubelet/pods/4c80842d-7257-4b91-b668-bb5b111da003/volumes/kubernetes.io~csi/pvc-c01cee3e-f292-4979-b183-6546d6397fbd/mount
sdf    8:80   0    1G  0 disk /var/lib/kubelet/pods/052dadd9-042a-451c-9bb1-2d9418f0381f/volumes/kubernetes.io~csi/pvc-ba7a5c9a-d84d-4cb0-959d-3db39f34d81b/mount
sdg    8:96   0    1G  0 disk /var/lib/kubelet/pods/7399b073-c262-4963-8c7f-9e481272ea36/volumes/kubernetes.io~csi/pvc-2b122b42-141a-4181-b8fd-ce3cf91f6a64/mount
sdh    8:112  0    1G  0 disk /var/lib/kubelet/pods/a63d919d-201b-4eb1-9d84-6440926211a9/volumes/kubernetes.io~csi/pvc-b7731785-8364-42a8-9e7d-7516801ab7e0/mount
sdi    8:128  0    1G  0 disk /var/lib/kubelet/pods/3e056ee4-bab4-4230-9054-ab214bdf711f/volumes/kubernetes.io~csi/pvc-89d37a02-8480-4317-b0f1-f17b2a886d1d/mount
root@localhost:~# 

Notice that Longhorn device names start with /dev/sd[x]

  • Create the default configuration file /etc/multipath.conf if it does not exist
  • Add the following devnode rule to the blacklist section:
blacklist {
    devnode "^sd[a-z0-9]+"
}
  • Restart the multipath service: # systemctl restart multipathd.service
  • Verify that the configuration is applied: # multipath -t

The default multipath blacklist configuration already excludes the following device name patterns: ^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9] and ^(td|hd|vd)[a-z]

Ref: http://manpages.ubuntu.com/manpages/bionic/man5/multipath.conf.5.html (blacklist section)
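
For convenience, the steps above can be condensed into a single sequence run as root on each node. This is only a sketch: it assumes /etc/multipath.conf does not exist yet (otherwise merge the blacklist rule into the existing file), that multipathd is managed by systemd, and that no other storage on the node relies on multipath handling /dev/sd* devices.

# Blacklist all /dev/sd* devices so multipathd ignores the Longhorn-created block devices
cat <<'EOF' > /etc/multipath.conf
blacklist {
    devnode "^sd[a-z0-9]+"
}
EOF

# Apply the change and confirm the blacklist rule shows up in the effective configuration
systemctl restart multipathd.service
multipath -t | grep -A 3 blacklist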

Just posting so that it may help someone who stumbles across this as well.

I also experienced a similar issue with Longhorn and iscsid. After rebuilding my k8s cluster from scratch and installing Longhorn, I was attempting to restore a PVC from a Minio/S3 backup. My backup would restore, but mounting the PV into the pod resulted in (kubectl describe pod my-pod-12345-123456):

mount: /var/lib/kubelet/pods/22696c7f-277b-4fef-b587-f8099810b8fe/volumes/kubernetes.io~csi/pvc-6b821895-0b91-465b-8cac-4297c8d80019/mount: /dev/longhorn/pvc-6b821895-0b91-465b-8cac-4297c8d80019 already mounted or mount point busy

So after reading this thread I realized that my LVM volume was mounted at /var/lib/longhorn using the /dev/disk/by-uuid/ method, like so (/etc/fstab):

/dev/disk/by-uuid/c3aec7da-1442-40cf-a3fd-884adr3d972a /var/lib/longhorn ext4 defaults 0 0

Since this was an LVM volume, I decided to mount the disk by its logical volume name instead, converting the previous /etc/fstab line to one like this:

/dev/longhorn/disk /var/lib/longhorn ext4 defaults 0 0

As far as I can tell, this has resolved my issue of not being able to mount restored PV/PVCs into pods.
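
If it helps anyone, before rewriting /etc/fstab you can confirm how the Longhorn data directory is actually mounted; a quick check (findmnt and blkid ship with util-linux, and the device path below is just the one from my setup):

# Show which device backs /var/lib/longhorn and the options it was mounted with
findmnt /var/lib/longhorn

# Show the UUID and filesystem type of that device
blkid /dev/longhorn/disk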

Just encountered this issue (after encountering UnmountVolume.NewUnmounter failed for volume and vol_data.json: no such file or directory and rebooting the node) with Longhorn 1.5.1. Adding a blacklist section in the multipath config seems to have worked, but what the hell is this and why do I need to do any of that? Any way to protect against that behavior?

I have no clue what multipath is, what problem it's trying to solve, or why it causes this issue, but @meldafrawi's solution fixed the problem for me nearly instantly. Thank you!

@ryantiger658 @mogwai There were two causes we saw for this issue. The first one happens with the AWS mapping mechanism and has been fixed in v1.0.0 (#1223).

The other one is a local process automatically picking up the block device created by Longhorn. Normally what we saw is the multipath process. To find the culprit:

  1. Find the major:minor number of the Longhorn device. On the node, try ls -l /dev/longhorn/. The major:minor number will be shown as e.g. 8, 32 before the device name.
  2. Find the device generated by Linux for the same major:minor number. Use ls -l /dev and look for the device with the same major:minor number, e.g. /dev/sde.
  3. Find the process. Use lsof to get the list of open file handles, then grep for the device name (e.g. sde or /dev/longhorn/xxx). You should find one there.

Then you can figure out what to do next.

For the multipath process, you can use e.g. multipath -v2 -d to see if the Longhorn device was picked up automatically.

If it’s not the multipath process, lsof should have the related info.
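
Roughly, those steps chained together look like this (a sketch only: the major/minor numbers 8, 32 and the device name sde are just the examples from above, and the grep pattern may need adjusting for your ls output alignment):

# 1. Note the major, minor numbers of the Longhorn device, e.g. 8, 32
ls -l /dev/longhorn/

# 2. Find the kernel device with the same major:minor, e.g. sde
ls -l /dev | grep ' 8, *32 '

# 3. Find the process holding that device open
lsof | grep sde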

I think Longhorn should alert in the UI if it detects this issue

How do I destroy it 😃?

I tried killing the process but it just restarts.

I also tried removing the multipath map for that device using the following:

multipath -l
mpatha (360000000000000000e00000000010001) dm-0 IET,VIRTUAL-DISK
size=2.0G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=active
  `- 4:0:0:1 sdb 8:16 active undef running
multipath -f mpatha

This worked and the container is now running 💪, but multipath picks the device up again automatically when a new PVC is created.

@crutonjohn Thanks for the information, though I am a bit confused about why it happened in your case. Btw, I don't think /dev/longhorn/disk is a Longhorn volume, right? It should be the LVM device provided to Longhorn, IIUC.

correct, /dev/longhorn/disk is an LVM. i’m not really sure either, idk if it’s a different bug or what but the symptoms were sorta the same. i came up with the solution with the amalgamation of info here and just thought i’d share.

okay, i had to rebuild my cluster again tonight for reasons and it looks like this is happening again.

i’m gonna have to try the multipath fix tomorrow and see if that fixes it. sorry for the thread spam.

@meldafrawi's suggestion worked: https://github.com/longhorn/longhorn/issues/1210#issuecomment-671689746


@meldafrawi Your solution works. Is there any plan for Longhorn to perform this automatically in the future?

Yeah, your multipath process hijacked the device. 😃

  1. Get the device major and minor:
ls -l /dev/longhorn
brw-rw---- 1 root root 8, 32 Aug 10 21:50 pvc-39c0db31-628d-41d0-b05a-4568ec02e487
  2. I use the 8, 32 to find the device:
brw-rw---- 1 root disk      8,  32 Aug 10 21:50 sdc
  3. Use lsof to find the PID:
lsof | grep sdc
multipath 514292                              root    8r      BLK               8,32        0t0        534 /dev/sdc
multipath 514292 514297 multipath             root    8r      BLK               8,32        0t0        534 /dev/sdc
multipath 514292 514299 multipath             root    8r      BLK               8,32        0t0        534 /dev/sdc
multipath 514292 514300 multipath             root    8r      BLK               8,32        0t0        534 /dev/sdc
multipath 514292 514301 multipath             root    8r      BLK               8,32        0t0        534 /dev/sdc
multipath 514292 514302 multipath             root    8r      BLK               8,32        0t0        534 /dev/sdc
multipath 514292 514304 multipath             root    8r      BLK               8,32        0t0        534 /dev/sdc
  4. My longhorn-manager logs:
ime="2020-08-10T21:58:07Z" level=debug msg="Created volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3: {Size:2147483648 Frontend:blockdev FromBackup: NumberOfReplicas:3 StaleReplicaTimeout:30 NodeID: EngineImage:longhornio/longhorn-engine:v1.0.1 RecurringJobs:[] BaseImage: Standby:false DiskSelector:[] NodeSelector:[] DisableFrontend:false}"
10.1.72.37 - - [10/Aug/2020:21:58:07 +0000] "POST /v1/volumes HTTP/1.1" 200 972 "" "Go-http-client/1.1"
time="2020-08-10T21:58:07Z" level=debug msg="Volume controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3"
time="2020-08-10T21:58:07Z" level=debug msg="Engine controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 state, old state , new state stopped"
time="2020-08-10T21:58:07Z" level=debug msg="Replica controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c state, old state , new state stopped"
time="2020-08-10T21:58:07Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-6e95ced3"
time="2020-08-10T21:58:07Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-741589fd"
time="2020-08-10T21:58:07Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-dd83e4cc"
time="2020-08-10T21:58:07Z" level=debug msg="Replica controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 state, old state , new state stopped"
time="2020-08-10T21:58:07Z" level=debug msg="Replica controller kmaster picked up pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a"
time="2020-08-10T21:58:07Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a state, old state , new state stopped"
time="2020-08-10T21:58:08Z" level=debug msg="Requeue longhorn-system/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a\": the object has been modified; please apply your changes to the latest version and try again"
time="2020-08-10T21:58:08Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-4be5c245"
time="2020-08-10T21:58:08Z" level=debug msg="Schedule replica pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a to node kmaster, disk default-disk-f25f5e0ca885e875, datapath /var/lib/longhorn/replicas/pvc-98cfa312-0d5b-47e5-a196-435398f96da3-4882c3f4"
time="2020-08-10T21:58:08Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3\", UID:\"d8b269b0-6449-49e8-9dab-1965f672dd94\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199323\", FieldPath:\"\"}): type: 'Normal' reason: 'Detached' volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 has been detached"
10.1.72.37 - - [10/Aug/2020:21:58:09 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 2280 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:11 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3738 "" "Go-http-client/1.1"
time="2020-08-10T21:58:11Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3\", UID:\"d8b269b0-6449-49e8-9dab-1965f672dd94\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199361\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Persistent Volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 started to use/reuse Longhorn volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3"
10.1.72.37 - - [10/Aug/2020:21:58:15 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3918 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:15 +0000] "GET /v1/nodes/kmaster HTTP/1.1" 200 1729 "" "Go-http-client/1.1"
time="2020-08-10T21:58:15Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c"
time="2020-08-10T21:58:15Z" level=debug msg="Attaching volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 to kmaster with disableFrontend set false"
10.1.72.37 - - [10/Aug/2020:21:58:15 +0000] "POST /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3?action=attach HTTP/1.1" 200 2500 "" "Go-http-client/1.1"
time="2020-08-10T21:58:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c\", UID:\"7fba87ea-1a3b-47e0-b713-bf8a065c303d\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199405\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c"
time="2020-08-10T21:58:15Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58"
time="2020-08-10T21:58:15Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58\", UID:\"a146e9c6-4ff0-436c-b55b-f93645c0e1b2\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199407\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58"
time="2020-08-10T21:58:16Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a"
time="2020-08-10T21:58:16Z" level=info msg="Event(v1.ObjectReference{Kind:\"Replica\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a\", UID:\"adc69756-d3d7-43d2-be7f-d438b65de987\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199412\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a"
time="2020-08-10T21:58:16Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 had been created, need to wait for instance manager update"
time="2020-08-10T21:58:16Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 state, old state stopped, new state starting"
time="2020-08-10T21:58:16Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c had been created, need to wait for instance manager update"
time="2020-08-10T21:58:16Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c state, old state stopped, new state starting"
time="2020-08-10T21:58:16Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a had been created, need to wait for instance manager update"
time="2020-08-10T21:58:16Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a state, old state stopped, new state starting"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c starts running, IP 10.1.72.54"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a starts running, IP 10.1.72.54"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a starts running, Port 10030"
time="2020-08-10T21:58:17Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-095eba9a state, old state starting, new state running"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c starts running, Port 10000"
time="2020-08-10T21:58:17Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-4f14110c state, old state starting, new state running"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 starts running, IP 10.1.72.54"
time="2020-08-10T21:58:17Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 starts running, Port 10015"
time="2020-08-10T21:58:17Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-r-d12e7e58 state, old state starting, new state running"
time="2020-08-10T21:58:17Z" level=debug msg="Prepare to create instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:17Z" level=info msg="Event(v1.ObjectReference{Kind:\"Engine\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983\", UID:\"9dd2009a-3d5d-47bc-91c8-fdcb3e98cf47\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199431\", FieldPath:\"\"}): type: 'Normal' reason: 'Start' Starts pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:17Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 had been created, need to wait for instance manager update"
10.1.72.37 - - [10/Aug/2020:21:58:17 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3333 "" "Go-http-client/1.1"
time="2020-08-10T21:58:18Z" level=debug msg="Instance process pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 had been created, need to wait for instance manager update"
time="2020-08-10T21:58:18Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 state, old state stopped, new state starting"
10.1.72.37 - - [10/Aug/2020:21:58:19 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 3360 "" "Go-http-client/1.1"
time="2020-08-10T21:58:21Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 starts running, IP 10.1.72.53"
time="2020-08-10T21:58:21Z" level=debug msg="Instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 starts running, Port 10000"
time="2020-08-10T21:58:21Z" level=debug msg="Instance handler updated instance pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983 state, old state starting, new state running"
time="2020-08-10T21:58:21Z" level=debug msg="Start monitoring pvc-98cfa312-0d5b-47e5-a196-435398f96da3-e-4782e983"
time="2020-08-10T21:58:21Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"pvc-98cfa312-0d5b-47e5-a196-435398f96da3\", UID:\"d8b269b0-6449-49e8-9dab-1965f672dd94\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"199415\", FieldPath:\"\"}): type: 'Normal' reason: 'Attached' volume pvc-98cfa312-0d5b-47e5-a196-435398f96da3 has been attached to kmaster"
10.1.72.37 - - [10/Aug/2020:21:58:21 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:23 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:24 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:25 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 5069 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:27 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 6163 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:27 +0000] "GET /v1/settings/mkfs-ext4-parameters HTTP/1.1" 200 590 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:32 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 6163 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:32 +0000] "GET /v1/settings/mkfs-ext4-parameters HTTP/1.1" 200 590 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:40 +0000] "GET /v1/volumes/pvc-98cfa312-0d5b-47e5-a196-435398f96da3 HTTP/1.1" 200 6163 "" "Go-http-client/1.1"
10.1.72.37 - - [10/Aug/2020:21:58:40 +0000] "GET /v1/settings/mkfs-ext4-parameters HTTP/1.1" 200 590 "" "Go-http-client/1.1"