longhorn: [DOC] longhorn-csi-plugin stuck in CrashLoopBackOff after system crash (SELinux related)

Describe the bug (🐛 if you encounter this issue)

I just had one of my systems crash. After it came back up, the longhorn-csi-plugin pod on that node is stuck in a CrashLoopBackOff. The logs do not hint at any particular reason.

To Reproduce

n/a

Expected behavior

longhorn-csi-plugin starts properly.

Log or Support bundle

longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="CSI Driver: driver.longhorn.io version: v1.4.0, manager URL http://longhorn-backend:9500/v1"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling node service capability: GET_VOLUME_STATS"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling node service capability: STAGE_UNSTAGE_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling node service capability: EXPAND_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: CREATE_DELETE_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: EXPAND_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: CREATE_DELETE_SNAPSHOT"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: CLONE_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling volume access mode: SINGLE_NODE_WRITER"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling volume access mode: MULTI_NODE_MULTI_WRITER"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Listening for connections on address: &net.UnixAddr{Name:\"//csi/csi.sock\", Net:\"unix\"}"
Stream closed EOF for longhorn-system/longhorn-csi-plugin-46gwb (longhorn-csi-plugin)
longhorn-liveness-probe W0203 14:59:43.264724    1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 14:59:53.264996    1893 connection.go:173] Still connecting to unix:///csi/csi.sock
[… identical "Still connecting to unix:///csi/csi.sock" warnings repeat every 10 seconds through 15:03:03 …]
node-driver-registrar I0203 15:09:39.136282   12242 main.go:166] Version: v2.5.0
node-driver-registrar I0203 15:09:39.136529   12242 main.go:167] Running node-driver-registrar in mode=registration
node-driver-registrar I0203 15:09:39.139643   12242 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
[… "Still connecting" warnings continue every 10 seconds through 15:04:13 …]
node-driver-registrar I0203 15:09:40.147600   12242 main.go:198] Calling CSI driver to discover driver name
node-driver-registrar I0203 15:09:40.164554   12242 main.go:208] CSI driver name: "driver.longhorn.io"
[… "Still connecting" warnings continue every 10 seconds through 15:07:33 …]
node-driver-registrar I0203 15:09:40.167502   12242 node_register.go:53] Starting Registration Server at: /registration/driver.longhorn.io-reg.sock
node-driver-registrar I0203 15:09:40.169473   12242 node_register.go:62] Registration Server started at: /registration/driver.longhorn.io-reg.sock
node-driver-registrar I0203 15:09:40.170916   12242 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
node-driver-registrar I0203 15:09:40.835828   12242 main.go:102] Received GetInfo call: &InfoRequest{}
node-driver-registrar I0203 15:09:40.837650   12242 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/driver.longhorn.io/registration"
node-driver-registrar I0203 15:09:41.861423   12242 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
node-driver-registrar E0203 15:09:55.061520   12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
node-driver-registrar E0203 15:10:10.028389   12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
node-driver-registrar E0203 15:10:55.045211   12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
node-driver-registrar E0203 15:11:10.060312   12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
[… "Still connecting" warnings continue every 10 seconds through 15:12:43 …]

Environment

  • Longhorn version: 1.4.0
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s 1.25.6+k3s1
    • Number of management nodes in the cluster: 3
    • Number of worker nodes in the cluster: 12
  • Node config
    • OS type and version: MicroOS
    • CPU per node: varies
    • Memory per node: varies
    • Disk type (e.g. SSD/NVMe): SSD & NVMe
    • Network bandwidth between the nodes: 1 GbE
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 38

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 18 (8 by maintainers)

Most upvoted comments

Heads up @docbobo, I opened https://github.com/k3s-io/k3s-selinux/issues/53 in response to this issue. I think there’s not much we can do from the Longhorn side if the SELinux integration for your Kubernetes distribution is broken. After a decent amount of investigating, I think that is the case.

If it is possible to roll back to selinux-policy-targeted 20231012-1.1, I suggest you try that. Otherwise, you can disable SELinux or use an audit2allow approach.
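For reference, a sketch of the audit2allow route (the module name longhorn_local is arbitrary, my choice for this example; it requires root plus the audit and policycoreutils tooling):

```shell
# Generate a local policy module from the recent AVC denials produced by the
# liveness probe, then load it. Skipped if the tooling is not installed.
MODULE=longhorn_local
if command -v ausearch >/dev/null 2>&1 && command -v audit2allow >/dev/null 2>&1; then
  # -M writes $MODULE.te (readable source) and $MODULE.pp (compiled module)
  ausearch -m AVC -ts recent -c livenessprobe | audit2allow -M "$MODULE"
  semodule -i "$MODULE.pp"   # install/load the module (needs root)
fi
```

Review the generated .te file before loading it; audit2allow happily allows anything it was fed.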

Okay, I got this “fixed” with the following policy:

module local 1.0;

require {
	type container_t;
	type unconfined_service_t;
	class unix_stream_socket connectto;
}

#============= container_t ==============
allow container_t unconfined_service_t:unix_stream_socket connectto;
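In case it helps anyone, this is roughly how such a module is compiled and loaded with the standard SELinux toolchain, assuming the source above is saved as local.te (requires root and the checkmodule/semodule_package/semodule tools):

```shell
# Save the policy source shown above as local.te
cat > local.te <<'EOF'
module local 1.0;

require {
	type container_t;
	type unconfined_service_t;
	class unix_stream_socket connectto;
}

#============= container_t ==============
allow container_t unconfined_service_t:unix_stream_socket connectto;
EOF

# Compile, package, and load it (skipped if the SELinux toolchain is absent)
if command -v checkmodule >/dev/null 2>&1; then
  checkmodule -M -m -o local.mod local.te    # compile the module source
  semodule_package -o local.pp -m local.mod  # package it into a .pp
  semodule -i local.pp                       # install/load it (needs root)
fi
```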

Why do I only have to do this on 1 of 12 systems?

I am using MicroOS which is Tumbleweed-based. In the change described above, I was coming from 20231017 going to 20231101.

The list above only captures the packages that were differing in versions between the “good” and “bad” instances. Here’s the complete list of packages with selinux in the name, taken on the “bad” instance.

i  | container-selinux                          | package | 2.222.0-1.1                   | noarch  | openSUSE-Tumbleweed-Oss
i  | libselinux1                                | package | 3.5-5.1                       | aarch64 | openSUSE-Tumbleweed-Oss
i  | microos_selinux                            | pattern | 5.0-80.1                      | aarch64 | openSUSE-Tumbleweed-Oss
i  | selinux-policy                             | package | 20231030-1.1                  | noarch  | openSUSE-Tumbleweed-Oss
i  | selinux-policy-targeted                    | package | 20231030-1.1                  | noarch  | openSUSE-Tumbleweed-Oss
i  | selinux-tools                              | package | 3.5-5.1                       | aarch64 | openSUSE-Tumbleweed-Oss
i+ | k3s-selinux                                | package | 1.4.stable.1-1.1              | noarch  | openSUSE-Tumbleweed-Oss
i+ | patterns-microos-selinux                   | package | 5.0-80.1                      | aarch64 | openSUSE-Tumbleweed-Oss

I did a diff between a working, not-yet-updated system and an updated one that is returning those errors. The following OpenSUSE packages related to SELinux differ between the two:

microos_selinux 5.0-79.1 -> 5.0-80.1
policycoreutils 3.5-5.1 -> 3.5-6.1
selinux-policy 20231012-1.1 -> 20231030-1.1
selinux-policy-targeted 20231012-1.1 -> 20231030-1.1

Obviously, there are plenty of other differences. I have a complete list of all the packages on my system if needed.

I can confirm I have run into this exact issue. For some unknown reason, it also impacted only a portion of my nodes. However, I would say this is less of a bug and more of a "docs needed" situation.

Also, I have SELinux enabled, and on the problematic node I am seeing the following in audit.log:

type=AVC msg=audit(1675438507.878:817): avc:  denied  { connectto } for  pid=15054 comm="livenessprobe" path="/csi/csi.sock" scontext=system_u:system_r:container_t:s0:c392,c936 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=unix_stream_socket permissive=0
type=AVC msg=audit(1675438508.698:818): avc:  denied  { connectto } for  pid=15054 comm="livenessprobe" path="/csi/csi.sock" scontext=system_u:system_r:container_t:s0:c392,c936 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=unix_stream_socket permissive=0
[… further identical { connectto } denials for comm="livenessprobe" omitted …]
type=AVC msg=audit(1675438516.068:825): avc:  denied  { connectto } for  pid=15054 comm="livenessprobe" path="/csi/csi.sock" scontext=system_u:system_r:container_t:s0:c392,c936 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=unix_stream_socket permissive=0
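The interesting fields in those records are scontext (the liveness probe confined as container_t) and tcontext (the socket, created by a process running as unconfined_service_t). A quick way to pull those fields out of a record; the sample line below is copied from the excerpt above:

```shell
# One AVC record copied from the audit.log excerpt above
line='type=AVC msg=audit(1675438507.878:817): avc:  denied  { connectto } for  pid=15054 comm="livenessprobe" path="/csi/csi.sock" scontext=system_u:system_r:container_t:s0:c392,c936 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=unix_stream_socket permissive=0'

# Print the source context, target context, and object class of the denial
for f in scontext tcontext tclass; do
  printf '%s=%s\n' "$f" "$(printf '%s\n' "$line" | grep -o "$f=[^ ]*" | cut -d= -f2-)"
done
```

This prints the container_t source context, the unconfined_service_t target context, and tclass=unix_stream_socket, which is exactly the triple the local policy module above has to allow.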
type=AVC msg=audit(1675438517.108:826): avc:  denied  { connectto } for  pid=15054 comm="livenessprobe" path="/csi/csi.sock" scontext=system_u:system_r:container_t:s0:c392,c936 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=unix_stream_socket permissive=0