longhorn: [DOC] longhorn-csi-plugin stuck in CrashLoopBackOff after system crash (SELinux related)
Describe the bug (🐛 if you encounter this issue)
One of my systems just crashed. After it came back up, the longhorn-csi-plugin pod ends up in CrashLoopBackOff. The logs do not seem to hint at any particular reason.
To Reproduce
n/a
Expected behavior
longhorn-csi-plugin starts properly.
Log or Support bundle
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="CSI Driver: driver.longhorn.io version: v1.4.0, manager URL http://longhorn-backend:9500/v1"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling node service capability: GET_VOLUME_STATS"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling node service capability: STAGE_UNSTAGE_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling node service capability: EXPAND_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: CREATE_DELETE_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: EXPAND_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: CREATE_DELETE_SNAPSHOT"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling controller service capability: CLONE_VOLUME"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling volume access mode: SINGLE_NODE_WRITER"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Enabling volume access mode: MULTI_NODE_MULTI_WRITER"
longhorn-csi-plugin time="2023-02-03T15:10:56Z" level=info msg="Listening for connections on address: &net.UnixAddr{Name:\"//csi/csi.sock\", Net:\"unix\"}"
Stream closed EOF for longhorn-system/longhorn-csi-plugin-46gwb (longhorn-csi-plugin)
longhorn-liveness-probe W0203 14:59:43.264724 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 14:59:53.264996 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:00:03.264512 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:00:13.264347 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:00:23.265398 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:00:33.265074 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:00:43.265280 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:00:53.265217 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:01:03.265147 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:01:13.265215 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:01:23.264284 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:01:33.264461 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:01:43.264858 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:01:53.264423 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:02:03.264472 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:02:13.264365 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:02:23.264418 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:02:33.264435 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:02:43.264424 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:02:53.264902 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:03:03.265168 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
node-driver-registrar I0203 15:09:39.136282 12242 main.go:166] Version: v2.5.0
node-driver-registrar I0203 15:09:39.136529 12242 main.go:167] Running node-driver-registrar in mode=registration
node-driver-registrar I0203 15:09:39.139643 12242 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
longhorn-liveness-probe W0203 15:03:13.264639 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:03:23.264478 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:03:33.264803 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:03:43.265385 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:03:53.264407 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:04:03.264737 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:04:13.264446 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
node-driver-registrar I0203 15:09:40.147600 12242 main.go:198] Calling CSI driver to discover driver name
node-driver-registrar I0203 15:09:40.164554 12242 main.go:208] CSI driver name: "driver.longhorn.io"
longhorn-liveness-probe W0203 15:04:23.264419 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:04:33.264846 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:04:43.264923 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:04:53.264457 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:05:03.265267 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:05:13.264310 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:05:23.264378 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:05:33.264401 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:05:43.264423 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:05:53.264611 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:06:03.265027 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:06:13.264351 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:06:23.265314 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:06:33.264301 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:06:43.265092 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:06:53.264417 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:07:03.264349 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:07:13.265051 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:07:23.264404 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:07:33.264366 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
node-driver-registrar I0203 15:09:40.167502 12242 node_register.go:53] Starting Registration Server at: /registration/driver.longhorn.io-reg.sock
node-driver-registrar I0203 15:09:40.169473 12242 node_register.go:62] Registration Server started at: /registration/driver.longhorn.io-reg.sock
node-driver-registrar I0203 15:09:40.170916 12242 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
node-driver-registrar I0203 15:09:40.835828 12242 main.go:102] Received GetInfo call: &InfoRequest{}
node-driver-registrar I0203 15:09:40.837650 12242 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/driver.longhorn.io/registration"
node-driver-registrar I0203 15:09:41.861423 12242 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
node-driver-registrar E0203 15:09:55.061520 12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
node-driver-registrar E0203 15:10:10.028389 12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
node-driver-registrar E0203 15:10:55.045211 12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
node-driver-registrar E0203 15:11:10.060312 12242 connection.go:132] Lost connection to unix:///csi/csi.sock.
longhorn-liveness-probe W0203 15:07:43.265142 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:07:53.264572 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:08:03.264568 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:08:13.264631 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:08:23.265188 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:08:33.264440 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:08:43.264419 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:08:53.265208 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:09:03.265084 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:09:13.264582 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:09:23.265354 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:09:33.265361 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:09:43.264915 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:09:53.264620 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:10:03.265242 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:10:13.264917 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:10:23.264697 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:10:33.264743 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:10:43.264954 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:10:53.264311 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:11:03.264342 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:11:13.264602 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:11:23.264777 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:11:33.264413 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:11:43.265103 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:11:53.264415 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:12:03.264900 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:12:13.264534 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:12:23.264423 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:12:33.295701 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
longhorn-liveness-probe W0203 15:12:43.264362 1893 connection.go:173] Still connecting to unix:///csi/csi.sock
Environment
- Longhorn version: 1.4.0
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s 1.25.6+k3s1
- Number of management node in the cluster: 3
- Number of worker node in the cluster: 12
- Node config
- OS type and version: MicroOS
- CPU per node: Depends
- Memory per node: Depends
- Disk type(e.g. SSD/NVMe): SSD & NVMe
- Network bandwidth between the nodes: 1 GbE
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
- Number of Longhorn volumes in the cluster: 38
About this issue
- State: closed
- Created a year ago
- Comments: 18 (8 by maintainers)
Heads up @docbobo, I opened https://github.com/k3s-io/k3s-selinux/issues/53 in response to this issue. I think there’s not much we can do from the Longhorn side if the SELinux integration for your Kubernetes distribution is broken. After a decent amount of investigating, I think that is the case.
If it is possible to roll back to selinux-policy-targeted 20231012-1.1, I suggest you try that. Otherwise you can disable SELinux or use an audit2allow approach.
Okay, I got this "fixed" with the following policy:
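For context, the audit2allow approach mentioned above generally follows the workflow sketched below. This is only a generic outline, not the actual policy referenced in that comment; the module name longhorn-csi is a placeholder chosen for the example.

```sh
# Run on the affected node as root.

# 1. Show recent AVC denials recorded by auditd.
ausearch -m avc -ts recent

# 2. Build a local policy module from those denials.
#    "longhorn-csi" is an arbitrary module name used for this example.
ausearch -m avc -ts recent | audit2allow -M longhorn-csi

# 3. Install the generated module (SELinux stays in enforcing mode).
semodule -i longhorn-csi.pp
```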
Why do I only have to do this on 1 of my 12 systems?
I am using MicroOS, which is Tumbleweed-based. In the update described above, I went from 20231017 to 20231101.
The list above only captures the packages whose versions differ between the "good" and "bad" instances. Here is the complete list of packages with selinux in the name, taken from the "bad" instance.
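As a side note, a package list like the one referenced above can be captured and compared with something along these lines (file names are placeholders):

```sh
# Capture the SELinux-related package set on a node, sorted for diffing.
rpm -qa | grep -i selinux | sort > selinux-packages-"$(hostname)".txt

# Compare the lists from a "good" and a "bad" node.
diff selinux-packages-good.txt selinux-packages-bad.txt
```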
I did a diff between a working, not-yet-updated system and an updated one that is returning those errors. The following openSUSE packages related to SELinux differ between the two:
Obviously, there are plenty of other differences. I have a complete list of all the packages here on my system.
I can confirm I have run into this exact issue. It also impacted only a portion of my nodes, for some unknown reason. However, I would say this is less of a bug and more of a "docs needed" situation.
Also, I have SELinux enabled, and on the problematic node I am seeing the following in the audit.log:
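As a sanity check, one way to confirm that SELinux (rather than Longhorn itself) is blocking the plugin is to temporarily switch the node to permissive mode and see whether the pod recovers. A rough sketch, reusing the pod name from the logs above:

```sh
# On the problematic node: check the current SELinux mode.
getenforce

# Temporarily switch to permissive (not persistent; reverts on reboot).
setenforce 0

# Delete the crash-looping pod so the DaemonSet recreates it.
# The pod name below is the one from the logs above; use your own.
kubectl -n longhorn-system delete pod longhorn-csi-plugin-46gwb

# If the plugin now starts, re-enable enforcing and fix the policy instead.
setenforce 1
```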