longhorn: k3s v1.19.2+k3s1 : longhorn-driver-deployer CrashLoopBackOff

I am getting a CrashLoopBackOff for the longhorn-driver-deployer when deploying to k3s v1.19.2+k3s1.

Installed from upstream with kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

And the logs from the pod:

clemenko:clemenko k3s ( 174.138.56.187:6443 ) $ kubectl logs longhorn-driver-deployer-6b7d76659f-vjflp -n longhorn-system
time="2020-10-07T18:12:44Z" level=debug msg="Deploying CSI driver"
time="2020-10-07T18:12:44Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Pending"
time="2020-10-07T18:12:45Z" level=debug msg="proc cmdline detection pod discover-proc-kubelet-cmdline in phase: Pending"
time="2020-10-07T18:12:46Z" level=warning msg="Proc not found: kubelet"
time="2020-10-07T18:12:46Z" level=debug msg="proc cmdline detection pod discover-proc-k3s-cmdline in phase: Pending"
time="2020-10-07T18:12:47Z" level=debug msg="proc cmdline detection pod discover-proc-k3s-cmdline in phase: Pending"
time="2020-10-07T18:12:48Z" level=debug msg="proc cmdline detection pod discover-proc-k3s-cmdline in phase: Running"
time="2020-10-07T18:12:49Z" level=warning msg="Proc not found: k3s"
time="2020-10-07T18:12:49Z" level=error msg="failed to get arg root-dir. Need to specify \"--kubelet-root-dir\" in your Longhorn deployment yaml.: failed to get kubelet root dir, no related proc for root-dir detection, error out"
time="2020-10-07T18:12:49Z" level=fatal msg="Error deploying driver: failed to get arg root-dir. Need to specify \"--kubelet-root-dir\" in your Longhorn deployment yaml.: failed to get kubelet root dir, no related proc for root-dir detection, error out"

This works perfectly fine with k3s v1.18. You can recreate it by deploying k3s from the latest channel; the stable channel works.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (11 by maintainers)

Most upvoted comments

If anyone wants to install using the helm chart, an overrides file with:

csi:
  kubeletRootDir: /var/lib/kubelet

or directly helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --set csi.kubeletRootDir=/var/lib/kubelet

Seemed to do the trick for me.

The workaround worked, thanks! For those playing along at home:

curl https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml | sed -e 's/#- name: KUBELET_ROOT_DIR/- name: KUBELET_ROOT_DIR/g' -e 's$#  value: /var/lib/rancher/k3s/agent/kubelet$  value: /var/lib/kubelet$g' | kubectl apply -f -

for the win

Hello @robertorubioguardia,

After some digging, I was able to get Longhorn working with microk8s by pointing KUBELET_ROOT_DIR to /var/snap/microk8s/common/var/lib/kubelet.

This is because the kubelet parameter used by microk8s is --root-dir=${SNAP_COMMON}/var/lib/kubelet, where $SNAP_COMMON can be checked by entering snap run --shell microk8s and then running env | grep SNAP.

Please try your installation again with this command (which I just modified from yours):

$ curl https://raw.githubusercontent.com/longhorn/longhorn/v1.2.0/deploy/longhorn.yaml | sed -e 's/#- name: KUBELET_ROOT_DIR/- name: KUBELET_ROOT_DIR/g' -e 's$#  value: /var/lib/rancher/k3s/agent/kubelet$  value: /var/snap/microk8s/common/var/lib/kubelet$g' | kubectl apply -f -

This is working on my local VMs with the latest/stable channel at v1.22.2; please let me know if you are using a different version.

@morremeyer not backporting the fix would mean that it would only be available in the Longhorn v1.1.0 release and later releases. However, we will backport this to older Longhorn versions so that users don’t have to do the workaround.

Thanks for the report, @clemenko. @khushboo-rancher and I have reproduced the issue with k3s v1.19.2+k3s1. We also verified that k3s v1.18 works fine.

For now, the temporary workaround is to set KUBELET_ROOT_DIR to /var/lib/kubelet here.
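
In the deployment yaml this corresponds to uncommenting the KUBELET_ROOT_DIR environment variable on the longhorn-driver-deployer container, so that it ends up roughly like this (a sketch of the relevant fragment, assuming the upstream longhorn.yaml layout):

env:
  - name: KUBELET_ROOT_DIR
    value: /var/lib/kubelet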

The root cause is that k3s changed the command-line separator. Before k3s v1.19, the cmdline was separated by NUL bytes (\x00):

hexdump -C cmdline
00000000  2f 75 73 72 2f 6c 6f 63  61 6c 2f 62 69 6e 2f 6b  |/usr/local/bin/k|
00000010  33 73 00 61 67 65 6e 74  00 2d 2d 6e 6f 64 65 2d  |3s.agent.--node-|
00000020  65 78 74 65 72 6e 61 6c  2d 69 70 00 31 38 2e 32  |external-ip.18.2|
00000030  31 36 2e 31 38 2e 31 35  37 00                    |16.18.157.|
0000003a

After v1.19, it uses normal spaces (\x20):

hexdump -C cmdline
00000000  2f 75 73 72 2f 6c 6f 63  61 6c 2f 62 69 6e 2f 6b  |/usr/local/bin/k|
00000010  33 73 20 61 67 65 6e 74  00 00 00 00 00 00 00 00  |3s agent........|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Our driver detection script at https://github.com/longhorn/longhorn-manager/blob/6efea60312f19e6e82caee0f1632629f35ffc86a/app/get_proc_arg.go#L53 uses \x00 rather than \x20 as the separator, so it failed to recognize the new format.

It’s a straightforward fix, but we do need to handle both situations.
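
The two-format handling could be sketched like this (a hypothetical Go illustration, not the actual longhorn-manager patch): split on NUL when the buffer contains interior NUL separators, otherwise fall back to splitting on whitespace for the k3s v1.19+ format.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCmdline splits a /proc/<pid>/cmdline buffer into arguments.
// Hypothetical sketch: argv entries are normally NUL-separated, but
// k3s >= v1.19 writes one space-separated string padded with trailing
// NULs, so we fall back to splitting on spaces when no interior NUL
// bytes remain after trimming.
func splitCmdline(raw []byte) []string {
	// Drop trailing NUL padding first.
	trimmed := strings.TrimRight(string(raw), "\x00")
	if strings.ContainsRune(trimmed, 0) {
		// Classic format: NUL-separated argv.
		return strings.Split(trimmed, "\x00")
	}
	// k3s v1.19+ format: space-separated.
	return strings.Fields(trimmed)
}

func main() {
	oldFmt := []byte("/usr/local/bin/k3s\x00agent\x00--node-external-ip\x0018.216.18.157\x00")
	newFmt := []byte("/usr/local/bin/k3s agent\x00\x00\x00\x00")
	fmt.Println(splitCmdline(oldFmt))
	fmt.Println(splitCmdline(newFmt))
}
```

Both sample buffers mirror the hexdumps above, and both parse to the same argv either way, which is the "consider both situations" requirement.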

Also, I’m not sure when the k3s change was checked in, or whether only v1.19 and later are affected.

@PhanLe1010 Backporting means we need to create a new release v1.0.3, which we decided not to do last time since it was very close to the v1.1.0 release.