longhorn: [BUG] Longhorn installation sometimes fails with longhorn-manager pods never becoming healthy.
Describe the bug: The Longhorn v1.0.1 installation sometimes fails on a cluster of 4 nodes (3 workers, 1 etcd/control plane).
To Reproduce Steps to reproduce the behavior:
- Create a cluster of 3 workers and 1 etcd/control plane node.
- Install Longhorn v1.0.1 from the catalog app.
- Install and uninstall it multiple times.
- Sometimes longhorn installation fails with longhorn-manager pods stuck in crashloop.
The engine image is marked as incompatible.
Expected behavior: Installation should always succeed.
Log
time="2020-08-04T17:41:17Z" level=info msg="Start overwriting built-in settings with customized values"
time="2020-08-04T17:41:17Z" level=info msg="cannot list the content of the src directory /var/lib/rancher/longhorn/engine-binaries for the copy, will do nothing: Failed to execute: nsenter [--mount=/host/proc/7970/ns/mnt --net=/host/proc/7970/ns/net bash -c ls /var/lib/rancher/longhorn/engine-binaries/*], output , stderr, ls: cannot access '/var/lib/rancher/longhorn/engine-binaries/*': No such file or directory\n, error exit status 2"
time="2020-08-04T17:41:17Z" level=info msg="New upgrade leader elected: khushboo-test-wk3"
time="2020-08-04T17:41:37Z" level=info msg="Start upgrading"
time="2020-08-04T17:41:37Z" level=info msg="No API version upgrade is needed"
time="2020-08-04T17:41:37Z" level=info msg="Finish upgrading"
E0804 17:41:37.844128 1 leaderelection.go:282] Failed to release lock: Lease.coordination.k8s.io "longhorn-manager-upgrade-lock" is invalid: spec.leaseDurationSeconds: Invalid value: 0: must be greater than 0
time="2020-08-04T17:41:37Z" level=info msg="Upgrade leader lost: khushboo-test-wk2"
E0804 17:41:37.854043 1 kubernetes_node_controller.go:256] Couldn't get nodes khushboo-test-wk2: node "khushboo-test-wk2" not found
time="2020-08-04T17:41:37Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn node controller"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn volume controller"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn Engine Image controller"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn websocket controller"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn engine controller"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn Setting controller"
time="2020-08-04T17:41:37Z" level=info msg="Starting Longhorn instance manager controller"
time="2020-08-04T17:41:37Z" level=info msg="Start kubernetes controller"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn Kubernetes node controller"
time="2020-08-04T17:41:37Z" level=info msg="Start Longhorn replica controller"
time="2020-08-04T17:41:38Z" level=debug msg="Start monitoring instance manager instance-manager-r-037f05af"
time="2020-08-04T17:41:38Z" level=debug msg="Start monitoring instance manager instance-manager-e-eb769411"
time="2020-08-04T17:41:43Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"
time="2020-08-04T17:41:49Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"
time="2020-08-04T17:41:55Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"
time="2020-08-04T17:42:01Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"
time="2020-08-04T17:42:07Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"
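To make the repeated "Waiting for engine image" message easier to spot in a long longhorn-manager log, a small parser can count it per image. This is only a sketch: the function names and the regular expressions are my own assumptions based on the line format shown above, not part of Longhorn.

```python
import re
from collections import Counter

# Matches logrus-style lines such as: time="..." level=debug msg="..."
LINE_RE = re.compile(r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$')
# Matches the message that longhorn-manager repeats in the log above.
WAIT_RE = re.compile(r"Waiting for engine image (\S+) to be ready")


def parse_line(line):
    """Return (time, level, msg) for a logrus-style line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("time"), match.group("level"), match.group("msg")


def count_engine_image_waits(lines):
    """Count 'Waiting for engine image ... to be ready' messages per image name."""
    waits = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skips other formats, e.g. lines starting with "E0804 ..."
        _, _, msg = parsed
        match = WAIT_RE.match(msg)
        if match:
            waits[match.group(1)] += 1
    return waits


# A few lines copied from the log in this report.
SAMPLE = [
    'time="2020-08-04T17:41:37Z" level=info msg="Finish upgrading"',
    'E0804 17:41:37.854043 1 kubernetes_node_controller.go:256] Couldn\'t get nodes khushboo-test-wk2: node "khushboo-test-wk2" not found',
    'time="2020-08-04T17:41:37Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"',
    'time="2020-08-04T17:41:43Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"',
    'time="2020-08-04T17:41:49Z" level=debug msg="Waiting for engine image longhornio/longhorn-engine:v1.0.1 to be ready"',
]

if __name__ == "__main__":
    for image, count in count_engine_image_waits(SAMPLE).items():
        print(f"{image}: waited {count} times")
```

A count that keeps growing across pod restarts would match the stuck state described in this report.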
Environment:
- Longhorn version: v1.0.1
- Kubernetes version: 1.18.6
- Node OS type and version: Ubuntu 18.04
Script to install and uninstall Longhorn continuously: install.sh.zip. It requires longhorn.yaml (longhorn.yaml.zip) in the same location to run.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 17 (9 by maintainers)
@mritd Of course, you can. But you need to follow the uninstall instructions (rather than just running kubectl delete). There are things we need to clean up during uninstallation that cannot be done by kubectl delete.
You need to follow the uninstall instructions here: https://longhorn.io/docs/1.0.2/deploy/uninstall/ . Use the Longhorn uninstaller, or use Helm instead of YAML for installation.
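For anyone scripting repeated install/uninstall cycles, the point above is about ordering: run the uninstaller and wait for it to finish before deleting the install manifest. Below is a rough sketch of that order. The job name longhorn-uninstall, the local file names, and the timeout are assumptions on my part, so check them against the linked uninstall instructions before running anything.

```python
import shlex
import subprocess


def uninstall_commands(uninstall_manifest, install_manifest,
                       job="longhorn-uninstall", timeout="600s"):
    """Build the kubectl commands in the order the uninstaller should run.

    The job name and timeout are assumptions; kubectl uses the current
    context's namespace unless the manifests say otherwise.
    """
    return [
        # 1. Start the uninstaller so Longhorn can clean up its own resources.
        ["kubectl", "create", "-f", uninstall_manifest],
        # 2. Block until the uninstall job reports completion.
        ["kubectl", "wait", "--for=condition=complete", f"job/{job}",
         f"--timeout={timeout}"],
        # 3. Only now delete what the install manifest created.
        ["kubectl", "delete", "-f", install_manifest],
        # 4. Remove the uninstaller itself.
        ["kubectl", "delete", "-f", uninstall_manifest],
    ]


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical local file names; a dry run only prints the commands.
    run(uninstall_commands("uninstall.yaml", "longhorn.yaml"), dry_run=True)
```

The dry run is the default so the sequence can be reviewed before it touches a cluster.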
Same issue with 1.0.2 on Kubernetes version 1.19.3.
I tested the method provided by @shuo-wu , but unfortunately my UI keeps showing 502 errors.
I am a newbie to longhorn and I cannot install longhorn through kubectl…