longhorn: [BUG] Error starting manager: upgrade Pods failed

Describe the bug I get the following message when starting the manager pod on a second node in the cluster. Everything appears to be OK on the first node.

level=fatal msg="Error starting manager: upgrade Pods failed: upgrade from v1.0.2 to v1.1.0: upgrade volume failed: Operation cannot be fulfilled on volumes.longhorn.io \"pvc-6a457e39-109f-4b21-aa4f-9446c90e6539\": the object has been modified; please apply your changes to the latest version and try again"

What does this mean, and how can I solve this error?

Full logs of the manager pod on the failing node

time="2021-05-10T16:15:27Z" level=info msg="Start overwriting built-in settings with customized values"
W0510 16:15:27.997083       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2021-05-10T16:15:28Z" level=info msg="cannot list the content of the src directory /var/lib/rancher/longhorn/engine-binaries for the copy, will do nothing: Failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net bash -c ls /var/lib/rancher/longhorn/engine-binaries/*], output , stderr, ls: cannot access '/var/lib/rancher/longhorn/engine-binaries/*': No such file or directory\n, error exit status 2"
I0510 16:15:28.001357       1 leaderelection.go:241] attempting to acquire leader lease  longhorn-system/longhorn-manager-upgrade-lock...
I0510 16:15:28.122597       1 leaderelection.go:251] successfully acquired lease longhorn-system/longhorn-manager-upgrade-lock
time="2021-05-10T16:15:28Z" level=info msg="Start upgrading"
time="2021-05-10T16:15:28Z" level=info msg="No API version upgrade is needed"
time="2021-05-10T16:15:30Z" level=error msg="Upgrade failed: upgrade Pods failed: upgrade from v1.0.2 to v1.1.0: upgrade volume failed: Operation cannot be fulfilled on volumes.longhorn.io \"pvc-6a457e39-109f-4b21-aa4f-9446c90e6539\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-05-10T16:15:30Z" level=info msg="Upgrade leader lost: <node 2>"
time="2021-05-10T16:15:30Z" level=fatal msg="Error starting manager: upgrade Pods failed: upgrade from v1.0.2 to v1.1.0: upgrade volume failed: Operation cannot be fulfilled on volumes.longhorn.io \"pvc-6a457e39-109f-4b21-aa4f-9446c90e6539\": the object has been modified; please apply your changes to the latest version and try again"
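For context, the `Operation cannot be fulfilled ... the object has been modified` message is the standard Kubernetes optimistic-concurrency conflict: the upgrade path read a volume object, another writer updated it first, and the stale write was rejected because its `resourceVersion` no longer matched. The usual fix is to re-read the latest object and reapply the change. A minimal self-contained sketch of that pattern (not Longhorn code; all names here are illustrative):

```python
class Conflict(Exception):
    """Stands in for a Kubernetes 409 Conflict response."""


class Store:
    """Toy API server: accepts an update only if resourceVersion matches."""

    def __init__(self):
        self.obj = {"resourceVersion": 1, "spec": {"engineImage": "v1.0.2"}}

    def get(self):
        # Return a copy, like a client reading its own snapshot of the object.
        return dict(self.obj, spec=dict(self.obj["spec"]))

    def update(self, obj):
        if obj["resourceVersion"] != self.obj["resourceVersion"]:
            raise Conflict("the object has been modified; please apply "
                           "your changes to the latest version and try again")
        obj["resourceVersion"] += 1
        self.obj = obj
        return self.obj


def retry_on_conflict(store, mutate, attempts=5):
    """On conflict, re-read the latest object and reapply the change."""
    for _ in range(attempts):
        obj = store.get()
        mutate(obj)
        try:
            return store.update(obj)
        except Conflict:
            continue
    raise RuntimeError("out of retries")


store = Store()
stale = store.get()              # one controller holds a snapshot (rv 1)
store.update(store.get())        # another writer commits first (rv 1 -> 2)
try:
    store.update(stale)          # the stale write fails, as in the log above
except Conflict as e:
    print("conflict:", e)

# Retrying against the latest version succeeds.
result = retry_on_conflict(
    store, lambda o: o["spec"].update(engineImage="v1.1.0"))
print(result["spec"]["engineImage"], result["resourceVersion"])
```

In real controllers this loop is what `client-go`'s `retry.RetryOnConflict` helper provides; the error in the log suggests the v1.1.0 upgrade path hit the conflict without retrying.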

Environment:

  • Longhorn version: 1.1.0
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 2
  • Node config
    • OS type and version: k3os v0.11.1
    • CPU per node: 12
    • Memory per node: 64
    • Disk type(e.g. SSD/NVMe): NVMe
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 44

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 55 (26 by maintainers)

Most upvoted comments

For whoever encounters this.

We had a lot of pain due to this (it exacerbates other issues we have with v1.1.1, from which we can’t upgrade just yet). We found that restarting the longhorn-manager daemonset helps with some of our other issues, but that’s not possible when there are many degraded volumes.
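For readers who do want to try the daemonset restart mentioned above (only when volumes are healthy), a rolling restart looks like this; the namespace assumes the default `longhorn-system` install:

```shell
# Rolling restart of the Longhorn manager daemonset (assumes default namespace).
kubectl -n longhorn-system rollout restart daemonset/longhorn-manager

# Wait until every manager pod has come back up before proceeding.
kubectl -n longhorn-system rollout status daemonset/longhorn-manager
```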

Since there’s no real workaround, I’ve built a version of longhorn-manager:v1.1.1 that also has this one bug fixed (commit https://github.com/longhorn/longhorn-manager/commit/1d2093a84118259c5953afceff9ed14f6dac08ba) to get through it: https://registry.hub.docker.com/layers/excieve/longhorn-manager/v1.1.1-upgradefix/images/sha256-3dff6df072913badbbb4f3e888f1df9d6a79c7464cea2243d2e7e4c060eee5e7?context=explore

No guarantees, use at your own risk obviously.

This bug is painful 😦

@liyimeng Ideally, longhorn-manager should not affect the functionality of Longhorn volumes. We will improve the upgrade path for this issue in the future, rather than stopping the world when a longhorn-manager is not ready.

Right now, 48 volumes in use.