longhorn: [BUG] TooManySnapshots: Snapshots count is 249 over the warning threshold 100

Describe the bug: Too many snapshots for no visible reason. This happens on one specific volume. I can’t figure out why: there is no activity on this volume apart from Grafana’s dashboards. The first snapshot’s size equals the actual size; all the others are 0 Bi. There were no problems until the upgrade to 1.2.0.

To Reproduce (steps to reproduce the behavior):

  1. ???

Expected behavior: No error.

Log: If applicable, add the Longhorn managers’ log from when the issue happens.

You can also attach a Support Bundle here. You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment:

  • Longhorn version: 1.2.0
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s v1.21.4+k3s1
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 4
  • Node config
    • OS type and version: Ubuntu 20.04.3 LTS
    • CPU per node: 4
    • Memory per node: 8G
    • Disk type(e.g. SSD/NVMe): SSD
    • Network bandwidth between the nodes: 1G
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 5

Additional context: Some information about the PV, PVC, and volume: https://gist.github.com/lexfrei/f33256f6d284e991569a58785492f0e2 The snapshots can be cleaned with lhexec <volume-name> snapshot purge; this fixes the issue for about a day.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

What do you think about this? We do have a backoff limit, though, so we don’t mark the rebuilding replica as failed too quickly.

We already introduced a backoff interval for reusing failed replicas. But if a new replica cannot even be started once, Longhorn removes it and then immediately recreates a new one. I think we can improve this part in the future.
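The backoff idea discussed in this comment can be illustrated with a minimal sketch. This is a hypothetical example, not Longhorn’s actual code: the class name ReplicaBackoff, the 10-second base delay, and the 300-second cap are all assumptions. Each consecutive failure to start a replica doubles the wait before a replacement may be created, and a successful start resets the counter.

```python
class ReplicaBackoff:
    """Hypothetical exponential backoff gate for recreating a replica
    that failed to start. Illustrative only; not Longhorn's implementation."""

    def __init__(self, base_seconds: float = 10.0, max_seconds: float = 300.0):
        self.base = base_seconds      # assumed delay after the first failure
        self.cap = max_seconds        # assumed upper bound on the delay
        self.failures = 0             # consecutive start failures so far
        self.next_allowed = 0.0       # earliest time a new replica may be created

    def record_failure(self, now: float) -> float:
        # Each consecutive failure doubles the wait, up to the cap.
        self.failures += 1
        delay = min(self.base * 2 ** (self.failures - 1), self.cap)
        self.next_allowed = now + delay
        return delay

    def record_success(self) -> None:
        # A replica that started successfully resets the backoff state.
        self.failures = 0
        self.next_allowed = 0.0

    def can_recreate(self, now: float) -> bool:
        # Only allow a replacement replica once the backoff window has passed.
        return now >= self.next_allowed
```

With these assumed values, a replica that keeps failing to start would be retried after 10, 20, 40, 80, 160, and then 300 seconds, instead of being removed and recreated immediately on every failure.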

@PhanLe1010 The snapshots were hidden; after enabling “showing system hidden” I was able to delete them. Thanks a lot!
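The following sketch shows how a warning like “Snapshots count is 249 over the warning threshold 100” can appear while the default UI view shows very few snapshots. It assumes that the alert counts every snapshot, including system-generated ones that are hidden by default. The Snapshot record and its user_created flag are hypothetical and used only for illustration; they are not a documented Longhorn API.

```python
from dataclasses import dataclass

WARNING_THRESHOLD = 100  # threshold quoted in the alert message


@dataclass
class Snapshot:
    name: str
    user_created: bool  # hypothetical flag: False for system-generated snapshots


def snapshot_counts(snapshots: list[Snapshot]) -> tuple[int, int]:
    # Return (all snapshots, snapshots visible without "show system hidden").
    visible = [s for s in snapshots if s.user_created]
    return len(snapshots), len(visible)


def over_threshold(snapshots: list[Snapshot], threshold: int = WARNING_THRESHOLD) -> bool:
    # Assumes the alert compares the total count, hidden ones included.
    total, _ = snapshot_counts(snapshots)
    return total > threshold
```

For example, one user snapshot plus 248 system snapshots gives a total of 249, which is over the threshold of 100, even though only one snapshot is visible by default.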

@Dubouchj

Can you try using the Longhorn UI to delete the snapshots?