harvester: [BUG] post-upgrade hook of longhorn-storageclass-override.yaml has hidden issue

Describe the bug

Fleet-agent/Helm constantly complains “configmap.v1 longhorn-system/longhorn-storageclass missing”, and fleet-agent retries every 5 minutes.

time="2022-03-27T12:02:02Z" level=warning msg="DEV bd: name:mcc-harvester MonitorBundle: has modified status, EnqueueAfter 5 minutes, status[0]:configmap.v1 longhorn-system/longhorn-storageclass missing"

The fleet-agent log contains many entries like: time="2022-03-27T09:14:13Z" level=info msg="Helm: Upgrading mcc-harvester"
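To check how often the upgrade loop actually fires, the fleet-agent log can be summarized with a short script. The following is a minimal sketch, assuming the log lines keep the `time=... level=... msg="..."` shape shown above; the names `parse_line`, `summarize`, and `SAMPLE_LINES` are hypothetical and used only for illustration.

```python
import re
from datetime import datetime

# Assumption: every relevant line follows the key="value" layout seen in this report.
LINE_RE = re.compile(r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"\s*$')
# Matches the "status[N]:<kind> <namespace/name> missing" tail of the MonitorBundle warning.
MISSING_RE = re.compile(r"status\[\d+\]:(?P<resource>\S+ \S+) missing")

# The two log lines quoted in this report.
SAMPLE_LINES = [
    'time="2022-03-27T12:02:02Z" level=warning msg="DEV bd: name:mcc-harvester '
    'MonitorBundle: has modified status, EnqueueAfter 5 minutes, '
    'status[0]:configmap.v1 longhorn-system/longhorn-storageclass missing"',
    'time="2022-03-27T09:14:13Z" level=info msg="Helm: Upgrading mcc-harvester"',
]


def parse_line(line):
    """Return (timestamp, level, msg), or None if the line has another shape."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%Y-%m-%dT%H:%M:%SZ")
    return timestamp, match.group("level"), match.group("msg")


def summarize(lines):
    """Collect Helm upgrade timestamps and the set of resources reported missing."""
    upgrades = []
    missing = set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        timestamp, _level, msg = parsed
        if msg.startswith("Helm: Upgrading"):
            upgrades.append(timestamp)
        found = MISSING_RE.search(msg)
        if found:
            missing.add(found.group("resource"))
    return upgrades, missing


if __name__ == "__main__":
    upgrades, missing = summarize(SAMPLE_LINES)
    print(f"{len(upgrades)} Helm upgrade(s) logged")
    for resource in sorted(missing):
        print("missing:", resource)
```

Feeding the full fleet-agent log into `summarize` and comparing consecutive upgrade timestamps would show whether the upgrades really follow the 5-minute requeue interval.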

Possible root cause: the post-upgrade hook modifies resources, which causes Helm to assume a failure when it checks the consistency of the whole release. (Needs to be confirmed.)

https://helm.sh/docs/topics/charts_hooks/ does not clearly explain whether a post-install/post-upgrade hook may modify or update a resource.

The 5-minute retry loop may also interfere with a real system upgrade.

To Reproduce

Steps to reproduce the behavior:

  1. Check any running Harvester cluster; this message is present.

Expected behavior

When the cluster is stable, there should not be so many background upgrades.

Support bundle

Environment:

  • Harvester ISO version: v1.0.1
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): KVM based Harvester.

Additional context

https://github.com/harvester/harvester/issues/857

https://github.com/harvester/harvester/issues/1893 could also be related

https://github.com/harvester/harvester/issues/2013

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

This is a legacy issue from Harvester v0.3.0. We will need to consider the upgrade migration path after we change the current default SC, but from a long-term perspective, yes, using a customized harvester-longhorn storage class is preferred (e.g., considering the future multi-storage-class support feature, https://github.com/harvester/harvester/issues/2147).

Removing the override does not get the chart out of the modified status. Here is what we observed (we are now at Longhorn v1.3.1):

  summary:
    desiredReady: 1
    modified: 1
    nonReadyResources:
    - bundleState: Modified
      modifiedStatus:
      - apiVersion: apiextensions.k8s.io/v1
        kind: CustomResourceDefinition
        name: engineimages.longhorn.io
        patch: '{"status":{"acceptedNames":{"kind":"","plural":""},"conditions":[],"storedVersions":[]}}'
      - apiVersion: apiextensions.k8s.io/v1
        kind: CustomResourceDefinition
        name: nodes.longhorn.io
        patch: '{"status":{"acceptedNames":{"kind":"","plural":""},"conditions":[],"storedVersions":[]}}'
      - apiVersion: apiextensions.k8s.io/v1
        kind: CustomResourceDefinition
        name: volumes.longhorn.io
        patch: '{"status":{"acceptedNames":{"kind":"","plural":""},"conditions":[],"storedVersions":[]}}'
      name: fleet-local/local
    ready: 0
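Every patch in the status above touches only the top-level `status` field of a CRD, a field that the Kubernetes API server populates itself. That suggests the remaining drift is not in anything the chart specifies, though this reading is an inference from the output rather than a confirmed diagnosis. The check can be sketched as follows; the helper names `touched_top_level_keys` and `status_only` are hypothetical, and the patch strings are copied from the `modifiedStatus` entries.

```python
import json

# The patch is identical for all three CRDs in the status dump.
CRD_STATUS_PATCH = (
    '{"status":{"acceptedNames":{"kind":"","plural":""},'
    '"conditions":[],"storedVersions":[]}}'
)

# (CRD name, patch) pairs copied from the modifiedStatus entries.
MODIFIED = [
    ("engineimages.longhorn.io", CRD_STATUS_PATCH),
    ("nodes.longhorn.io", CRD_STATUS_PATCH),
    ("volumes.longhorn.io", CRD_STATUS_PATCH),
]


def touched_top_level_keys(patch):
    """Return the set of top-level fields a JSON patch string would change."""
    return set(json.loads(patch).keys())


def status_only(entries):
    """True if every patch changes nothing but the top-level status field."""
    return all(touched_top_level_keys(patch) == {"status"} for _, patch in entries)


if __name__ == "__main__":
    for name, patch in MODIFIED:
        print(name, sorted(touched_top_level_keys(patch)))
    print("status-only drift:", status_only(MODIFIED))
```

If a patch ever included `spec` or `metadata`, `status_only` would return False, which would point at a real difference between the chart and the live objects.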

The new error message from fleet-agent is caused by new Longhorn features: https://github.com/harvester/harvester/issues/2762

The Harvester patch caused the error “longhorn-system/longhorn-storageclass missing”, which is gone after the fix.

The issue seems to be related to helm hook processing in fleet.

The fleet bundle always shows as missing the storage class configmap.

The Longhorn PR https://github.com/longhorn/longhorn/pull/3881 exposes the storage class parameter natively.

This should allow the managed chart to use that parameter simply by setting the additional value in its definition in harvester-installer.

This should stop the background updates of the Longhorn storage class.