harvester: [BUG] Upgrade: rancher-monitoring charts can't be upgraded

Describe the bug

This was observed after https://github.com/harvester/harvester-installer/pull/229. The rancher-monitoring and rancher-monitoring-crd ManagedCharts fail to upgrade. The error message is "another operation (install/upgrade/rollback) is in progress".

fleet-local   mcc-rancher-monitoring                        0/1                       ErrApplied(1) [Cluster fleet-local/local: another operation (install/upgrade/rollback) is in progress]; mutatingwebhookconfiguration.admissionregistration.k8s.io rancher-monitoring-admission modified {"webhooks":[{"admissionReviewVersions":["v1","v1beta1"],"clientConfig":{"caBundle":"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJkVENDQVJ1Z0F3SUJBZ0lRUDd5MVA0K2cvRlZ5OGdaSW93bElDVEFLQmdncWhrak9QUVFEQWpBUE1RMHcKQ3dZRFZRUUtFd1J1YVd3eE1DQVhEVEl5TURFd05EQXlNRGcwTUZvWUR6SXhNakV4TWpFeE1ESXdPRFF3V2pBUApNUTB3Q3dZRFZRUUtFd1J1YVd3eE1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXEzQWhrdnh0CmVMUHBpOG45NmVkRmlWbGZiK0xRWHRmSVJFOUJDMmRlVmhxWjNvVUx1R0hucldqc3k0OWJMM0JVZEw0QWg4VEkKQUhORG9sbzNRVElGRHFOWE1GVXdEZ1lEVlIwUEFRSC9CQVFEQWdJRU1CTUdBMVVkSlFRTU1Bb0dDQ3NHQVFVRgpCd01CTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRlB3U0xOd2VaeE02VHZMS1dodnNxd2t5Cjc0RzNNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJUUR3VmlHZEtSdXI3N2dUTElZVkxkNzZXM3N3eGtYd1I5c2gKdDBXN081azdNUUlnYmpRYWVsWEd6dnhQK1dtaHdGYWVndFdFVGJSSkQ2aXpzZVpJOWJ3c3ZQZz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=","service":{"name":"rancher-monitoring-operator","namespace":"cattle-monitoring-system","path":"/admission-prometheusrules/mutate","port":443}},"failurePolicy":"Ignore","matchPolicy":"Equivalent","name":"prometheusrulemutate.monitoring.coreos.com","namespaceSelector":{},"objectSelector":{},"reinvocationPolicy":"Never","rules":[{"apiGroups":["monitoring.coreos.com"],"apiVersions":["*"],"operations":["CREATE","UPDATE"],"resources":["prometheusrules"],"scope":"*"}],"sideEffects":"None","timeoutSeconds":10}]}; validatingwebhookconfiguration.admissionregistration.k8s.io rancher-monitoring-admission modified 
{"webhooks":[{"admissionReviewVersions":["v1","v1beta1"],"clientConfig":{"caBundle":"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJkVENDQVJ1Z0F3SUJBZ0lRUDd5MVA0K2cvRlZ5OGdaSW93bElDVEFLQmdncWhrak9QUVFEQWpBUE1RMHcKQ3dZRFZRUUtFd1J1YVd3eE1DQVhEVEl5TURFd05EQXlNRGcwTUZvWUR6SXhNakV4TWpFeE1ESXdPRFF3V2pBUApNUTB3Q3dZRFZRUUtFd1J1YVd3eE1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXEzQWhrdnh0CmVMUHBpOG45NmVkRmlWbGZiK0xRWHRmSVJFOUJDMmRlVmhxWjNvVUx1R0hucldqc3k0OWJMM0JVZEw0QWg4VEkKQUhORG9sbzNRVElGRHFOWE1GVXdEZ1lEVlIwUEFRSC9CQVFEQWdJRU1CTUdBMVVkSlFRTU1Bb0dDQ3NHQVFVRgpCd01CTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRlB3U0xOd2VaeE02VHZMS1dodnNxd2t5Cjc0RzNNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJUUR3VmlHZEtSdXI3N2dUTElZVkxkNzZXM3N3eGtYd1I5c2gKdDBXN081azdNUUlnYmpRYWVsWEd6dnhQK1dtaHdGYWVndFdFVGJSSkQ2aXpzZVpJOWJ3c3ZQZz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=","service":{"name":"rancher-monitoring-operator","namespace":"cattle-monitoring-system","path":"/admission-prometheusrules/validate","port":443}},"failurePolicy":"Ignore","matchPolicy":"Equivalent","name":"prometheusrulemutate.monitoring.coreos.com","namespaceSelector":{},"objectSelector":{},"rules":[{"apiGroups":["monitoring.coreos.com"],"apiVersions":["*"],"operations":["CREATE","UPDATE"],"resources":["prometheusrules"],"scope":"*"}],"sideEffects":"None","timeoutSeconds":10}]}
fleet-local   mcc-rancher-monitoring-crd                    0/1                       ErrApplied(1) [Cluster fleet-local/local: another operation (install/upgrade/rollback) is in progress]

To Reproduce

Steps to reproduce the behavior:

Set up a 1.0.0 cluster.
Upgrade with the master ISO.

Expected behavior

Support bundle

Environment:

Harvester ISO version:
Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630):

Additional context

About this issue

State: closed
Created 2 years ago
Comments: 18 (14 by maintainers)

Most upvoted comments

@bk201 I found one possible cause, described in https://github.com/harvester/harvester/issues/1983#issuecomment-1076812969 , but I'm not sure whether the fleet-agent maintainers have noticed this or plan to improve it.

w13915984028 on May 20, 2022

@w13915984028 @weihanglo Do you know of any workaround to get out of this state? I have had luck rolling back the revision of a chart before, but I'm not sure if there is a better way.

bk201 on May 20, 2022

The fleet-agent log shows that even in a normally running Harvester cluster there are many "performing update for … " entries. (Why it does so is under investigation in #2013.)

So there is a chance of hitting this bug.

w13915984028 on Mar 23, 2022

I’m still seeing the harvester chart go into this state occasionally, even though I already pause the managed chart and wait for all the Rancher stuff to settle down. A trick to get out of this state is to roll back the chart with the helm command. The ManagedChart will then be applied again.

bk201 on Mar 19, 2022
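
The rollback trick mentioned in the comments could be sketched roughly as follows. This is only an illustration, not the exact procedure used in the thread: it assumes the Helm release is named `rancher-monitoring` in the `cattle-monitoring-system` namespace (the namespace appears in the log above, the release name is a guess), and that the stuck state shows up as a `pending-*` status on the newest revision in `helm history <release> -n <namespace> -o json`. The helper name `rollback_command` is hypothetical. It only builds the command and does not run it.

```python
import json

# Helm release statuses that mean an operation was started but never finished.
PENDING = {"pending-install", "pending-upgrade", "pending-rollback"}


def rollback_command(release, namespace, history_json):
    """Build a `helm rollback` command if the newest revision is stuck.

    history_json is the text printed by `helm history <release> -n <ns> -o json`.
    Returns None when the newest revision is not pending, or when there is no
    earlier revision that completed.
    """
    history = sorted(json.loads(history_json), key=lambda r: r["revision"])
    if not history or history[-1]["status"] not in PENDING:
        return None
    # Only consider revisions before the stuck one that finished deploying.
    completed = [r["revision"] for r in history[:-1]
                 if r["status"] in ("deployed", "superseded")]
    if not completed:
        return None
    return ["helm", "rollback", release, str(completed[-1]), "-n", namespace]


# Made-up history for illustration only; not taken from the affected cluster.
sample = json.dumps([
    {"revision": 1, "status": "superseded"},
    {"revision": 2, "status": "deployed"},
    {"revision": 3, "status": "pending-upgrade"},
])
cmd = rollback_command("rancher-monitoring", "cattle-monitoring-system", sample)
print(" ".join(cmd))  # helm rollback rancher-monitoring 2 -n cattle-monitoring-system
```

Printing the command instead of executing it leaves room to check the chosen revision by hand before rolling back. As noted in the comments, the ManagedChart should be applied again after the rollback.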