longhorn: [BUG] helm upgrade won't apply customize default setting

Is your feature request related to a problem? Please describe.

The longhorn-default-setting ConfigMap sync to settings CR only when settings CR does not exist. This means that during helm upgrade if the user changes the setting, then it won’t applies to settings CR (but applies to longhorn-default-setting ConfigMap only)

helm upgrade longhorn longhorn/longhorn -n longhorn-system \
    --set defaultSettings.backupTarget=<new-backup-target> \
    --set defaultSettings.backupTargetCredentialSecret=<new-backup-target-credential-secret>

Describe the solution you’d like

I think we could configure the settings CR directly. So, we don’t have to write a setting controller to reconcile to longhorn-default-setting ConfigMap to settings CR. However, to accomplish it, we need to have a structural schema on CRDs and also, have an admission webhook to validate the input of these settings.

Describe alternatives you’ve considered

N/A

Additional context

Related issues

About this issue

Original URL
State: closed
Created 3 years ago
Reactions: 6
Comments: 46 (38 by maintainers)

Most upvoted comments

I just rollback to previous ConfigMap and implement the setting update in upgrade path. Looks feasible. I will submit the PR after fully testing.

The way you proposed is a way to do it, but the helm chart is not just for install or upgrade, it is probably used by users to update the existing manifests instead of only upgrading to a new Longhorn version, so a version flag to determine if need to update settings probably will not work completely?

If the Longhorn version is changed, update the latest settings provided by users. If not changed, don’t update the setting.

Only update the manifests and the version is not changed, then the setting will not be updated.

derekbit on May 13, 2022

Yes, we already known it, thanks.

jenting on Dec 26, 2021

Validate on longhorn

v1.3.0-rc2 (25832b90d5c1686574ae9f840fea4245eedcc54f) v1.2.x-head (https://github.com/longhorn/longhorn/commit/3bad923306beeb0c6ac88bf839a6efee5fb19713) v1.1.x-head (https://github.com/longhorn/longhorn/commit/aebd0b10868af2f20bcf45c7f54a00a464221055)

Test steps Result Pass

Test 1. Fresh installation Check default setting value working properly at each step

Install Longhorn with customized-default-settings.yaml
Update default settings using kubectl
Upgrade longhorn-manager(different tag) without a new customized-default-settings.yaml
Upgrade longhorn-manager(different tag) with a new customized-default-settings.yaml
Delete longhorn-manager and wait for restart.
Update default settings using new-customized-default-settings2.yaml, new-customized-default-settings3.yaml

Test 2.1 Upgrade from v1.1.x/v1.2.x to master-head and without a new customized-default-settings.yaml Check default setting value working properly at each step

Install Longhorn v1.1.x or v1.2.x with customized-default-settings.yaml
Upgrade to master-head
Upgrade longhorn-manager(different tag) without a new customized-default-settings.yaml
Upgrade longhorn-manager(different tag) with a new customized-default-settings.yaml
Delete longhorn-manager and wait for restart.
Update default settings using new-customized-default-settings2.yaml, new-customized-default-settings3.yaml

Test 3 Upgrade from v1.1.x/v1.2.x to master-head and with a new customized-default-settings.yaml Check default setting value working properly at each step

Install Longhorn v1.1.x or v1.2.x with customized-default-settings.yaml
Upgrade to master-head
Update default settings using kubectl
Delete longhorn-manager and wait for restart.

cc @innobead , @derekbit

chriscchien on Jun 1, 2022

Add more e2e test cases for in-place update via config map controller. https://github.com/longhorn/longhorn-tests/pull/969

cc @cchien816

derekbit on May 25, 2022

@cchien816 @derekbit We need one more test case to test the in-place update via the config map controller.

@derekbit could you help add the test case? Thanks.

Yes, I’m working on it.

derekbit on May 24, 2022

Also tested the upgrade on Rancher UI. The default settings set in v1.2.4 are reserved as expected after upgrading to v1.3.0.

derekbit on May 19, 2022

After updating the charts in longhorn/longhorn, the installation still sometimes encounter the error
# helm install longhorn ./ --set defaultSettings.defaultReplicaCount=11 -n longhorn-system --create-namespace
W0513 05:22:43.040566   26654 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0513 05:22:43.102278   26654 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Error: INSTALLATION FAILED: failed post-install: unable to build kubernetes object for deleting hook longhorn/templates/customize-default-setting.yaml: unable to recognize "": no matches for kind "Setting" in version "longhorn.io/v1beta2"
Although we already add
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
in the Setting CRs template as @jenting mentioned (#2570 (comment)), the error still happens in helm3 (v3.8.2).

It is because the Setting CRs are applied before the Setting CRD is fully registered. Similar issues are reported from other projects, e.g. cert-manager/cert-manager#3377

Currently, helm3 can pre-installs the CRD manifests in chart/crds and then post-installs chart/templates to avoid the error. Ref: https://helm.sh/docs/topics/charts_hooks/#hooks-and-the-release-lifecycle

However, the helm3 cannot template the manifests in chart/crds. It leads to the namespace in our crds.yaml cannot be a variable. Similar issues:

crd does not render helm/helm#9885

Helm3 does not do templating of CRDs helm/helm#7083

It seems that the Setting CR solution is still blocked by https://github.com/longhorn/longhorn/issues/1879#issuecomment-947332276

I am fine with rollbacking to use the ConfigMap for now.

shuo-wu on May 17, 2022

Validate on longhorn

master-head(702d728cba2837c6f7e2bb0ef19e688657cbb364)
v1.2.x-head (3bad923306beeb0c6ac88bf839a6efee5fb19713)
v1.1.x-head (aebd0b10868af2f20bcf45c7f54a00a464221055)

Test steps Result Pass

Test 1. Fresh installation Check default setting value working properly at each step

1. Install Longhorn master-head with customized-default-settings.yaml
2. Update default settings using kubctl
3. Upgrade longhorn-manager without a new customized-default-settings.yaml
4. Upgrade longhorn-manager with a new customized-default-settings.yaml
5. Delete longhorn-manager and wait for restart

Test 2.1 Upgrade from v1.1.x/v1.2.x to master-head and without a new customized-default-settings.yaml Check default setting value working properly at each step

1. Install Longhorn v1.1.x/v1.2.x with customized-default-settings.yaml
2. Upgrade to master-head
3. Upgrade longhorn-manager without a new customized-default-settings.yaml
4. Upgrade longhorn-manager with a new customized-default-settings.yaml
5. Delete longhorn-manager and wait for restart

Test 3 Upgrade from v1.1.x/v1.2.x to master-head and with a new customized-default-settings.yaml Check default setting value working properly at each step

1. Install Longhorn v1.1.x/v1.2.x with customized-default-settings.yaml
2. Upgrade to master-head with new-customized-default-settings.yaml
3. Update default settings using kubectl
4. Delete longhorn-manager and wait for restart.

cc @innobead , @derekbit

chriscchien on May 16, 2022

Three manual test cases.

Test 1. Fresh installation

1. Create customized-default-settings.yaml

# cat customized-default-settings.yaml
defaultSettings:
  backupTarget: s3://backupbucket@us-east-1/backupstore
  backupTargetCredentialSecret: minio-secret
  storageOverProvisioningPercentage: 300
  defaultReplicaCount: 5
  guaranteedEngineManagerCPU: 15
  guaranteedReplicaManagerCPU: 15

2. Install Longhorn with customized-default-settings.yaml

# helm install longhorn ./ --values customized-default-settings.yaml  -n longhorn-system --create-namespace

Then, check the default settings.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 5
# kl get settings guaranteed-engine-manager-cpu // 15
# kl get settings guaranteed-replica-manager-cpu // 15

3. Update default settings using kubectl

# kl edit settings backing-image-recovery-wait-interval
// update to 400
# kl get settings backing-image-recovery-wait-interval // 400

4. Upgrade (Longhorn version is not changed, and longhorn-manager tag is changed)

4.1 Without a new customized-default-settings.yaml

// Longhorn version is not changed, and longhorn-manager tag is changed
# helm upgrade longhorn ./ -n longhorn-system

The default settings should not be modified.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 5
# kl get settings guaranteed-engine-manager-cpu // 15
# kl get settings guaranteed-replica-manager-cpu // 15
# kl get settings backing-image-recovery-wait-interval // 400

4.2 With a new customized-default-settings.yaml

# cat new-customized-default-settings.yaml
defaultSettings:
  defaultReplicaCount: 11
  guaranteedEngineManagerCPU: 18
	storageMinimalAvailablePercentage: 30

// Longhorn version is not changed, and longhorn-manager tag is changed
# helm upgrade longhorn ./ --values new-customized-default-settings.yaml -n longhorn-system

Then, check if the default settings are updated as expected.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 11
# kl get settings guaranteed-engine-manager-cpu // 18
# kl get settings guaranteed-replica-manager-cpu // 15
# kl get settings storage-minimal-available-percentage // 30
# kl get settings backing-image-recovery-wait-interval // 400

Update default settings using kubectl

# kl edit settings backing-image-recovery-wait-interval
// Update to 250
# kl get settings backing-image-recovery-wait-interval // 250

# kl edit settings guaranteed-engine-manager-cpu
// Update to 17
# kl get settings guaranteed-engine-manager-cpu // 17

5. Delete longhorn-manager and wait for restart.

Check the default settings that should not be modified.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 11
# kl get settings guaranteed-engine-manager-cpu // 17
# kl get settings guaranteed-replica-manager-cpu // 15
# kl get settings storage-minimal-available-percentage // 30
# kl get settings backing-image-recovery-wait-interval // 250

6. Upgrade (Longhorn version is not changed, and longhorn-manager tag is not changed)

Update default settings using kubectl

# kl edit settings concurrent-replica-rebuild-per-node-limit
// Update to 6
# kl get settings concurrent-replica-rebuild-per-node-limit // 6

With a new-customized-default-settings2.yaml

# cat new-customized-default-settings2.yaml
defaultSettings:
  defaultReplicaCount: 13
  backupstorePollInterval: 299
	storageMinimalAvailablePercentage: 17
  concurrentReplicaRebuildPerNodeLimit: 7

// Longhorn version is not changed, and longhorn-manager tag is ***not*** changed
# helm upgrade longhorn ./ --values new-customized-default-settings2.yaml -n longhorn-system

Check the default settings

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 13
# kl get settings guaranteed-engine-manager-cpu // 17
# kl get settings guaranteed-replica-manager-cpu // 15
# kl get settings storage-minimal-available-percentage // 17
# kl get settings backing-image-recovery-wait-interval // 250
# kl get settings backupstore-poll-interval // 299
# kl get settings concurrent-replica-rebuild-per-node-limit // 7

With a new-customized-default-settings3.yaml

# cat new-customized-default-settings3.yaml
defaultSettings:
  backupTarget: ~
  backupTargetCredentialSecret: ""

// Longhorn version is not changed, and longhorn-manager tag is ***not*** changed
# helm upgrade longhorn ./ --values new-customized-default-settings3.yaml -n longhorn-system

Check the default settings

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // ""
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 13
# kl get settings guaranteed-engine-manager-cpu // 17
# kl get settings guaranteed-replica-manager-cpu // 15
# kl get settings storage-minimal-available-percentage // 17
# kl get settings backing-image-recovery-wait-interval // 250
# kl get settings backupstore-poll-interval // 299
# kl get settings concurrent-replica-rebuild-per-node-limit // 7

Test 2. Upgrade from v1.1.x or v1.2.x to master-head and without a new customized-default-settings.yaml

1. Install Longhorn v1.1.x or v1.2.x with customized-default-settings.yaml

# helm install longhorn ./ --values customized-default-settings.yaml  -n longhorn-system --create-namespace

Then, check the default settings.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 5
# kl get settings guaranteed-engine-manager-cpu // 15
# kl get settings guaranteed-replica-manager-cpu // 15

2. Upgrade to master-head

# helm upgrade longhorn ./ -n longhorn-system

The default settings should not be modified.

Then, follow the steps from Test 1. Step3-Step6

Test 3. Upgrade from v1.1.x or v1.2.x to master-head and with a new customized-default-settings.yaml

1. Install Longhorn v1.1.x or v1.2.x with customized-default-settings.yaml

# helm install longhorn ./ --values customized-default-settings.yaml  -n longhorn-system --create-namespace

Then, check the default settings.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 5
# kl get settings guaranteed-engine-manager-cpu // 15
# kl get settings guaranteed-replica-manager-cpu // 15

2. Upgrade to master-head

# cat new-customized-default-settings.yaml
defaultSettings:
  defaultReplicaCount: 11
  guaranteedEngineManagerCPU: 18
	storageMinimalAvailablePercentage: 30

# helm upgrade longhorn ./ --values new-customized-default-settings.yaml -n longhorn-system

Then, check if the default settings are updated as expected.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 11
# kl get settings guaranteed-engine-manager-cpu // 18
# kl get settings guaranteed-replica-manager-cpu // 15
# kl get settings storage-minimal-available-percentage // 30

3. Update default settings using kubectl

# kl edit settings backing-image-recovery-wait-interval
// Update to 250
# kl get settings backing-image-recovery-wait-interval // 250

# kl edit settings guaranteed-engine-manager-cpu
// Update to 17
# kl get settings guaranteed-engine-manager-cpu // 17

4. Delete longhorn-manager and wait for restart.

Check the default settings that should not be modified.

# kl get settings backup-target // s3://backupbucket@us-east-1/backupstore
# kl get settings backup-target-credential-secret // minio-secret
# kl get settings storage-over-provisioning-percentage // 300
# kl get settings default-replica-count // 11
# kl get settings guaranteed-engine-manager-cpu // 17
# kl get settings guaranteed-replica-manager-cpu // 15
# kl get settings storage-minimal-available-percentage // 30
# kl get settings backing-image-recovery-wait-interval // 250

derekbit on May 18, 2022

~~The design is~~

~~In the upgrade path and before CreateOrUpdateLonghornVersionSetting, do updateCustomizedDefaultSetting.~~ ~~ ~~In updateCustomizedDefaultSetting~~ ~~- check if the LH version will be updated~~ ~~ - If no, don’t update any default settings and exit directly~~ ~- - If yes, read the customized default settings from latest ConfigMap user provided~~ ~~ - update the settings to latest values~~

Just saw the test in https://github.com/longhorn/longhorn/issues/2570#issuecomment-1004601106. The Longhorn version is not enough, so I change the design. The design is

Introduce ConfigMap resourceVersion into Setting
Record the resourceVersion of the ConfigMap, longhorn-default-setting, in the Setting CR if the value is from ConfigMap.
During upgrade or longhorn-manager restart, if the resourceVersion of the ConfigMap is different than the one in the Setting CR, update the Setting CR using the latest value in the ConfigMap.

derekbit on May 15, 2022

It is because the Setting CRs are applied before the Setting CRD is fully registered. Similar issues are reported from other projects, e.g. cert-manager/cert-manager#3377

☝️ just saw this part as you said.

If this possibly happens, then we should revert these changes. @derekbit well done to find out this corner!

The way you proposed is a way to do it, but the helm chart is not just for install or upgrade, it is probably used by users to update the existing manifests instead of only upgrading to a new Longhorn version, so a version flag to determine if need to update settings probably will not work completely?

Also, probably need to consider the consistency between the config map and settings via controllers.

innobead on May 13, 2022

Helm cannot guarantee the setting CR create/update happens after the setting CRD is registered although post-install hook on the setting CR is applied.

derekbit on May 13, 2022

#2570 (comment)

No, there is a probability of hitting the error after applying the solution.

# helm install longhorn ./ --set defaultSettings.defaultReplicaCount=11 -n longhorn-system --create-namespace
W0513 05:22:43.040566   26654 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0513 05:22:43.102278   26654 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Error: INSTALLATION FAILED: failed post-install: unable to build kubernetes object for deleting hook longhorn/templates/customize-default-setting.yaml: unable to recognize "": no matches for kind "Setting" in version "longhorn.io/v1beta2"

derekbit on May 13, 2022

Validation - PASSED

This test result only applies for master-head version. Due to an API version issue for the moment, upgrading from v1.2.2 or 1.1.3 will be tested in backport issues.

With the fix, Longhorn will now respect the customized values. Also, when upgrading without additional customization the setting values will be carry over.

Detail steps:

Create a fresh cluster k8s version > v1.18
Install Longhorn with helm from branch v1.3.0 (modified image tag in values.yaml to master-head):
- $ helm install longhorn ./longhorn/chart --set defaultSettings.defaultReplicaCount=11 -n longhorn-system
- Check the setting form UI or commandline:
  - $ kubectl get setting/default-replica-count -n longhorn-system
Upgrade the longhorn via helm with different setting values:
- helm upgrade longhorn ./longhorn/chart --set defaultSettings.defaultReplicaCount=7 -n longhorn-system
Check the settings again

Additional test for deleting setting:

Delete config value via commandline:
- $ kubectl delete setting/default-replica-count -n longhorn-system
Check the UI it will be fallback to default value

kaxing on Jan 4, 2022

After the team review meeting, we could revisit it again after the longhorn bumped the CRD version to v1beta2, to see if it is doable or not.

jenting on Nov 30, 2021