rook: rook-ceph upgrade fails on kubernetes 1.25 with 1.9.10

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

The upgrade does not succeed.

Expected behavior:

The upgrade succeeds.

How to reproduce it (minimal and precise):

  1. Start with a Kubernetes cluster at version 1.24
  2. Install rook-ceph 1.9.9
  3. Upgrade Kubernetes to 1.25
  4. Attempt to upgrade rook-ceph to 1.9.10 (a command sketch follows below)
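
For reference, the steps above map roughly to these commands (a sketch, not taken verbatim from the report; the chart repo and versions are the ones mentioned in this issue, and the kubeadm upgrade is abbreviated to a single control-plane node):

# 1-2. Install the operator chart at 1.9.9 on a Kubernetes 1.24 cluster
helm upgrade -i -n rook-ceph --create-namespace --repo https://charts.rook.io/release --values rook-ceph.yaml --version 1.9.9 rook-ceph rook-ceph

# 3. Upgrade the cluster to Kubernetes 1.25 (kubeadm shown; repeat per node as appropriate)
kubeadm upgrade apply v1.25.0

# 4. Attempt the chart upgrade to 1.9.10 (this is the step that fails)
helm upgrade -i -n rook-ceph --repo https://charts.rook.io/release --values rook-ceph.yaml --version 1.9.10 rook-ceph rook-ceph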

Logs to submit:

helm upgrade -i -n rook-ceph --repo https://charts.rook.io/release --values rook-ceph.yaml --version 1.9.10 rook-ceph rook-ceph
Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "00-rook-privileged" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
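
The failure happens because PodSecurityPolicy was removed from the policy/v1beta1 API in Kubernetes 1.25, while the stored Helm release manifest from 1.9.9 still contains the 00-rook-privileged PodSecurityPolicy, so Helm can no longer map that object against the 1.25 API server. A quick way to confirm this (a sketch, assuming Helm 3 and the release/namespace names above):

# On 1.25 this returns nothing: the PodSecurityPolicy API is gone
kubectl api-resources | grep -i podsecuritypolicy

# ...but the stored release manifest still references it
helm -n rook-ceph get manifest rook-ceph | grep -i -B1 -A3 PodSecurityPolicy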

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 22.04
  • Kernel (e.g. uname -a): Linux pi01 5.15.0-1013-raspi #15-Ubuntu SMP PREEMPT Mon Aug 8 06:33:06 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
  • Cloud provider or hardware configuration: Raspberry PI 4
  • Rook version (use rook version inside of a Rook Pod): 1.9.9
  • Storage backend version (e.g. for ceph do ceph -v): 16.2.0
  • Kubernetes version (use kubectl version): 1.25.0
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 23 (12 by maintainers)

Most upvoted comments

I just tried it and had no issues:

# First delete all sh.helm.release.* secrets
kubectl -n rook-ceph get secret | grep -i 'sh.helm.release' | awk '{print $1}' | xargs kubectl -n rook-ceph delete secret

# Next reinstall the helm chart
helm upgrade --install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph -f values.yaml

The thing to remember is that the helm chart only installs the rook operator; it doesn't install or touch your Ceph cluster (at least the way I did it). Your CephCluster resource is what controls that, so as long as that doesn't change, this just redeploys the operator: it will see the existing cluster, check that everything is good, and life will go on.
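
If you want to double-check that before and after the re-install, something like this works (a sketch; the deployment name and label are the defaults from the operator chart):

# The CephCluster CR and the Ceph daemon pods are not owned by the operator chart,
# so they should be untouched while the release secrets are deleted and re-created
kubectl -n rook-ceph get cephcluster
kubectl -n rook-ceph get pods

# After re-installing the chart, the operator should come back and reconcile the existing cluster
kubectl -n rook-ceph get pods -l app=rook-ceph-operator
kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=20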

Several people have now reported that they were able to remove the helm secrets and then re-install with helm and everything worked fine; see https://github.com/helm/helm/issues/11287 for examples. I haven’t tried it yet myself.

Did you set the operatorNamespace property in the cluster chart? The overlapping resources shouldn’t be created by the cluster chart if the operator namespace is the same as the cluster namespace. For example, see how the resources are skipped here depending on the operatorNamespace.
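
For reference, a minimal sketch of what that looks like when installing the cluster chart (assuming the rook-ceph-cluster chart from the same repo alias used above, with the operator living in the same rook-ceph namespace):

# Tell the cluster chart which namespace the operator chart was installed into
helm upgrade --install --namespace rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster \
  --set operatorNamespace=rook-ceph -f values.yaml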

We had the same issue (an automatic upgrade by our cloud provider forced our hand before we were ready) and attempted to remove the helm release secrets with:

    kubectl -n rook-ceph get secret | grep -i 'sh.helm.release' | awk '{print $1}' | xargs kubectl -n rook-ceph delete secret

As documented, this does remove the secrets, and Helm then attempts a fresh install, but it failed with:

   Release "rook-ceph" does not exist. Installing it now.
   Error: rendered manifests contain a resource that already exists. Unable to continue with install: ServiceAccount "rook-ceph-osd" in namespace "rook-ceph" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "rook-ceph"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "rook-ceph"

In the end, the CustomResourceDefinition, ServiceAccount, ClusterRole, ClusterRoleBinding, Role, RoleBinding and other resources did not carry the right labels and annotations, so Helm would not accept responsibility for them.
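
For a single resource, the fix Helm is asking for looks like this (using the rook-ceph-osd ServiceAccount from the error above); the script below just does the same thing in bulk:

kubectl -n rook-ceph label serviceaccount rook-ceph-osd app.kubernetes.io/managed-by=Helm --overwrite
kubectl -n rook-ceph annotate serviceaccount rook-ceph-osd meta.helm.sh/release-name=rook-ceph --overwrite
kubectl -n rook-ceph annotate serviceaccount rook-ceph-osd meta.helm.sh/release-namespace=rook-ceph --overwrite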

I used a script to re-label all the resources:

#!/bin/bash

set -e
#set -o pipefail
set -o xtrace

HELM_RELEASE='rook-ceph'
HELM_RELEASE_NAMESPACE='rook-ceph'

# Remove the stored Helm release secrets so the next install starts clean
kubectl -n "${HELM_RELEASE_NAMESPACE}" get secret | grep -i 'sh.helm.release' | awk '{print $1}' | xargs kubectl -n "${HELM_RELEASE_NAMESPACE}" delete secret

# Re-label and re-annotate the namespaced workload resources so Helm will adopt them
for CRD in $(kubectl get all -n "${HELM_RELEASE_NAMESPACE}" -o=name)
do
    kubectl -n "${HELM_RELEASE_NAMESPACE}" label "${CRD}" app.kubernetes.io/managed-by=Helm --overwrite
    kubectl -n "${HELM_RELEASE_NAMESPACE}" annotate "${CRD}" meta.helm.sh/release-name="${HELM_RELEASE}" --overwrite
    kubectl -n "${HELM_RELEASE_NAMESPACE}" annotate "${CRD}" meta.helm.sh/release-namespace="${HELM_RELEASE_NAMESPACE}" --overwrite
done

# The loop above does not cover RBAC objects and CRDs, so patch those as well.
# Note: some of these are cluster-wide and are not filtered down to rook-specific
# resources, so unrelated resources will also get the rook release labels.
for CRD in $(kubectl -n "${HELM_RELEASE_NAMESPACE}" get CustomResourceDefinition,ServiceAccount,ClusterRole,ClusterRoleBinding,Role,RoleBinding -o=name)
do
    kubectl -n "${HELM_RELEASE_NAMESPACE}" label "${CRD}" app.kubernetes.io/managed-by=Helm --overwrite
    kubectl -n "${HELM_RELEASE_NAMESPACE}" annotate "${CRD}" meta.helm.sh/release-name="${HELM_RELEASE}" --overwrite
    kubectl -n "${HELM_RELEASE_NAMESPACE}" annotate "${CRD}" meta.helm.sh/release-namespace="${HELM_RELEASE_NAMESPACE}" --overwrite
done

Pay attention to the note in the script: cluster-scoped resources unrelated to rook also get the rook release labels and annotations added, which may not be acceptable for you, so adjust the script to suit; otherwise it works as is. After running it, the helm upgrade --install command given above works.
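
If labeling unrelated cluster-scoped resources is a concern, one option is to filter the second loop down to rook/ceph-looking names before patching, and then confirm Helm has adopted the release again (a sketch; adjust the grep pattern to your install):

# Only touch resources whose names look rook/ceph related
for CRD in $(kubectl -n rook-ceph get CustomResourceDefinition,ServiceAccount,ClusterRole,ClusterRoleBinding,Role,RoleBinding -o=name | grep -E 'rook|ceph')
do
    kubectl -n rook-ceph label "${CRD}" app.kubernetes.io/managed-by=Helm --overwrite
    kubectl -n rook-ceph annotate "${CRD}" meta.helm.sh/release-name=rook-ceph --overwrite
    kubectl -n rook-ceph annotate "${CRD}" meta.helm.sh/release-namespace=rook-ceph --overwrite
done

# After the helm upgrade --install succeeds, the release should show as deployed again
helm -n rook-ceph status rook-ceph
helm -n rook-ceph history rook-ceph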

The original script came from https://github.com/helm/helm/issues/7418#issuecomment-946802807