rook: rook-ceph stuck in Terminating state: unable to delete

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

karasing-OSX-2:~/ocs4.0/rook$ oc get scc | grep -i ceph
rook-ceph          true      []        MustRunAs   RunAsAny           MustRunAs   RunAsAny    <none>     false            [configMap downwardAPI emptyDir flexVolume hostPath persistentVolumeClaim projected secret]
karasing-OSX-2:~/ocs4.0/rook$
karasing-OSX-2:~/ocs4.0/rook$ oc get ns | grep -i ceph
rook-ceph-system                             Terminating   6d11h
karasing-OSX-2:~/ocs4.0/rook$
karasing-OSX-2:~/ocs4.0/rook$ oc get all -n rook-ceph-system
No resources found.
karasing-OSX-2:~/ocs4.0/rook$
karasing-OSX-2:~/ocs4.0/rook$ oc get namespace rook-ceph-system -o json
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"rook-ceph-system\",\"namespace\":\"\"}}\n",
            "openshift.io/sa.scc.mcs": "s0:c21,c10",
            "openshift.io/sa.scc.supplemental-groups": "1000440000/10000",
            "openshift.io/sa.scc.uid-range": "1000440000/10000"
        },
        "creationTimestamp": "2019-02-09T13:37:33Z",
        "deletionTimestamp": "2019-02-09T14:35:59Z",
        "name": "rook-ceph-system",
        "resourceVersion": "192694",
        "selfLink": "/api/v1/namespaces/rook-ceph-system",
        "uid": "da82f47a-2c6f-11e9-8392-12645b59c006"
    },
    "spec": {
        "finalizers": [
            "kubernetes"  <===== This is the problem
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}
karasing-OSX-2:~/ocs4.0/rook$

I have also gone through the comments at https://github.com/rook/rook/issues/1488#issuecomment-397336663, however I do not have the resources against which to run the patch command.

karasing-OSX-2:~/ocs4.0/rook$ oc delete ns rook-ceph-system --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (Conflict): Operation cannot be fulfilled on namespaces "rook-ceph-system": The system is ensuring all content is removed from this namespace.  Upon completion, this namespace will automatically be purged by the system.
karasing-OSX-2:~/ocs4.0/rook$
karasing-OSX-2:~/ocs4.0/rook$ oc get crd | egrep -i "ceph|rook|system"
karasing-OSX-2:~/ocs4.0/rook$

Expected behavior:

Should be able to delete the rook-ceph-system namespace.

How to reproduce it (minimal and precise):

  • Get OCP 4.0 on AWS
  • oc create -f scc.yaml
  • oc create -f operator.yaml
  • Try to delete/purge [ without running cluster.yaml ]

Environment:

  • OS (e.g. from /etc/os-release): RHCOS
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): rook/ceph:master
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Openshift 4.0
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge

This usually happens when the operator is deleted before the cluster CR. The doc already has a mention of that: https://rook.io/docs/rook/v1.0/ceph-teardown.html
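The ordering matters because the operator is what removes the cluster CR's finalizer, so the CR must be deleted while the operator is still running. A sketch of the order, assuming the default example manifests (`operator.yaml`) and CR/namespace names from the docs:

```shell
# Delete the cluster CR first, while the operator is still running;
# the operator processes the CR's finalizer and lets it go away.
kubectl -n rook-ceph delete cephcluster rook-ceph

# Wait until the CR is actually gone before removing the operator.
kubectl -n rook-ceph get cephcluster

# Only then delete the operator and its namespaces.
kubectl delete -f operator.yaml
```

If the operator was already deleted first, the patch command above is the escape hatch: it empties the CR's finalizer list so the namespace deletion can proceed.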

@travisn you might be correct. The kubernetes finalizer must have been added by OpenShift, but I don’t know how it did that. I only ran the scc.yaml and operator.yaml files, and after that I tried to delete the namespace.

Anyway, can you help me delete it manually? So far I have tried the following steps:

oc get ns rook-ceph-system -o json > todelete.json
karasing-OSX-2:~/ocs4.0/rook$ cat todelete.json
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"rook-ceph-system\",\"namespace\":\"\"}}\n",
            "openshift.io/sa.scc.mcs": "s0:c21,c10",
            "openshift.io/sa.scc.supplemental-groups": "1000440000/10000",
            "openshift.io/sa.scc.uid-range": "1000440000/10000"
        },
        "creationTimestamp": "2019-02-09T13:37:33Z",
        "deletionTimestamp": "2019-02-09T14:35:59Z",
        "name": "rook-ceph-system",
        "resourceVersion": "192694",
        "selfLink": "/api/v1/namespaces/rook-ceph-system",
        "uid": "da82f47a-2c6f-11e9-8392-12645b59c006"
    },
    "spec": {
        "finalizers": [
            "kubernetes"  <--- Removed this line
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}
karasing-OSX-2:~/ocs4.0/rook$
  • Removed kubernetes finalizer
karasing-OSX-2:~/ocs4.0/rook$ cat todelete.json
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"rook-ceph-system\",\"namespace\":\"\"}}\n",
            "openshift.io/sa.scc.mcs": "s0:c21,c10",
            "openshift.io/sa.scc.supplemental-groups": "1000440000/10000",
            "openshift.io/sa.scc.uid-range": "1000440000/10000"
        },
        "creationTimestamp": "2019-02-09T13:37:33Z",
        "deletionTimestamp": "2019-02-09T14:35:59Z",
        "name": "rook-ceph-system",
        "resourceVersion": "192694",
        "selfLink": "/api/v1/namespaces/rook-ceph-system",
        "uid": "da82f47a-2c6f-11e9-8392-12645b59c006"
    },
    "spec": {
        "finalizers": [
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}
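Hand-editing the JSON works, but it is easy to leave a stray comma or bracket behind. A small script can empty the finalizer list instead; a sketch, assuming `python3` is on the PATH (the heredoc is a minimal stand-in — in practice, use the real `oc get ns rook-ceph-system -o json` dump shown above):

```shell
# Stand-in for `oc get ns rook-ceph-system -o json > todelete.json`;
# replace with the real dump in practice.
cat > todelete.json <<'EOS'
{"apiVersion": "v1", "kind": "Namespace",
 "metadata": {"name": "rook-ceph-system"},
 "spec": {"finalizers": ["kubernetes"]}}
EOS

# Empty spec.finalizers rather than editing the JSON by hand.
python3 - <<'EOF'
import json

with open("todelete.json") as f:
    ns = json.load(f)

ns["spec"]["finalizers"] = []  # drop the "kubernetes" finalizer

with open("todelete.json", "w") as f:
    json.dump(ns, f, indent=4)
EOF

cat todelete.json
```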
  • Logged in as system:admin
karasing-OSX-2:~/ocs4.0/rook$ oc whoami
system:admin
karasing-OSX-2:~/ocs4.0/rook$
karasing-OSX-2:~/ocs4.0/rook$ curl -k -H "Content-Type: application/json" -X PUT --data-binary @todelete.json https://ocp4-api.ceph-s3.com:6443/api/v1/namespaces/rook-ceph-system/finalize
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "namespaces \"rook-ceph-system\" is forbidden: User \"system:anonymous\" cannot update resource \"namespaces/finalize\" in API group \"\" in the namespace \"rook-ceph-system\"",
  "reason": "Forbidden",
  "details": {
    "name": "rook-ceph-system",
    "kind": "namespaces"
  },
  "code": 403
}
karasing-OSX-2:~/ocs4.0/rook$ oc get ns | grep -i rook
rook-ceph-system                             Terminating   6d11h
karasing-OSX-2:~/ocs4.0/rook$
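The 403 above is because curl carries no credentials, so the API server treats the request as system:anonymous. Two possible ways around that, using the hostname and port from the session above: pass a bearer token explicitly, or go through `oc proxy`, which authenticates for you. Note that a system:admin login uses client certificates rather than a token, so `oc whoami -t` may fail for it; in that case the proxy route is the one that works.

```shell
# Option 1: send the current session's bearer token (only works for
# token-based logins, not certificate-based system:admin).
TOKEN=$(oc whoami -t)
curl -k -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -X PUT --data-binary @todelete.json \
     https://ocp4-api.ceph-s3.com:6443/api/v1/namespaces/rook-ceph-system/finalize

# Option 2: let a local authenticated proxy attach the credentials.
oc proxy --port=8001 &
PROXY_PID=$!
curl -H "Content-Type: application/json" \
     -X PUT --data-binary @todelete.json \
     http://127.0.0.1:8001/api/v1/namespaces/rook-ceph-system/finalize
kill $PROXY_PID
```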

This is also a problem for me. I think this should be reopened.

I was able to cleanly stand up a Ceph cluster. However, after following the teardown procedures, I see several resources hanging:

  • cephcluster.ceph.rook.io “rook-ceph”
  • cephblockpool.ceph.rook.io “replicapool”
  • rook-ceph Terminating

This gave me a lot of info as to why namespaces weren’t automatically removed:

kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph-system

In my case, for example, it was the secrets and ingresses that I hadn’t cleaned up yet.

[nwatkins@smash kubensis]$ kubectl api-resources --verbs=list --namespaced -o name \
>   | xargs -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph
NAME                                 DATADIRHOSTPATH   MONCOUNT   AGE    STATE     HEALTH
cephcluster.ceph.rook.io/rook-ceph   /var/lib/rook     1          114m   Created   HEALTH_OK