cluster-api: Unable to delete a cluster when infrastructureRef is defined incorrectly
What steps did you take and what happened:
Creating a cluster definition with an incorrect `infrastructureRef` results in a cluster resource that cannot be deleted; the namespace containing it exhibits the same behaviour.
Example Cluster:
```yaml
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: blank
  namespace: blank
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.96.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: "cluster.local"
  apiServerPort: 6433
  infrastructureRef:
    apiVersion: blank.cluster.k8s.io/v1alpha1
    kind: blankCluster
    name: blankTest
    namespace: blank
```
Applying this will create a new cluster resource in the namespace `blank` as expected:

```sh
k create namespace blank; k create -f ./blank.yaml
```
What did you expect to happen:
That deleting this erroneous cluster resource or its namespace would clean it up from the cluster. However, at this point the delete hangs indefinitely (even with force):
```
k get cluster -n blank
NAME    PHASE
blank   provisioning

k delete cluster blank -n blank
cluster.cluster.x-k8s.io "blank" deleted
<hang>
```
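The hang comes from the interaction of finalizers and `kubectl delete --wait` (which a maintainer explains later in the thread): a DELETE only sets a deletion timestamp, and the object is not actually removed until its finalizers are cleared. Below is a deliberately tiny toy model of that semantics, not cluster-api or apiserver code; all names are illustrative.

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes resource's metadata.
type object struct {
	finalizers        []string
	deletionRequested bool
}

type store map[string]*object

// requestDelete mimics what the apiserver does on DELETE: it only sets a
// deletion marker (metadata.deletionTimestamp); the object stays in storage
// until its finalizer list is empty.
func (s store) requestDelete(name string) {
	if obj, ok := s[name]; ok {
		obj.deletionRequested = true
		s.finalize(name)
	}
}

// finalize removes the object iff deletion was requested and every
// finalizer has been cleared by its controller.
func (s store) finalize(name string) {
	if obj, ok := s[name]; ok && obj.deletionRequested && len(obj.finalizers) == 0 {
		delete(s, name)
	}
}

func main() {
	s := store{"blank": {finalizers: []string{"cluster.cluster.x-k8s.io"}}}

	s.requestDelete("blank")
	fmt.Println(len(s)) // 1: the finalizer blocks removal, so a waiting delete hangs

	s["blank"].finalizers = nil // the controller (or a manual edit) clears the finalizer
	s.finalize("blank")
	fmt.Println(len(s)) // 0: the object is finally removed
}
```

In this bug the Cluster controller never clears its finalizer, because it is waiting on an infrastructure object that can never exist.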
Anything else you would like to add:
As pointed out by @detiber, editing the resource and removing the `infrastructureRef` allows it to be deleted as expected:

```
k edit cluster blank -n blank
cluster.cluster.x-k8s.io/blank edited

k delete cluster blank -n blank
Error from server (NotFound): clusters.cluster.x-k8s.io "blank" not found
```
Environment:
- Cluster-api version: 0.2.5
- Minikube/KIND version: N/A (vanilla deployment on VMs)
- Kubernetes version (use `kubectl version`): 1.14.1
- OS (e.g. from /etc/os-release): Ubuntu 18.04
/kind bug
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 24 (23 by maintainers)
@prankul88 I do not believe we'll be able to implement what you've described. When you issue `kubectl delete`, it has a `--wait` flag that defaults to true. If a resource has finalizers, the apiserver sets the deletion timestamp on the resource, and the `--wait=true` flag causes kubectl to wait for the resource's finalizers to be removed, and for the resource ultimately to be removed from etcd. If there is still a finalizer on the resource, which is what happens in case 1, there is nothing `kubectl` can do in its current form to give you any additional information as to what is going on. If you `ctrl-c` the `kubectl delete` call, the resource still has its deletion timestamp set, and the apiserver is still waiting for all the finalizers to be removed. This is the standard behavior for all Kubernetes resources, both built-in types and custom resources, and there is no way to alter the behavior of either the apiserver or kubectl without making changes to Kubernetes.

I think it may be sufficient to modify `ClusterReconciler.reconcileDelete()` to have it skip over 404 not found errors here: https://github.com/kubernetes-sigs/cluster-api/blob/065eb539766dede097e206a7b549b5902d15f14a/controllers/cluster_controller.go#L256
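The shape of that proposed change can be sketched as follows. This is a self-contained illustration, not the actual cluster-api patch: `errNotFound`, `getInfraObject`, and this `reconcileDelete` are stand-ins (the real code would use `apierrors.IsNotFound` from k8s.io/apimachinery against the object named by `spec.infrastructureRef`).

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for an apiserver 404 (apierrors.IsNotFound).
var errNotFound = errors.New("not found")

// getInfraObject simulates fetching the object referenced by
// spec.infrastructureRef. With a bogus apiVersion/kind, as in this
// issue, the lookup can never succeed.
func getInfraObject(exists bool) error {
	if !exists {
		return errNotFound
	}
	return nil
}

// reconcileDelete sketches the proposed fix: a missing infrastructure
// object should not block removal of the Cluster's finalizer.
func reconcileDelete(infraExists bool) (done bool, err error) {
	if err := getInfraObject(infraExists); err != nil {
		if errors.Is(err, errNotFound) {
			// Nothing to wait for: proceed to remove the finalizer.
			return true, nil
		}
		return false, err // real errors still requeue the Cluster
	}
	// Infrastructure object still exists: wait for it to be deleted first.
	return false, nil
}

func main() {
	done, err := reconcileDelete(false)
	fmt.Println(done, err) // true <nil>: deletion can complete despite the 404
}
```

With the 404 tolerated, the finalizer is removed and the ordinary deletion flow completes, so neither the Cluster nor its namespace gets stuck.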
Hello,
I raised the issue mainly because it is certainly confusing behaviour for end-users who don't know where to look, or even that they will need to start manually editing various object `spec`s. It's more of a UX issue if the end-user can't be notified that the delete operation is failing due to a misaligned reference, I suppose.

@wfernandes Yes I am working on it.
/assign
/lifecycle active
@thebsdbox I am facing the same issue. Will work on it.