rancher: [Backport v2.6] [BUG] Deleted cluster not removed from UI view until hard refresh (Ghost cluster)
Rancher 2.6.1
- Login as standard, non-admin user
- Create an RKE2 cluster
- In a second browser login as admin user and navigate to Cluster Manager view
- As the non-admin user delete the cluster
Expected:
When the cluster is deleted it is removed from the view.
Actual:
- Normal user session: Deleted cluster is not removed from the view until performing a hard browser refresh.
- Admin user session: Deleted cluster is removed from the view immediately.
https://user-images.githubusercontent.com/3813921/137932366-12f9f499-0e30-4c17-86b5-b361ba95c5a5.mov
```
2021/10/19 15:03:11 [INFO] rkecluster fleet-default/template-rke2: waiting for at least one bootstrap node
2021/10/19 15:03:11 [INFO] [mgmt-auth-crtb-controller] Deleting roleBinding crb-n4zbcfymqv
2021/10/19 15:03:11 [INFO] [mgmt-auth-crtb-controller] Deleting rolebinding creator-cluster-owner-cluster-owner in namespace p-l9xs6 for crtb creator-cluster-owner
2021/10/19 15:03:11 [INFO] [mgmt-auth-crtb-controller] Deleting rolebinding creator-cluster-owner-cluster-owner in namespace p-hhc6q for crtb creator-cluster-owner
2021/10/19 15:03:12 [INFO] [mgmt-project-rbac-remove] Deleting namespace p-l9xs6
2021/10/19 15:03:12 [INFO] [mgmt-project-rbac-remove] Deleting namespace p-hhc6q
2021/10/19 15:03:16 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:16 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:16 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:16 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:16 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:16 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:17 [ERROR] error syncing 'fleet-default/template-rke2-dsfsdfsdfsdf-7ccf7c9f-98fgb': handler machine-provision-remove: cannot delete machine template-rke2-dsfsdfsdfsdf-7ccf7c9f-98fgb because create job has not finished, requeuing
2021/10/19 15:03:17 [INFO] rkecluster fleet-default/template-rke2: waiting for at least one bootstrap node
2021/10/19 15:03:17 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:17 [INFO] [mgmt-auth-prtb-controller] Updating owner label for roleBinding crb-mrk3zsur4y
2021/10/19 15:03:17 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:18 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: clusterregistrationtokens.management.cattle.io "default-token" is forbidden: unable to create new content in namespace c-m-v8p56lnd because it is being terminated, requeuing
2021/10/19 15:03:18 [INFO] [mgmt-auth-prtb-controller] Deleting roleBinding crb-mrk3zsur4y
2021/10/19 15:03:19 [INFO] [mgmt-cluster-rbac-delete] Creating namespace c-m-v8p56lnd
2021/10/19 15:03:19 [ERROR] error syncing 'c-m-v8p56lnd': handler cluster-watch: namespaces "c-m-v8p56lnd" not found, requeuing
2021/10/19 15:03:20 [ERROR] error syncing 'harvey': handler machine-worker-label: machines.cluster.x-k8s.io "custom-599f50dd418f" not found, requeuing
```
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 27 (23 by maintainers)
Validation Template
This template is for the second version of the fix. Information here supersedes what was in the previous validation template.
Root Cause
Users are assigned to particular roles/roleBindings (for provisioning clusters) and clusterRoles/clusterRoleBindings (for management clusters) which give them permission to view the cluster object in the local cluster. These roles are owned by the cluster object (in part) and are deleted when the cluster is deleted. In the current version of Rancher, this results in these objects being removed before the cluster is completely gone. As a result, users do not receive (through the websocket) the delete events for the cluster, so the cluster remains in the UI. Admin users, who have `*` verbs on `*` resources in `*` groups in the local cluster, do not suffer from these issues, resulting in the discrepancy pointed out in the issue.
The previous fix was insufficient for RKE2 clusters because v2 provisioning clusters use a different CRD as their primary cluster object (`clusters.provisioning.cattle.io`). These cluster types were not checked, and the roles which gave access to them still experienced the issues outlined above.
What was fixed, or what changes have occurred
Logic has been added which blocks roles, role bindings, cluster roles, and cluster role bindings which grant access to a cluster from being deleted as long as the cluster is still in a deleting state (i.e. it can be retrieved from k8s and has a non-nil deletion timestamp).
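The blocking rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not Rancher's actual code: the `Cluster` and `RBACObject` classes, the `should_block_deletion` function, and the idea of keying the lookup by `(namespace, name)` are all assumptions made for the example. Only the two annotation keys and the "retrievable with a non-nil deletion timestamp" condition come from this issue.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional, Tuple

# Annotation keys named in this issue.
NAME_ANNOTATION = "cluster.cattle.io/name"
NAMESPACE_ANNOTATION = "cluster.cattle.io/namespace"


@dataclass
class Cluster:
    """Hypothetical stand-in for a management or provisioning cluster object."""
    name: str
    namespace: str = ""  # assumed empty for cluster-scoped management clusters
    deletion_timestamp: Optional[datetime] = None


@dataclass
class RBACObject:
    """Hypothetical stand-in for a Role, ClusterRole, or either binding kind."""
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)


def should_block_deletion(
    obj: RBACObject,
    clusters: Dict[Tuple[str, str], Cluster],
) -> bool:
    """Return True while the cluster this RBAC object grants access to is deleting."""
    cluster_name = obj.annotations.get(NAME_ANNOTATION)
    if cluster_name is None:
        # Not tied to a cluster: the finalizer should ignore this object.
        return False
    cluster_namespace = obj.annotations.get(NAMESPACE_ANNOTATION, "")
    cluster = clusters.get((cluster_namespace, cluster_name))
    if cluster is None:
        # Cluster can no longer be retrieved, so nothing is left to wait for.
        return False
    # Block only while the cluster exists and is marked for deletion.
    return cluster.deletion_timestamp is not None
```

With this shape, an RBAC object without the name annotation is never held back, and an object pointing at a cluster that is already gone is released, which matches the intent that the finalizer only delays deletion while the delete event can still reach the user.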
The previous version of this change only considered mgmt type clusters (`clusters.management.cattle.io`). This new change considers provisioning type clusters (`clusters.provisioning.cattle.io`) and blocks deletion of the RBAC objects outlined above so long as clusters of either type are in a deleting state.
Areas or cases that should be tested
Notes
Test Cases
`cluster-owner`, `cluster-member`, `view nodes`, etc. and project level like `project owner`, `manage project members`, etc.). General Steps:
`standard user` global role
`standard user` global role
`standard user` global role
What areas could experience regressions
- The ability to remove users from cluster level permissions. Since this code results in new finalizers on our RBAC objects, this could result in rolebindings/clusterrolebindings not being deleted, which could leave users with more permissions than desired. Note that since this affects the RBAC primitives, not our ClusterRoleTemplateBindings or ProjectRoleTemplateBindings, you will need to verify that the RBAC primitives are deleted. It is not enough to verify that the CRTB or PRTB is deleted. In addition, keep in mind that some permissions are granted to users which have roles in a project in a cluster.
- The ability to delete RBAC objects in the local cluster. This is related to the first bullet point, but lower level. The way that the solution is structured results in a new finalizer on every Role, Cluster Role, Role Binding, or Cluster Role Binding in the local cluster. The code was configured, essentially, to ignore resources which do not have a `cluster.cattle.io/name` annotation (and a `cluster.cattle.io/namespace` in the case of some RBAC resources). But it's possible that there exists a bug which causes the finalizer to be overly active, so tests should be conducted to ensure basic create/delete of the above resources continues to work without issue in the local cluster.
Are the repro steps accurate/minimal?
Yes, they are included here for convenience.
`standard user` global role