rancher: [BUG] tigera-operator fails due to rancher-webhook denying access

Rancher Server Setup

  • Rancher version: 2.7.2
  • Installation option (Docker install/Helm Chart): docker

Information about the Cluster

  • Kubernetes version: v1.24.11+k3s1
  • Cluster Type (Local/Downstream): Downstream custom

User Information

  • What is the role of the user logged in? Admin

Describe the bug

tigera-operator fails due to rancher-webhook denying access:

{"level":"info","ts":1681811352.3538423,"logger":"controller_installation","msg":"Failed to update object.","Name":"calico-system","Namespace":"","Kind":"Namespace","key":"/calico-system"}
{"level":"error","ts":1681811352.3538892,"logger":"controller_installation","msg":"Error creating / updating resource","Request.Namespace":"tigera-operator","Request.Name":"tigera-ca-private","reason":"ResourceUpdateError","error":"admission webhook \"rancher.cattle.io.namespaces\" denied the request: Unauthorized","stacktrace":"github.com/tigera/operator/pkg/controller/status.(*statusManager).SetDegraded\n\t/go/src/github.com/tigera/operator/pkg/controller/status/status.go:406\ngithub.com/tigera/operator/pkg/controller/installation.(*ReconcileInstallation).Reconcile\n\t/go/src/github.com/tigera/operator/pkg/controller/installation/core_controller.go:1358\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:234"}
2023-04-18T11:49:12.354158625+02:00 {"level":"error","ts":1681811352.3540413,"msg":"Reconciler error","controller":"tigera-installation-controller","object":{"name":"tigera-ca-private","namespace":"tigera-operator"},"namespace":"tigera-operator","name":"tigera-ca-private","reconcileID":"593d8d9e-272b-4009-a6d1-823e0a39883d","error":"admission webhook \"rancher.cattle.io.namespaces\" denied the request: Unauthorized","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.13.0/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.

To Reproduce

Run a cluster with tigera-operator for the Calico installation, then upgrade Rancher from 2.7.1 to 2.7.2 and tigera-operator from 3.16 to 3.25.

Observe that the Calico installation does not update.

Expected Result

tigera-operator can update without problems.

Additional context

Related to #41172?

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 4
  • Comments: 15 (2 by maintainers)

Most upvoted comments

Hi, same issue here with the trident-operator and Rancher 2.7.3.

level=error msg="error syncing 'trident': reconcile failed; error re-installing Trident 'trident' ; err: reconcile failed; failed to patch Trident installation namespace trident; admission webhook \"rancher.cattle.io.namespaces\" denied the request: Unauthorized, requeuing"

Same helm installation with Rancher 2.7.0 worked fine, …

I’ve managed to resolve this by adding some new ClusterRoles and ClusterRoleBindings. For instance, for Tigera:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tigera-operator-psa
rules:
- apiGroups:
  - management.cattle.io
  resources:
  - projects
  verbs:
  - updatepsa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tigera-operator-psa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tigera-operator-psa
subjects:
- kind: ServiceAccount
  name: tigera-operator
  namespace: tigera-operator

This resolves the issue we were having with Tigera Operator, and a similar set of resources resolves a similar issue with the OPA Gatekeeper Update Namespace Label deployment.
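
For reference, a sketch of what that adaptation might look like for the Gatekeeper case. The ServiceAccount name and namespace used below (gatekeeper-admin in gatekeeper-system) are assumptions based on a default Gatekeeper install and may differ in yours:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gatekeeper-psa
rules:
- apiGroups:
  - management.cattle.io
  resources:
  - projects
  verbs:
  - updatepsa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gatekeeper-psa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gatekeeper-psa
subjects:
- kind: ServiceAccount
  name: gatekeeper-admin      # assumed ServiceAccount; check your Gatekeeper deployment
  namespace: gatekeeper-system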

I too had the issue where Trident reported 'Failed to install Trident; err: failed to patch Trident installation namespace trident; admission webhook "rancher.cattle.io.namespaces" denied the request: Unauthorized'.

I can confirm that upgrading to Rancher v2.7.5 has fixed the issue for me.

EDIT 1: No, upgrading to v2.7.5 did not fix it. 😞 I looked at the wrong cluster. Damn it.

EDIT 2: Alright, all is fine now! Big thanks to @justdan96 for posting the solution. In hindsight I should have tried that first, before all the other trouble I went through trying to debug this. 😌 Here’s my YAML for fixing Trident:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: trident-operator-psa
rules:
- apiGroups:
  - management.cattle.io
  resources:
  - projects
  verbs:
  - updatepsa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: trident-operator-psa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: trident-operator-psa
subjects:
- kind: ServiceAccount
  name: trident-operator
  namespace: trident

This leaves me wondering…

  • Is this a bug in Rancher or a bug in the Tigera/Trident operators?
  • I cannot find any documentation for these RBAC rules. Is it really an undocumented feature of Rancher?
  • Is it reasonable that having patch privileges on the namespace resource is not enough for adding a label?
  • Is it reasonable for people to have to write special cases for each Kubernetes distribution?
  • At least in my case, the trident namespace is not part of any Rancher project. To me, it feels strange then that I need any privileges on the projects resource for things to work. Should that be changed in Rancher?

So, is this a bug in Rancher or a bug in the operators? I have to admit that it is a bit murky. On one hand, I generally expect Kubernetes manifests to work the same no matter if I deploy on Rancher, Minikube, AKS or vanilla Kubernetes. On the other hand, projects is a neat piece of management abstraction that Rancher provides, so that’s obviously something that needs to be handled when using Rancher.

If this should be added in the operator’s manifests in order to handle the “edge case of Rancher”, I assume it should be as simple as adding the relevant rule to the list of rules for the existing ClusterRole, right? Because it’s using the API group management.cattle.io, that should be unique and have no effect on other Kubernetes distributions. Am I correct in this assumption, or do we need some logic to determine which Kubernetes distribution we deploy to and have Helm render different manifests depending on that?
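
For illustration only: if the operator chart already ships a ClusterRole for its ServiceAccount, the change would amount to appending one rule to that role's rules list, along these lines. The ClusterRole name and the placeholder for existing rules below are assumptions for the sketch, not the actual upstream manifest:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tigera-operator   # assumed name of the chart's existing ClusterRole
rules:
# ...the chart's existing rules stay unchanged...
- apiGroups:
  - management.cattle.io  # Rancher-specific API group; an RBAC rule for a group the cluster does not serve is simply inert
  resources:
  - projects
  verbs:
  - updatepsa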

As far as I’m concerned, there is at least one thing that is 100% on Rancher’s responsibility and that they must fix: this API must be documented! And not just how to add PSA labels for namespaces, the whole management.cattle.io API group must be documented!

Getting the same issue with a fresh installation of the trident operator (NetApp). Recently updated to rancher 2.7.2 and also bumped the version of trident so not entirely sure what introduced the issue. Older installations of trident aren’t showing the issue but they may have already updated the ns as desired, not sure.

time="2023-05-04T16:58:51Z" level=error msg="error syncing 'trident': reconcile failed; error re-installing Trident 'trident' ; err: reconcile failed; failed to patch Trident installation namespace trident; admission webhook \"rancher.cattle.io.namespaces\" denied the request: Unauthorized, requeuing"