rancher: Upgrade to 2.6.5 breaks default admin cluster role with okta integration

Rancher Server Setup

  • Rancher version: v2.6.4, upgraded to v2.6.5
  • Installation option (Docker install/Helm Chart):
    • Using the terraform rke provider v1.3.0
  • provider registry.terraform.io/hashicorp/external v2.2.2
  • provider registry.terraform.io/hashicorp/helm v2.2.0
  • provider registry.terraform.io/hashicorp/kubernetes v2.11.0
  • provider registry.terraform.io/hashicorp/local v2.2.3
  • provider registry.terraform.io/hashicorp/null v3.1.1
  • provider registry.terraform.io/rancher/rancher2 v1.22.2
  • provider registry.terraform.io/rancher/rke v1.3.0

Information about the Cluster

  • Kubernetes version: 1.22.4
  • Cluster Type (Local/Downstream): Local and Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):

User Information

  • What is the role of the user logged in? (Admin)

Describe the bug

We recently upgraded our Rancher from v2.6.4 to v2.6.5. After the upgrade we are no longer able to run terraform plan against our downstream cluster. Our cluster is integrated with Okta authentication. The issue is similar to what is described in https://github.com/rancher/rancher/issues/36096
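For anyone triaging the same failure, the impersonation ClusterRole that Rancher manages on the downstream cluster can be inspected directly. A sketch (the role name suffix is per-user; the label is the one visible in the YAML under "To Reproduce" below):

# Sketch: list the impersonation ClusterRoles Rancher manages on the
# downstream cluster, then dump the one for your user to see its rules.
kubectl get clusterrole -l authz.cluster.cattle.io/impersonator=true
kubectl get clusterrole cattle-impersonation-user-REDACTED -o yaml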

To Reproduce

Upgrade your cluster from Rancher v2.6.4 to v2.6.5.

Result

The Default Admin cluster role should have the following rules to update the downstream cluster:

kubectl get clusterrole cattle-impersonation-user-REDACTED -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2022-02-07T16:50:18Z"
  labels:
    authz.cluster.cattle.io/impersonator: "true"
    cattle.io/creator: norman
  name: cattle-impersonation-user-REDACTED
  resourceVersion: "419324355"
  uid: REDACTED
rules:
- apiGroups:
  - ""
  resourceNames:
  - user-REDACTED
  resources:
  - users
  verbs:
  - impersonate
- apiGroups:
  - ""
  resourceNames:
  - okta_group:/REDACTED
  - system:authenticated
  - system:cattle:authenticated
  resources:
  - groups
  verbs:
  - impersonate
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - local://user-REDACTED
  resources:
  - userextras/principalid
  verbs:
  - impersonate
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - Default Admin
  resources:
  - userextras/username
  verbs:
  - impersonate

We get the following error when terraform plan is run:

Error: userextras.authentication.k8s.io "Default Admin" is forbidden: User "system:serviceaccount:cattle-impersonation-system:cattle-impersonation-user-REDACTED" cannot impersonate resource "userextras/username" in API group "authentication.k8s.io" at the cluster scope
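The missing grant can also be probed directly on the downstream cluster with kubectl auth can-i. A sketch (run as a cluster admin, since --as itself requires impersonation rights; the user suffix is a placeholder):

# Sketch: ask whether the impersonation service account may impersonate the
# userextras/username subresource for "Default Admin". On a broken v2.6.5
# install this is expected to answer "no".
kubectl auth can-i impersonate 'userextras.authentication.k8s.io/Default Admin' \
  --subresource=username \
  --as=system:serviceaccount:cattle-impersonation-system:cattle-impersonation-user-REDACTED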

Expected Result

But if you run kubectl get clusterrole cattle-impersonation-user-REDACTED -o yaml --watch, you can see that the following rules are removed from the cluster role:

- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - local://user-wv5p2
  resources:
  - userextras/principalid
  verbs:
  - impersonate
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - Default Admin
  resources:
  - userextras/username
  verbs:
  - impersonate
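A simple way to capture what the watch shows is to snapshot the role before and after the upgrade and diff the two. A sketch (file names are arbitrary):

# Sketch: snapshot the ClusterRole around the upgrade and diff the results
# to see exactly which impersonation rules were dropped.
kubectl get clusterrole cattle-impersonation-user-REDACTED -o yaml > role-before.yaml
# ... perform the v2.6.5 upgrade ...
kubectl get clusterrole cattle-impersonation-user-REDACTED -o yaml > role-after.yaml
diff role-before.yaml role-after.yaml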

I had to roll back the upgrade to v2.6.4 to get the cluster back into a working state. Also, after upgrading we see other errors in the Rancher logs, and Rancher catalogs are refreshed every 5 minutes.
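Those extra errors can be followed in the Rancher server logs. A sketch, assuming the standard Helm chart pod label (app=rancher):

# Sketch: follow the Rancher server logs in the local cluster.
kubectl -n cattle-system logs -l app=rancher -f --tail=100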
Rancher Internal Support Ticket is 00349589

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 21 (7 by maintainers)

Most upvoted comments

I’ve reproduced the issue by starting rancher at version v2.6.2 and upgrading to v2.6.5. I can observe the problem with or without auth providers. Fix is up for review at https://github.com/rancher/rancher/pull/38193

✅ PASSED

Reproduction Environment

Component                    Version / Type
Rancher version              2.6.2
Installation option          helm
RKE binary version used      1.3.1
If Helm Chart, k8s cluster   v1.21.5
Cert Details                 Let's Encrypt
Helm version                 v2.16.8-rancher1
Downstream cluster type      rke1 linode
Downstream K8s version       v1.21.13-rancher1-1
Logged in user role          administrator
Browser type                 Google Chrome
Browser version              103.0.5060.114

ADDITIONAL SETUP

  • Helm commands for Rancher installation:
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=$URL_VAR \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=$EMAIL_VAR \
  --set letsEncrypt.ingress.class=nginx \
  --version 2.6.2

Reproduction steps

  1. Create a downstream rke1 linode cluster
  2. Get the kubeconfig file for this downstream cluster
  3. Make sure that using the downloaded kubeconfig file works with something like kubectl get pods -A
  4. Upgrade rancher
helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=$URL_VAR \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=$EMAIL_VAR \
  --set letsEncrypt.ingress.class=nginx \
  --version 2.6.5
  5. Wait for Rancher to finish upgrading (a CLI wait is sketched below)
  6. Using the same kubeconfig file from step 2, run kubectl get pods -A
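For step 5, the rollout can be awaited from the CLI instead of watching the UI. A sketch:

# Sketch: block until the rancher deployment in cattle-system finishes rolling out.
kubectl -n cattle-system rollout status deploy/rancher --timeout=10m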

RESPONSE

Error from server (Forbidden): userextras.authentication.k8s.io "local://user-<REDACTED_SHORT_UUID>" is forbidden: User "system:serviceaccount:cattle-impersonation-system:cattle-impersonation-user-<REDACTED_SHORT_UUID>" cannot impersonate resource "userextras/principalid" in API group "authentication.k8s.io" at the cluster scope

Additional Info

RESULTS

✅ Expected

Expected to be able to access the downstream cluster with the same kubeconfig file

❌ Actual

Was unable to access the downstream cluster after upgrade using the same kubeconfig file


Validation Environment

Component                    Version / Type
Rancher version              2.6.2
Installation option          helm
RKE binary version used      1.3.1
If Helm Chart, k8s cluster   v1.21.5
Cert Details                 Let's Encrypt
Helm version                 v2.16.8-rancher1
Downstream cluster type      rke1 linode
Downstream K8s version       v1.21.13-rancher1-1
Logged in user role          administrator
Browser type                 Google Chrome
Browser version              103.0.5060.114

ADDITIONAL SETUP

  • Helm commands for Rancher installation:
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=$URL_VAR \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=$EMAIL_VAR \
  --set letsEncrypt.ingress.class=nginx \
  --version 2.6.2

Validation steps

  1. Create a downstream rke1 linode cluster
  2. Get the kubeconfig file for this downstream cluster
  3. Make sure that using the downloaded kubeconfig file works with something like kubectl get pods -A
  4. Upgrade rancher
helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=$URL_VAR \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=$EMAIL_VAR \
  --set letsEncrypt.ingress.class=nginx \
  --set rancherImageTag=v2.6-head
  5. Wait for Rancher to finish upgrading
  6. Using the same kubeconfig file from step 2, run kubectl get pods -A
  7. A list of pods is successfully returned

Additional Info

RESULTS

✅ Expected

The kubeconfig used to access the downstream cluster works after the upgrade

✅ Actual

The kubeconfig worked for accessing the downstream cluster after the upgrade

OTHER AREAS CHECKED

Test            Pass/Fail
Single Node     ✅ PASS
Auth Provider   ✅ PASS
Docker          ✅ PASS

Sorry @cmurphy. I meant 2.6.5 and 2.6.4.

Hi all, could I get some clarification on the reproduction steps here:

  • Does this only happen when you use terraform to provision a downstream cluster? Does it happen for anyone who has provisioned a downstream cluster through the Rancher UI? Could I see an example terraform config?
  • I see different people saying the problem happens in different places. Is the forbidden error happening when you run terraform plan, using a token key in the terraform config? Is it happening when you download a kubeconfig for the cluster and use it with kubectl? Is it happening if you use the kubectl shell in the dashboard? Is it happening when you use Cluster Explorer?
  • Is this happening for anyone who isn’t using any auth provider, only using local users? If it is happening for you with an auth provider, which auth providers, other than okta?

For us, it’s happening with LDAP as an auth provider. The cluster was created from the Rancher UI.