rancher: Intermittent user auth errors relating to "impersonation" ClusterRoles

Rancher Server Setup

  • Rancher version: v2.6.2
  • Installation option (Docker install/Helm Chart): Helm chart on EKS cluster
  • Proxy/Cert Details: Using nginx ingress-controller behind AWS NLB, with the nginx ingress handling TLS

Information about the Cluster

  • Kubernetes version: v1.19.13
  • Cluster Type (Local/Downstream): Imported (cluster is an EKS cluster)

User Information

  • What is the role of the user logged in? Admin.

Describe the bug

A couple of users have noticed intermittent authentication errors when using the Kubernetes API. The errors are especially prevalent when browsing the rancher-monitoring Grafana, due to the number of API requests it makes, but they occur in the terminal as well.

The error message is:

Templating
Template variable service failed userextras.authentication.k8s.io "redacted@redacted.com" is forbidden: User "system:serviceaccount:cattle-impersonation-system:cattle-impersonation-u-redacted" cannot impersonate resource "userextras/username" in API group "authentication.k8s.io" at the cluster scope.

The corresponding API response from the cluster is:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "userextras.authentication.k8s.io \"okta_user://redacted@redacted.com\" is forbidden: User \"system:serviceaccount:cattle-impersonation-system:cattle-impersonation-u-redacted\" cannot impersonate resource \"userextras/principalid\" in API group \"authentication.k8s.io\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "name": "okta_user://redacted@redacted.com",
    "group": "authentication.k8s.io",
    "kind": "userextras"
  },
  "code": 403
}
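For context, this 403 comes from the downstream cluster's RBAC check on Kubernetes impersonation headers: Rancher authenticates as the cattle-impersonation-* service account named in the error and sets impersonation headers for the logged-in user. The proxied request looks roughly like the sketch below (placeholder values in angle brackets; the header names are the standard Kubernetes impersonation headers):

# Setting each Impersonate-Extra-<key> header requires the "impersonate" verb
# on userextras/<key>, and when the role rule lists resourceNames the header
# value must be one of them -- which fails when the rule contains
# resourceNames: [""], as in the buggy ClusterRole shown below.
curl -sk "https://<downstream-apiserver>/api/v1/namespaces" \
  -H "Authorization: Bearer <cattle-impersonation service account token>" \
  -H "Impersonate-User: <rancher user id>" \
  -H "Impersonate-Extra-principalid: okta_user://redacted@redacted.com" \
  -H "Impersonate-Extra-username: redacted@redacted.com"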

To Reproduce

Set up Rancher v2.6.2 with the Okta SAML2 auth provider. Some users will then intermittently hit this error when making Kubernetes API calls through Rancher.

Additional context

We spent some time investigating and discovered that the ClusterRole for the cattle-impersonation-u-redacted service account mentioned in the error message had empty resourceNames values in some of its rules. The ClusterRole looked like:

...
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - ""
  resources:
  - userextras/principalid
  verbs:
  - impersonate
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - ""
  resources:
  - userextras/username
  verbs:
  - impersonate

To mitigate the issue we tried deleting the impacted ClusterRole. When the user made another Kubernetes API call, the ClusterRole was recreated with correct data:

...
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - okta_user://redacted@redacted.com
  resources:
  - userextras/principalid
  verbs:
  - impersonate
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - redacted@redacted.com
  resources:
  - userextras/username
  verbs:
  - impersonate
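For reference, the mitigation above amounts to something like this (assuming the affected ClusterRole is named cattle-impersonation-u-redacted to match the service account in the error; the real name was redacted):

# Delete the impersonation ClusterRole that carries the empty resourceNames;
# Rancher recreates it with the names populated the next time the affected
# user makes a request through the Rancher API proxy.
kubectl delete clusterrole cattle-impersonation-u-redacted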

We then scanned the ClusterRoles for others presenting the same issue. There is one other ClusterRole that continuously flips between the buggy "resourceNames": [""] value and the correctly filled-in values for its userextras/username and userextras/principalid rules. The ClusterRole's UID is not changing, but its resourceVersion increments about once every 3 seconds, which suggests that Rancher or one of its agents is periodically rewriting the ClusterRole with invalid data.
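The scan and the resourceVersion observation can be repeated with a sketch like the following (assumes kubectl and jq are available, and that Rancher's impersonation ClusterRoles all share the cattle-impersonation- name prefix):

# List impersonation ClusterRoles that contain an empty resourceNames entry.
kubectl get clusterroles -o json \
  | jq -r '.items[]
      | select(.metadata.name | startswith("cattle-impersonation-"))
      | select(any(.rules[]?; (.resourceNames // []) | index("") != null))
      | .metadata.name'

# Watch a suspect ClusterRole; a new row roughly every 3 seconds means
# something is continuously rewriting it.
kubectl get clusterrole cattle-impersonation-u-redacted --watch \
  -o custom-columns=NAME:.metadata.name,RESOURCEVERSION:.metadata.resourceVersion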

This may not be relevant, but the affected cluster is running a cattle-cluster-agent deployment with two pods, using image rancher/rancher-agent:v2.6.2. One pod is 60 days old and its logs are quiet. The other pod is 5 days old and downloads Helm indices every 5 minutes, but it logs no errors either.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 39 (13 by maintainers)

Most upvoted comments

Upgraded from 2.6.4 to 2.6.8 and still found this issue. The administrator has to go and do “Refresh Group memberships” , which solves the issue.

Is this a regression bug, or do we need to change any settings/config?

Release note template

Synopsis of issue: User attributes were not being refreshed correctly and sometimes became empty, which caused impersonation rules to be incorrect.
Resolved (or not): resolved
Versions affected: v2.6.7, v2.6.8

We also moved from v2.6.4 to v2.6.8 like @suryatejaboorlu and ended up with two local users (used for Helm within Terraform) getting 403s on every call; unfortunately the “Refresh Group memberships” trick didn't work for us. The fix was to log in and then log out of the Rancher console with those local users, and the issues disappeared.