rancher: Intermittent user auth errors relating to "impersonation" ClusterRoles
Rancher Server Setup
- Rancher version: v2.6.2
- Installation option (Docker install/Helm Chart): Helm chart on EKS cluster
- Proxy/Cert Details: Using nginx ingress-controller behind AWS NLB, with the nginx ingress handling TLS
Information about the Cluster
- Kubernetes version: v1.19.13
- Cluster Type (Local/Downstream): Imported (cluster is an EKS cluster)
User Information
- What is the role of the user logged in? Admin.
Describe the bug
A couple of users have noticed intermittent authentication errors when using the Kubernetes API. The problem is especially prevalent when browsing the rancher-monitoring Grafana dashboards, because of the number of API requests they generate, but it occurs in the terminal as well.
The error message is:
Templating
Template variable service failed userextras.authentication.k8s.io "redacted@redacted.com" is forbidden: User "system:serviceaccount:cattle-impersonation-system:cattle-impersonation-u-redacted" cannot impersonate resource "userextras/username" in API group "authentication.k8s.io" at the cluster scope.
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "userextras.authentication.k8s.io \"okta_user://redacted@redacted.com\" is forbidden: User \"system:serviceaccount:cattle-impersonation-system:cattle-impersonation-u-redacted\" cannot impersonate resource \"userextras/principalid\" in API group \"authentication.k8s.io\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "name": "okta_user://redacted@redacted.com",
    "group": "authentication.k8s.io",
    "kind": "userextras"
  },
  "code": 403
}
To Reproduce
Set up Rancher v2.6.2 with the Okta SAML2 auth provider. Some users will experience this issue.
Additional context
We spent time investigating and discovered that the ClusterRole for the cattle-impersonation-system:cattle-impersonation-u-redacted service account mentioned in the error message had empty resourceNames values in its rules. The ClusterRole looked like:
...
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - ""
  resources:
  - userextras/principalid
  verbs:
  - impersonate
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - ""
  resources:
  - userextras/username
  verbs:
  - impersonate
To mitigate the issue we tried deleting the impacted ClusterRole. When the affected user made another Kubernetes API call, the ClusterRole was recreated with correct data (see the sketch of this deletion step after the corrected rules below):
...
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - okta_user://redacted@redacted.com
  resources:
  - userextras/principalid
  verbs:
  - impersonate
- apiGroups:
  - authentication.k8s.io
  resourceNames:
  - redacted@redacted.com
  resources:
  - userextras/username
  verbs:
  - impersonate
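For reference, the mitigation boils down to deleting the broken ClusterRole and letting Rancher recreate it on the user's next API call. Below is a minimal sketch of that step using the Python kubernetes client (an illustration, not the exact commands we ran); the role name is a placeholder standing in for the redacted name from the error message.

from kubernetes import client, config

# Assumes a kubeconfig with permission to manage ClusterRoles on the affected cluster.
config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

# Placeholder name; substitute the cattle-impersonation-u-<id> role from the error message.
broken_role = "cattle-impersonation-u-redacted"

# Delete the role; in our case Rancher regenerated it with the correct
# resourceNames the next time the impersonated user hit the Kubernetes API.
rbac.delete_cluster_role(name=broken_role)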
We then scanned the ClusterRoles for others presenting the same issue. There was one other ClusterRole that continuously flips between having the buggy "resourceNames":[""] value and having the values filled in, for the userextras/username and userextras/principalid rules. We can see that the ClusterRole UID is not changing, but its resourceVersion increments roughly once every 3 seconds, which suggests that Rancher or one of its agents is periodically updating the ClusterRole with invalid information.
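For anyone who wants to repeat that scan, here is a rough sketch using the Python kubernetes client (again an illustration, not the exact tooling we used) that lists cattle-impersonation ClusterRoles whose userextras rules contain an empty resourceNames entry and prints each role's resourceVersion:

from kubernetes import client, config

config.load_kube_config()  # kubeconfig pointed at the affected downstream cluster
rbac = client.RbacAuthorizationV1Api()

for cr in rbac.list_cluster_role().items:
    # Only the Rancher-generated impersonation roles are of interest here.
    if not cr.metadata.name.startswith("cattle-impersonation-"):
        continue
    for rule in cr.rules or []:
        resources = rule.resources or []
        names = rule.resource_names or []
        if any(r.startswith("userextras/") for r in resources) and "" in names:
            print(f"{cr.metadata.name}: empty resourceNames for {resources} "
                  f"(resourceVersion={cr.metadata.resource_version})")

Running this in a loop every few seconds makes the resourceVersion churn on the affected role easy to spot.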
This may not be relevant, but the affected cluster is running a cattle-cluster-agent deployment with two pods, using the image rancher/rancher-agent:v2.6.2. One pod is 60 days old and its logs are quiet. The second pod is 5 days old and is downloading Helm indices every 5 minutes, though it shows no error logs.
About this issue
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 39 (13 by maintainers)
Upgraded from 2.6.4 to 2.6.8 and still found this issue. The administrator has to go and do “Refresh Group memberships”, which solves the issue.
Is this a regression bug, or do we need to change any settings/config?
Release note template
- Synopsis of issue: User attributes were not being refreshed correctly and sometimes became empty, which caused the impersonation rules to be incorrect.
- Resolved (or not): resolved
- Versions affected: v2.6.7, v2.6.8
We also moved from v2.6.4 to v2.6.8, like @suryatejaboorlu, and ended up with two local users (used for Helm within Terraform) that were getting 403s on every call. Unfortunately, the “Refresh Group memberships” trick didn’t work for us. The fix was to log in to and then out of the Rancher console with those local users, after which the issues disappeared.