rancher: [BUG] Active Directory authentication fails completely in Rancher 2.7.5 due to wrong objectGUID escaping

Rancher Server Setup

Rancher version: 2.7.5
Installation option (Helm Chart): K8S 1.24.x RKE1
Proxy/Cert Details:

Information about the Cluster

Kubernetes version: 1.24.x
Cluster Type (Local/Downstream): Local

User Information

What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
- All users are affected!

Describe the bug: After upgrading to Rancher 2.7.5 (from 2.7.4), LDAP-based login using the Active Directory integration fails for all users.

To Reproduce: Upgrade to 2.7.5, then log in with a fresh browser.

Result: Login failure. In the Rancher log (GUID data removed): LDAP Result Code 201 Filter Compile Error: ldap: invalid characters for escape in filter: encoding/hex: invalid byte: U+004E N
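The second half of that log line is the standard message of Go's encoding/hex package. A minimal Go sketch, assuming the LDAP filter parser decodes each backslash escape as two hex digits, reproduces the same message. decodeFilterEscape is a hypothetical helper written for illustration, not the go-ldap or Rancher implementation:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeFilterEscape is a hypothetical helper that mimics how an LDAP filter
// parser turns the two characters after a backslash ("\XX") back into one
// byte. It is an illustration only, not the go-ldap implementation.
func decodeFilterEscape(pair string) (byte, error) {
	if len(pair) != 2 {
		return 0, fmt.Errorf("ldap: escape sequence needs exactly two characters, got %q", pair)
	}
	b, err := hex.DecodeString(pair)
	if err != nil {
		// Wrap the encoding/hex error in the same wording as the Rancher log.
		return 0, fmt.Errorf("ldap: invalid characters for escape in filter: %w", err)
	}
	return b[0], nil
}

func main() {
	// "\4e" is a valid escape for the byte 0x4E, i.e. the ASCII letter 'N'.
	b, err := decodeFilterEscape("4e")
	fmt.Println(b, err) // 78 <nil>

	// A literal 'N' right after a backslash is not a hex digit, so decoding fails.
	_, err = decodeFilterEscape("N1")
	fmt.Println(err)
	// ldap: invalid characters for escape in filter: encoding/hex: invalid byte: U+004E 'N'
}
```

In other words, the parser is not complaining about the byte 0x4E itself (which escapes cleanly as \4e) but about a literal N appearing directly after a backslash.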

Expected Result: Login success.

Additional context: The following change introduced the problem: https://github.com/rancher/rancher/commit/cdd90ea6c687d8c83cfd0289e51be32da5c21d14

I think the source of the problem is that guidString := html.EscapeString(fmt.Sprintf("%x", entry.GetRawAttributeValue(common.AttributeObjectGUID))) is used to build a URL from the objectGUID value. This might result in a string that is not fully hex-encoded. All our objectGUIDs contain the “N” byte (hex value 0x4E). The HTML escaping might not escape this byte, and when the value is later used in LDAP, the filter string will contain “…\N…”, which is not a valid escape sequence and leads to the error message above. This is still an assumption, because I have not yet fully understood the relevant code, but I am pretty sure that it at least points in the right direction.

About this issue

State: closed
Created a year ago
Reactions: 7
Comments: 51 (7 by maintainers)

Most upvoted comments

This bug is more severe than I initially thought, because most users cannot log in even after rolling back to Rancher 2.7.4. It seems that a migration of the user identifier from DN to objectGUID had already been running in the background, and now none of the migrated users can log in anymore.

We urgently need a fix for this.


FFock on Jul 4, 2023

After upgrading from 2.7.1 to 2.7.5, logging in with an existing account creates duplicate users due to the change of the activedirectory_user:// identifier from DN to GUID. This is not what I expected from just a patch…

michalg91 on Jul 7, 2023

How can it be that this issue has existed for 2 weeks and no hotfix has been released? My client is also affected by this.

philipp1992 on Jul 14, 2023

Very frustrating to have hit this bug after being forced to upgrade from 2.7.3 to 2.7.5 due to the OOM bug, in addition to having to hold off upgrading to K8S 1.24 due to the high CPU bug! This seems to be a very buggy trend!

geoff-carr-bzy on Jul 10, 2023

@crobby please revert the change that caused this issue and release a fix asap

I think that, at least for all users who have already tried to install 2.7.5 somewhere with several downstream clusters active, reverting to 2.7.4 will not solve the issue created by the upgrade. I already did that using a full HA Rancher 2.7.4 pre-upgrade backup, but the issue is now in the API key/token handling for the downstream cluster users (scoped keys no longer work for users affected by the failed GUID migration, even after those users are deleted and recreated).

So far, we have not found and are not aware of any workaround. So we need a forward fix that restores full AD functionality with 2.7.6+.

I agree; just reverting these settings won’t help any customer that has already upgraded Rancher to 2.7.5. The fix must contain logic that reverts the already migrated user accounts to their original state. Or at least there should be a release without this change, so that we can benefit from the cluster agent and Cilium bug fixes that were included in 2.7.5.

0Styless on Jul 14, 2023

Is this being actively investigated? We were eagerly awaiting this release to fix the high memory consumption bug in the Rancher pod and cattle-cluster-agents, as well as to introduce k8s 1.25 support to our teams. We won’t deploy 2.7.5 on our production instance because of this issue. We are still running 2.6.6, as it is the most stable release we have experienced so far.

Turb0Fly on Jul 17, 2023

We are also affected. Fortunately, we upgraded our test environment first. We postponed our production upgrade due to the Kubernetes 1.24 high CPU bug with RKE and then the cattle-cluster-agent memory leak in 2.7.2+. We were hoping for 2.7.5, but this is a real showstopper.

Turb0Fly on Jul 12, 2023

We are also suffering from this issue and can see that a new account is created for each user who logs in.

papanito on Jul 11, 2023

We also encountered this bug after upgrading from 2.7.4 to 2.7.5. We have a lot of users who have broken permissions or can’t log into Rancher at all.

I agree that this is a major bug! We also opened a critical issue in the SUSE customer center.

0Styless on Jul 4, 2023

What won’t help me?

My comment on how we solved this for us.

nobbs on Aug 24, 2023

@crobby, could you please make a statement?

philipp1992 on Jul 17, 2023

@crobby please revert the change that caused this issue and release a fix asap

I think that, at least for all users who have already tried to install 2.7.5 somewhere with several downstream clusters active, reverting to 2.7.4 will not solve the issue created by the upgrade. I already did that using a full HA Rancher 2.7.4 pre-upgrade backup, but the issue is now in the API key/token handling for the downstream cluster users (scoped keys no longer work for users affected by the failed GUID migration, even after those users are deleted and recreated).

So far, we have not found and are not aware of any workaround. So we need a forward fix that restores full AD functionality with 2.7.6+.

FFock on Jul 14, 2023

@crobby please revert the change that caused this issue and release a fix asap

philipp1992 on Jul 14, 2023