kubelogin: Azure CLI Login Method Does Not Work Properly with AAD Enabled AKS Clusters and Multiple Deployment Service Principals
Overview
We are deploying applications to Azure Active Directory (AAD) enabled clusters and are required to use kubelogin. Since these are automated deployment pipelines we cannot have interactive logins. We deploy using Azure service principals so we need to use the Azure CLI login method in kubelogin.
The Azure CLI login method does not work properly with AAD enabled AKS clusters when more than one service principal is used by Azure DevOps deployment pipelines on the same build agent. Each pipeline is producing the same token cache file name even though they are for different service principals. This results in helm and kubectl commands failing due to authorization errors.
This problem became an issue when some deployment pipelines were using kubelogin v0.0.14 and others were using v0.0.10.
Scenario
- There are two different Azure Active Directory (AAD) enabled AKS clusters.
- Each AAD enabled cluster has its own service principal to deploy applications to the cluster. This is required since the clusters are for different applications and permissions must be assigned appropriately.
- For Azure DevOps deployment pipelines, the cluster configuration file is retrieved using the az aks get-credentials Azure CLI command. The configuration then needs to be converted to use Azure CLI logins (non-interactive) as documented here.
- An Azure DevOps deployment pipeline is executed for cluster 1 on a given build agent. It runs using its deployment service principal (SP), SP 1. This results in a token cache file in ~/.kube/cache/kubelogin with a file name like AzurePublicCloud-6dae42f8-4368-4678-94ff-3960e28e3630--.json. NOTE that the last two tokens in the name are BLANK.
- An Azure DevOps deployment pipeline is executed for cluster 2 on the same build agent soon after the first pipeline completes. It runs using its deployment service principal (SP), SP 2. This results in a token cache file in ~/.kube/cache/kubelogin with the same file name as the one generated for the first pipeline.
This causes pipeline 2 to fail with authorization errors since the access token in the cache file is for the wrong service principal.
Source and cluster config info
Source code to generate token file
Cache token file name is generated here:
cacheFile := getCacheFileName(o.Environment, o.ServerID, o.ClientID, o.TenantID, o.IsLegacy)
And you see the name is formed as follows:
func getCacheFileName(environment, serverID, clientID, tenantID string, legacy bool) string {
	// format: ${environment}-${server-id}-${client-id}-${tenant-id}[_legacy].json
	cacheFileNameFormat := "%s-%s-%s-%s.json"
	if legacy {
		cacheFileNameFormat = "%s-%s-%s-%s_legacy.json"
	}
	return fmt.Sprintf(cacheFileNameFormat, environment, serverID, clientID, tenantID)
}
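To make the collision concrete, here is a small standalone sketch that re-implements the name-generation logic quoted above and shows that two pipelines using different service principals still produce the same cache file name when clientID and tenantID are blank (the GUID is the server ID from our cluster config; everything else is illustrative):

```go
package main

import "fmt"

// getCacheFileName mirrors the kubelogin logic quoted above so the
// collision can be demonstrated standalone.
func getCacheFileName(environment, serverID, clientID, tenantID string, legacy bool) string {
	cacheFileNameFormat := "%s-%s-%s-%s.json"
	if legacy {
		cacheFileNameFormat = "%s-%s-%s-%s_legacy.json"
	}
	return fmt.Sprintf(cacheFileNameFormat, environment, serverID, clientID, tenantID)
}

func main() {
	serverID := "6dae42f8-4368-4678-94ff-3960e28e3630"

	// With the Azure CLI login method, clientID and tenantID are never
	// populated, so both pipelines pass empty strings here even though
	// they authenticate as different service principals.
	pipeline1 := getCacheFileName("AzurePublicCloud", serverID, "", "", false)
	pipeline2 := getCacheFileName("AzurePublicCloud", serverID, "", "", false)

	fmt.Println(pipeline1)
	fmt.Println(pipeline1 == pipeline2) // prints "true": the names collide
}
```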
The last two tokens in the generated file name are blank, so the ClientID and TenantID must not be set. The user information in the original cluster configuration is:
users:
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    auth-provider:
      config:
        apiserver-id: 6dae42f8-4368-4678-94ff-3960e28e3630
        client-id: 80faf920-1908-4b52-b5ef-a8e7bedfc67a
        config-mode: '1'
        environment: AzurePublicCloud
        tenant-id: ca56a4a5-e300-406a-98ff-7e36a0baac5b
      name: azure
And after kubelogin convert-kubeconfig -l azurecli it is (Azure DevOps is masking the server-id value using ***):
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --server-id
      - ***
      - --login
      - azurecli
      command: kubelogin
      env: null
      provideClusterInfo: false
This explains why clientID and tenantID are blank in the generated token cache file name: neither the clientID nor the tenantID is passed to kubelogin as an argument.
Changes Between kubelogin v0.0.10 and v0.0.14
It doesn’t look like the code that generates the token cache file name changed between v0.0.10 and v0.0.14, so I’m not sure how this ever worked for us. Perhaps we were lucky and builds ran on different build agents, so the name collision wasn’t an issue. More likely the token expired and a new one was created (and the cache file updated), so the issue was hidden.
However, the contents of the file DID change.
In v0.0.10, the resource property in the token file is blank:
"resource":""
While in v0.0.14, it has a value:
"resource":"6dae42f8-4368-4678-94ff-3960e28e3630"
It looks like v0.0.14 is putting the server ID (the same GUID as the apiserver-id above) in the resource property. Either way, sharing one cache file won’t work if two different pipelines using different service principals write to it!
We need to update to the latest version of kubelogin, especially since Kubernetes 1.24 requires kubelogin with AAD enabled clusters, and the current version (v0.0.14) is needed to handle the new cluster configuration file format generated starting in Kubernetes 1.24. We cannot stay on older versions of kubelogin (nor should we, so that we keep up with bug fixes and security fixes).
Possible Workaround?
The kubelogin docs do not mention this (but kubelogin convert-kubeconfig --help does): the kubelogin executable allows you to specify --client-id <client ID>. In my test YAML pipeline, convert-kubeconfig -l azurecli runs in an Azure CLI task. I made the following changes:
- Added addSpnToEnvironment: true to the task properties.
- Changed the command to: kubelogin convert-kubeconfig -l azurecli --client-id $env:servicePrincipalId
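For reference, a minimal sketch of what that Azure CLI task could look like (the service connection name, pipeline variables, and resource names are assumptions, not from the original pipeline):

```yaml
# Sketch of an Azure DevOps Azure CLI task applying the workaround.
# azureSubscription, resourceGroup, and clusterName are illustrative.
- task: AzureCLI@2
  inputs:
    azureSubscription: my-arm-service-connection
    scriptType: pscore
    scriptLocation: inlineScript
    addSpnToEnvironment: true   # exposes $env:servicePrincipalId to the script
    inlineScript: |
      az aks get-credentials --resource-group $(resourceGroup) --name $(clusterName)
      kubelogin convert-kubeconfig -l azurecli --client-id $env:servicePrincipalId
```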
This results in the following in the converted cluster config file (*** is the service principal id, which Azure DevOps masks in log output):
users:
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --client-id
      - ***
      - --login
      - azurecli
      command: kubelogin
      env: null
      provideClusterInfo: false
And I get file names like AzurePublicCloud-6dae42f8-4368-4678-94ff-3960e28e3630-***-.json, where *** is the masked service principal id. So the clientID value is now set in the file name.
Kubelogin also supports -t <tenant ID>, but that did not get written to the converted cluster config file.
Is this an appropriate solution to this issue?
kubelogin v0.0.14 Does Not Honor the --client-id Flag with Kubernetes 1.24 Cluster Config Files
With Kubernetes 1.24, kubelogin v0.0.14 is also broken. The cluster config file you get is:
users:
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --environment
      - AzurePublicCloud
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --client-id
      - 80faf920-1908-4b52-b5ef-a8e7bedfc67a
      - --tenant-id
      - ca56a4a5-e300-406a-98ff-7e36a0baac5b
      - --login
      - devicecode
      command: kubelogin
      env: null
and if you convert it to use Azure CLI login as in the workaround above:
kubelogin convert-kubeconfig -l azurecli --client-id $env:servicePrincipalId
The converted file no longer contains the client-id argument:
- name: clusterUser_RG_KUB11614NP01_AZAMERES01KUB11614NP01
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --server-id
      - 6dae42f8-4368-4678-94ff-3960e28e3630
      - --login
      - azurecli
      command: kubelogin
      env: null
      provideClusterInfo: false
AND YOU GET DUPLICATE FILE NAMES AGAIN!
Conclusion
The kubelogin code needs to generate unique file names for the token cache file name when using the Azure CLI login method. It cannot assume that the same service principal is being used.
Kevin Kizer Lead Engineer MetLife
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 19 (9 by maintainers)
Commits related to this issue
- added --token-cache-dir support in convert-kubeconfig (#105) * added --token-cache-dir in convert-kubeconfig sub-command. Addresses #104 — committed to Azure/kubelogin by weinong 2 years ago
The fix does exactly what we need.
In my Azure DevOps pipeline I get the cluster configuration using az aks get-credentials and convert it using: kubelogin convert-kubeconfig -l azurecli --token-cache-dir $(Agent.TempDirectory)
and the converted cluster configuration file now contains the --token-cache-dir argument!
We still need to test the change with multiple builds on the same build machine where each build uses a different service principal. I don’t expect any issues since each build has its own agent temp directory.