kubelogin: kubectl commands become significantly slower when using kubelogin
In AKS, we started to get warnings such as below:
So we adopted kubelogin by running `kubelogin convert-kubeconfig -l azurecli`, which logs in to AKS non-interactively. Since then, every time we run a kubectl command we notice a delayed response. The issue goes away if we refresh the AKS token the usual way, via `az aks get-credentials`.
The kubeconfig file looks something like this when the issue is present:
    - name: clusterUser_clustername
      user:
        exec:
          apiVersion: client.authentication.k8s.io/v1beta1
          args:
          - get-token
          - --server-id
          - XXXXX
          - --login
          - azurecli
          command: kubelogin
          env: null
          provideClusterInfo: false
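
For context, an `exec` entry like this makes kubectl start the `command` with the listed `args` when it needs credentials, and read an `ExecCredential` JSON object from the plugin's standard output. Any startup delay in kubelogin is therefore typically added to each kubectl invocation. The Go sketch below decodes such an object. The sample payload and the `parseExecCredential` helper are illustrative assumptions, not kubelogin's or kubectl's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// execCredential mirrors the fields of interest in the
// client.authentication.k8s.io/v1beta1 ExecCredential object
// that an exec plugin prints to stdout.
type execCredential struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Status     struct {
		Token               string    `json:"token"`
		ExpirationTimestamp time.Time `json:"expirationTimestamp"`
	} `json:"status"`
}

// parseExecCredential is a hypothetical helper that decodes plugin output.
func parseExecCredential(out []byte) (execCredential, error) {
	var c execCredential
	if err := json.Unmarshal(out, &c); err != nil {
		return c, err
	}
	if c.Kind != "ExecCredential" {
		return c, fmt.Errorf("unexpected kind %q", c.Kind)
	}
	return c, nil
}

// sampleOutput is made-up data in the shape a plugin would print.
const sampleOutput = `{
  "apiVersion": "client.authentication.k8s.io/v1beta1",
  "kind": "ExecCredential",
  "status": {
    "token": "example-token",
    "expirationTimestamp": "2030-01-01T00:00:00Z"
  }
}`

func main() {
	c, err := parseExecCredential([]byte(sampleOutput))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("token expires at", c.Status.ExpirationTimestamp)
}
```
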
The slowness has been reported by many of our developers, but everybody uses the same AKS cluster. We will also open an Azure support ticket to see whether it is related to something at the cluster level, and we will try other clusters as well. Still, this feels more related to kubelogin.
About this issue
- State: closed
- Created 2 years ago
- Comments: 46 (20 by maintainers)
@weinong: Not that specific command, no - it’s anything. Actually it appears to be `klog`, not Cobra. In its initialization code it’s calling `user.Current()`, which ends up hitting that endpoint on Windows. Looks like that’s changed in v2 of that library. I’ll create a PR.

That seems to have solved my presentation;

And since I first noticed it from `kubectl`:

Clearly a token/auth of some sort at the start, and then as expected.
Great work, everyone!
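
The lookup identified above can be timed on its own. This is a minimal Go sketch that measures `user.Current()` from the standard `os/user` package, the call said to run during klog's initialization. `timeLookup` is a hypothetical helper, and the sketch does not import klog or kubelogin, so it only approximates what those binaries do at startup.

```go
package main

import (
	"fmt"
	"os/user"
	"time"
)

// timeLookup is a hypothetical helper: it calls user.Current() once and
// returns how long the call took, plus any error from the lookup.
func timeLookup() (time.Duration, error) {
	start := time.Now()
	_, err := user.Current()
	return time.Since(start), err
}

func main() {
	// user.Current() caches its result for the lifetime of the process,
	// so only the first call is meaningful. Rerun the program to take
	// another sample.
	d, err := timeLookup()
	fmt.Printf("user.Current() took %v (err: %v)\n", d, err)
}
```

Running the compiled program several times over a minute or so, on an affected Windows machine, should show whether the lookup alone accounts for the delay.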
https://github.com/Azure/kubelogin/releases/tag/v0.0.31 is out. Please give it a try!
Thanks @RagingTonberry - That helped me realize I can reproduce it here. I get a smaller (just under 3s) delay every 30s. I had a look using Process Monitor and I get a “bad network path” event trying to connect to `<mydomain>\PIPE\samr`. This happens every time I run it, even when it’s not slow, so Windows appears to be caching the failure result for 30s.

This network path is apparently the Security Account Manager (SAM) endpoint for RPC over SMB. I don’t really understand anything I just wrote, but it’s apparently used for enumerating domain user information.
There are a couple of similar issues in Docker for Windows (#1936 and #2131), and a nasty workaround involving adding a hosts file entry for the domain name to 127.0.0.1 (which worked for me), but I still don’t know why it would be trying to reach that endpoint.
@weinong - is there anything in the kubelogin code that might be calling a SAM library to retrieve user info, even when just running `kubelogin --version`?

For me, I don’t see any issues with `az`, which I use with some frequency.

But I still see periodic delays with `kubelogin`, as I mentioned above:

Given that I didn’t even give it a valid command line, it still seems to me like it’s -something- in kubelogin land. I would imagine (having not looked at the code) that supplying an invalid command line should lead to it dropping straight out and not doing any useful processing, like relying on calls to the az CLI, or Azure itself.
Edit: I see the same behavior when I supply `--version`: a 7 second delay roughly every 30s. It still really feels like SmartScreen or something like that to me personally. That might be the case in my instance.

kubelogin does different things based on login mode. Like @peterbom said, when using `-l azurecli` it invokes the az CLI. So when reporting slowness, please be specific about the login mode used.

Thank you @kpkool for following up.
We are using the latest kubelogin version and our team is still experiencing significant slowness on Windows. We have also upgraded our AKS cluster from 1.22.X to 1.24.X.
There is a kubelogin folder under .kube\cache for some people but not for others, and those who do have the folder have an AzurePublicCloud token in it.
We ran a couple of kubelogin commands and captured the elapsed time: it varies between 35 and 37 seconds for Windows users, but is only 3 to 4 seconds for a WSL user.
Thank you for the finding and the fix! It has resolved the slowness seen in my tests too. I compared the latest kubelogin release, which contains this fix, with the previous release without it, and the results look very promising and steady. (Note that this is just a simple script that calculates the timespan of multiple kubectl commands.)
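
The script itself is not shown in the thread. As a rough Go sketch of the same idea, the following times a batch of commands and reports the total. `totalTime` is a hypothetical helper and the kubectl commands listed are placeholders, so substitute whatever your team actually runs.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// totalTime runs each command line in order and returns the per-command
// durations and their sum. A command that fails still counts toward the total.
func totalTime(cmds [][]string) ([]time.Duration, time.Duration) {
	var each []time.Duration
	var total time.Duration
	for _, c := range cmds {
		start := time.Now()
		_ = exec.Command(c[0], c[1:]...).Run()
		d := time.Since(start)
		each = append(each, d)
		total += d
	}
	return each, total
}

func main() {
	// Placeholder commands (assumption): any read-only kubectl calls will do.
	cmds := [][]string{
		{"kubectl", "get", "nodes"},
		{"kubectl", "get", "namespaces"},
		{"kubectl", "get", "pods", "--all-namespaces"},
	}
	each, total := totalTime(cmds)
	for i, d := range each {
		fmt.Printf("%v: %v\n", cmds[i], d.Round(time.Millisecond))
	}
	fmt.Println("total:", total.Round(time.Millisecond))
}
```

Running it once with each kubelogin release on the PATH gives two totals to compare.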
Beautiful and thank you so much for double checking!! Huge congrats to all who contributed and super awesomeness of everyone in this thread and @peterbom ❤️🎉 and @weinong ❤️🎉☕️
Shall we close this now?
excellent find!
either way, kubelogin doesn’t do anything besides providing the version string in the root command. So I guess it’s coming from Cobra/spf13?
@peterbom do you mean `kubelogin --version` in particular, or with any flag like `kubelogin --foobar`?

That might go some way to explaining my presentation then, since I’m a remote worker with a domain-joined machine and I’m rarely VPN’d (and even then there are some pretty decent delays because yay SMB).
From this (and from testing on my machine), it looks as if `.kube\cache` is not used for caching tokens when using the `azurecli` login type.

Instead, it calls `az account get-access-token -o json --resource 6dae42f8-4368-4678-94ff-3960e28e3630`. If you run this command repeatedly, is it slow? It’s not for me, but I’m using a newer `az` version (2.48.1).

The `az` CLI has its own caching (for me this seems to be `.azure\msal_token_cache.bin`), so it shouldn’t need to do the whole authentication dance each time.
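
One way to check whether that cache is being hit is to run the command twice and compare the results. The Go sketch below assumes the JSON output contains an `accessToken` field. `sameToken` and `fetch` are hypothetical helpers, and an unchanged token between two calls only suggests, rather than proves, that the second call was served from the cache.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// azToken holds the one field of interest from the az output.
// The field name is an assumption about the CLI's JSON format.
type azToken struct {
	AccessToken string `json:"accessToken"`
}

// sameToken reports whether two JSON outputs carry the same non-empty token.
func sameToken(a, b []byte) (bool, error) {
	var x, y azToken
	if err := json.Unmarshal(a, &x); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &y); err != nil {
		return false, err
	}
	return x.AccessToken != "" && x.AccessToken == y.AccessToken, nil
}

// fetch runs the az command quoted in the thread and returns its
// output together with the elapsed time.
func fetch() ([]byte, time.Duration, error) {
	start := time.Now()
	out, err := exec.Command("az", "account", "get-access-token", "-o", "json",
		"--resource", "6dae42f8-4368-4678-94ff-3960e28e3630").Output()
	return out, time.Since(start), err
}

func main() {
	first, d1, err1 := fetch()
	second, d2, err2 := fetch()
	if err1 != nil || err2 != nil {
		fmt.Println("az failed:", err1, err2)
		return
	}
	same, err := sameToken(first, second)
	fmt.Printf("first: %v, second: %v, same token: %v (err: %v)\n",
		d1.Round(time.Millisecond), d2.Round(time.Millisecond), same, err)
}
```
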