kubelogin: kubectl commands become significantly slower when using kubelogin

In AKS, we started getting warnings like the one below: [image]

So we adopted kubelogin by running kubelogin convert-kubeconfig -l azurecli, which logs in to AKS non-interactively. Since then, every kubectl command we run has a noticeably delayed response. The issue goes away if we refresh the AKS token the normal way, via az aks get-credentials.

The kubeconfig file looks something like this when the issue is present:

- name: clusterUser_clustername
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --server-id
      - XXXXX
      - --login
      - azurecli
      command: kubelogin
      env: null
      provideClusterInfo: false
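
For context on why this config matters: with an exec entry like the one above, kubectl runs the configured command (here kubelogin get-token ...) and reads an ExecCredential JSON object from its stdout, so any startup delay in the plugin is added to the kubectl command. The sketch below only parses such an object; the sample JSON and the parseExecCredential helper are made up for illustration and are not taken from kubelogin.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// execCredential mirrors the parts of the ExecCredential object that a
// credential plugin writes to stdout for kubectl to consume.
type execCredential struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Status     struct {
		Token               string    `json:"token"`
		ExpirationTimestamp time.Time `json:"expirationTimestamp"`
	} `json:"status"`
}

// parseExecCredential decodes plugin output (hypothetical helper).
func parseExecCredential(raw []byte) (execCredential, error) {
	var cred execCredential
	err := json.Unmarshal(raw, &cred)
	return cred, err
}

func main() {
	// Illustrative sample only; a real token is much longer.
	sample := []byte(`{
	  "apiVersion": "client.authentication.k8s.io/v1beta1",
	  "kind": "ExecCredential",
	  "status": {
	    "token": "example-token",
	    "expirationTimestamp": "2030-01-01T00:00:00Z"
	  }
	}`)

	cred, err := parseExecCredential(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("kind:", cred.Kind)
	fmt.Println("expires:", cred.Status.ExpirationTimestamp.Format(time.RFC3339))
	fmt.Println("expired already:", time.Now().After(cred.Status.ExpirationTimestamp))
}
```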

This has been reported by many of our developers, but everybody uses the same AKS cluster. We will also open an Azure support ticket to see whether it is related to something at the cluster level, and we will try different clusters. But we feel this is more related to kubelogin.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 46 (20 by maintainers)

Most upvoted comments

@weinong: Not that specific command, no - it’s anything. Actually it appears to be klog, not Cobra. In its initialization code it’s calling user.Current(), which ends up hitting that endpoint on Windows. Looks like that’s changed in v2 of that library. I’ll create a PR.

That seems to have solved my presentation;

PS > while ($true) { (Measure-Command { .\kubelogin.exe --version }).TotalSeconds ; Start-Sleep -Seconds 10 }
0.0415748
0.0390878
0.0352182
0.0364319
0.0398296
0.0449192

And since I first noticed it from kubectl:

PS > while ($true) { (Measure-Command { k get namespace }).TotalSeconds ; Start-Sleep -Seconds 10 }
1.4390334
0.2734661
0.3758186
0.2577303
0.2748371
0.2721687
0.2929428

Clearly there is a token/auth step of some sort on the first call, and after that the timings are as expected.

Great work, everyone!

Thanks @RagingTonberry - That helped me realize I can reproduce it here. I get a smaller (just under 3s) delay every 30s. I had a look using Process Monitor and I get a “bad network path” event trying to connect to <mydomain>\PIPE\samr. This happens every time I run it, even when it’s not slow, so Windows appears to be caching the failure result for 30s.

This network path is apparently the Security Account Manager (SAM) endpoint for RPC over SMB. I don’t really understand anything I just wrote, but it’s apparently used for enumerating domain user information.

There are a couple of similar issues in Docker for Windows (#1936 and #2131), and a nasty workaround involving adding a hosts file entry for the domain name to 127.0.0.1 (which worked for me), but I still don’t know why it would be trying to reach that endpoint.

@weinong - is there anything in the kubelogin code that might be calling a SAM library to retrieve user info, even when just running kubelogin --version?

For me, I don’t see any issues with az, which I use fairly frequently.

PS> Measure-Command { az account get-access-token -o json --resource 6dae42f8-4368-4678-94ff-3960e28e3630 }

*snip*
Seconds           : 2
Milliseconds      : 176
*snip*

But I still see periodic delays with kubelogin as I mentioned above:

PS> Measure-Command { kubelogin -v }
Error: flag needs an argument: 'v' in -v
*snip*
Seconds           : 7
Milliseconds      : 373
*snip*

PS> Measure-Command { kubelogin -v }
Error: flag needs an argument: 'v' in -v
*snip*
Seconds           : 0
Milliseconds      : 74
*snip*

Given that I didn’t even give it a valid command line, it still seems to me like it’s -something- in kubelogin land. I would imagine (having not looked at the code) that supplying an invalid command line should make it drop straight out without doing any real processing, such as calling the az CLI or Azure itself.

Edit: I see the same behavior when I supply --version: a delay of roughly 7 seconds about every 30s. It still really feels like SmartScreen or something like that to me personally. That might just be the case in my instance.

kubelogin does different things based on the login mode. As @peterbom said, when using -l azurecli it invokes the az CLI. So when reporting slowness, please be specific about the login mode used.

Thank you @kpkool for following up.

We are using the latest kubelogin version and our team is still experiencing significant slowness on Windows. Also, we have upgraded our AKS from 1.22.X to 1.24.X.

There is a kubelogin folder under .kube\cache for some people but not others; where it exists, it contains an AzurePublicCloud token.

We ran a couple of kubelogin commands and captured the elapsed time: it varies between 35 and 37 seconds for Windows users, but is only 3 to 4 seconds on WSL.

Thank you for the finding and the fix! It has resolved the slowness seen in my tests too. I compared the latest kubelogin release containing this fix with the previous release without it, and the results look very promising and steady. (Note that this is just a simple script that measures the time taken by multiple kubectl commands.)

Time spent (sec) per run, Platform: Windows/amd64

kubelogin version                 Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7
After GitHub fix (v0.0.31)           14     13     12     13     14     13     13
Before GitHub fix (v0.0.30)          37     35     13     16     36     13     36
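
To put numbers on "promising and steady", the mean and range of each row can be computed directly from the figures reported above. This is just a quick sketch; the helper names mean and minMax are made up for the example.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// minMax returns the smallest and largest values in xs.
// It returns 0, 0 for an empty slice.
func minMax(xs []float64) (lo, hi float64) {
	if len(xs) == 0 {
		return 0, 0
	}
	lo, hi = xs[0], xs[0]
	for _, x := range xs[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return lo, hi
}

func main() {
	// Seconds per run, copied from the comparison above.
	after := []float64{14, 13, 12, 13, 14, 13, 13}  // v0.0.31
	before := []float64{37, 35, 13, 16, 36, 13, 36} // v0.0.30

	aLo, aHi := minMax(after)
	bLo, bHi := minMax(before)
	fmt.Printf("v0.0.31: mean %.1f s, range %.0f-%.0f s\n", mean(after), aLo, aHi)
	fmt.Printf("v0.0.30: mean %.1f s, range %.0f-%.0f s\n", mean(before), bLo, bHi)
}
```

The range is the more telling figure here: the fixed version never leaves a narrow band, while the older one swings between fast and slow runs.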

Beautiful and thank you so much for double checking!! Huge congrats to all who contributed and super awesomeness of everyone in this thread and @peterbom ❤️🎉 and @weinong ❤️🎉☕️

Shall we close this now?

Excellent find!

Either way, kubelogin doesn’t do anything besides providing the version string in the root command. So I guess it’s coming from Cobra/spf13?

@peterbom do you mean kubelogin --version in particular, or with any flag like kubelogin --foobar?

That might go some way to explaining my presentation then, since I’m a remote worker with a domain joined machine and I’m rarely VPN’d (and even then there’s some pretty decent delays because yay SMB).

From this (and from testing on my machine), it looks as if .kube\cache is not used for caching tokens when using the azurecli login type.

Instead, it calls az account get-access-token -o json --resource 6dae42f8-4368-4678-94ff-3960e28e3630. If you run this command repeatedly, is it slow? It’s not for me, but I’m using a newer az version (2.48.1).

The az CLI has its own caching (for me this seems to be .azure\msal_token_cache.bin), so it shouldn’t be needing to do the whole authentication dance each time.
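
To check the az side in isolation, the same command can be timed and its output inspected. The sketch below runs the az command quoted above and parses the JSON it prints. The field names accessToken, expiresOn, and tokenType are my assumption about the az output format and may differ between az versions; the parseAzToken helper is made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// azToken holds the fields this sketch reads from the az output.
// The field names are assumed, not taken from kubelogin's source.
type azToken struct {
	AccessToken string `json:"accessToken"`
	ExpiresOn   string `json:"expiresOn"`
	TokenType   string `json:"tokenType"`
}

// parseAzToken decodes the JSON printed by az (hypothetical helper).
func parseAzToken(raw []byte) (azToken, error) {
	var tok azToken
	err := json.Unmarshal(raw, &tok)
	return tok, err
}

func main() {
	start := time.Now()
	out, err := exec.Command("az", "account", "get-access-token",
		"-o", "json",
		"--resource", "6dae42f8-4368-4678-94ff-3960e28e3630").Output()
	elapsed := time.Since(start)
	fmt.Println("az call took:", elapsed)

	if err != nil {
		fmt.Println("az failed:", err)
		return
	}
	tok, err := parseAzToken(out)
	if err != nil {
		fmt.Println("could not parse az output:", err)
		return
	}
	// Never print the token itself; the expiry is enough to see caching.
	fmt.Println("token type:", tok.TokenType, "expires on:", tok.ExpiresOn)
}
```

If repeated runs return the same expiry and finish quickly, az is serving the token from its own cache and the slowness is more likely elsewhere.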