kubernetes: Kubelet performs credential discovery even if the --cloud-provider flag is set to an empty string, which slows down the startup process

What happened:

When starting a kubelet configured to use Docker as the container runtime, it tries to discover Docker registry credentials for all in-tree cloud providers, which keeps kubelet from starting quickly: startup takes ~20 seconds.

What you expected to happen:

kubelet should not try to discover cloud provider credentials when the --cloud-provider="" flag is set.

How to reproduce it (as minimally and precisely as possible):

Start kubelet in a non-cloud environment. It hangs on the log message clientconn.go:577] ClientConn switching balancer to "pick_first" for 15-20 seconds, then prints aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated. and startup proceeds.

Anything else we need to know?:

Regular startup time:

I0211 22:37:49.810902   18074 server.go:416] Version: v1.17.2
I0211 22:38:10.769057   18074 reconciler.go:156] Reconciler: start to sync state

Startup time with patch posted below:

I0211 21:55:53.073330   31171 server.go:416] Version: v1.17.2-dirty
I0211 21:56:00.190537   31171 reconciler.go:156] Reconciler: start to sync state

Patch:

diff --git a/pkg/credentialprovider/plugins.go b/pkg/credentialprovider/plugins.go
index 7256a5a1d05..f837089e869 100644
--- a/pkg/credentialprovider/plugins.go
+++ b/pkg/credentialprovider/plugins.go
@@ -60,6 +60,7 @@ func NewDockerKeyring() DockerKeyring {
 
        for _, key := range stringKeys {
                provider := providers[key]
+               continue
                if provider.Enabled() {
                        klog.V(4).Infof("Registering credential provider: %v", key)
                        keyring.Providers = append(keyring.Providers, provider)

Running provider.Enabled() for the AWS plugin alone takes ~7 seconds.
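
To see where the time goes, one can time each Enabled() call in that loop. Below is a minimal, self-contained Go sketch, not kubelet code: the Provider interface and slowProvider type are simplified stand-ins for pkg/credentialprovider's DockerConfigProvider and the AWS plugin, and the 7-second delay is only simulated.

package main

import (
	"fmt"
	"time"
)

// Provider is a simplified stand-in for kubelet's DockerConfigProvider.
type Provider interface {
	Enabled() bool
}

// slowProvider simulates a provider whose Enabled() blocks on a network
// call, as the AWS credential lookup appears to do in this report.
type slowProvider struct{ delay time.Duration }

func (p slowProvider) Enabled() bool {
	time.Sleep(p.delay) // stands in for the metadata/credential request
	return true
}

func main() {
	providers := map[string]Provider{
		"aws-simulated":   slowProvider{7 * time.Second},
		"local-simulated": slowProvider{0},
	}
	for key, p := range providers {
		start := time.Now()
		enabled := p.Enabled()
		fmt.Printf("provider %q: enabled=%v, Enabled() took %v\n", key, enabled, time.Since(start))
	}
}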

I’ve also seen https://github.com/kubernetes/cloud-provider/issues/13, but I’m not sure how related that is to this issue. I would expect a flag or config option that disables this credential lookup.
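
For illustration, here is a hedged sketch of what such a guard could look like if the cloud-provider setting (or a new config knob) were plumbed into the keyring construction. The cloudProvider argument, the cloudProviderKeys set, and the simplified types are assumptions for this sketch, not existing kubelet APIs.

package credentialguard

// DockerConfigProvider is a simplified stand-in for the interface in
// pkg/credentialprovider.
type DockerConfigProvider interface {
	Enabled() bool
}

// Hypothetical set of in-tree cloud credential provider keys; the real
// registration keys differ.
var cloudProviderKeys = map[string]bool{
	"aws":   true,
	"azure": true,
	"gcp":   true,
}

// NewKeyringProviders sketches the requested behaviour: when kubelet runs
// with --cloud-provider="", the in-tree cloud providers are never probed,
// so their Enabled() checks (and the slow network calls behind them) are
// skipped entirely.
func NewKeyringProviders(cloudProvider string, providers map[string]DockerConfigProvider) []DockerConfigProvider {
	var enabled []DockerConfigProvider
	for key, p := range providers {
		if cloudProvider == "" && cloudProviderKeys[key] {
			continue // skip cloud metadata lookups on non-cloud nodes
		}
		if p.Enabled() {
			enabled = append(enabled, p)
		}
	}
	return enabled
}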

I’m guessing that a slow kubelet start is undesirable from an autoscaling perspective, and that in certain environments startup can take even longer (e.g. on a slow network).

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", 
    BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", 
    BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
    
  • Cloud provider or hardware configuration: KVM libvirt
  • OS (e.g: cat /etc/os-release):
    NAME="Flatcar Container Linux by Kinvolk"
    ID=flatcar
    ID_LIKE=coreos
    VERSION=2303.4.0
    VERSION_ID=2303.4.0
    BUILD_ID=2020-02-08-0855
    PRETTY_NAME="Flatcar Container Linux by Kinvolk 2303.4.0 (Rhyolite)"
    ANSI_COLOR="38;5;75"
    HOME_URL="https://flatcar-linux.org/"
    BUG_REPORT_URL="https://issues.flatcar-linux.org"
    FLATCAR_BOARD="amd64-usr"
    
  • Kernel (e.g. uname -a): Linux controller01 4.19.95-flatcar #1 SMP Sat Feb 8 07:25:12 -00 2020 x86_64 QEMU Virtual CPU version 2.5+ GenuineIntel GNU/Linux
  • Install tools: libflexkube
  • Network plugin and version (if this is a network-related bug):
  • Others:
Containers: 36
 Running: 24
 Paused: 0
 Stopped: 12
Images: 12
Server Version: 18.06.3-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683b971d9c3ef73f284f176672c44b448662
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.19.95-flatcar
Operating System: Flatcar Container Linux by Kinvolk 2303.4.0 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.946GiB
Name: controller01
ID: TWNE:V2YU:G5IU:NHEB:HWUA:AZXO:7FQP:JF6X:QPNM:HX6D:5YOP:R5XR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 25 (19 by maintainers)

Most upvoted comments

I think the issue here isn’t the loop itself, but the fact that the AWS provider blindly returns true from Enabled: https://github.com/kubernetes/kubernetes/blob/392de8012eb4116a0467f8fbe771ba5a17e34a86/pkg/credentialprovider/aws/aws_credentials.go#L76-L79

The Azure and GCP implementations actually try to detect whether they are running on their platform and disable themselves when they think they aren’t.

Perhaps a similar check could be added to the AWS Enabled implementation to match the behaviour of the other two platforms.
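
For comparison, a hedged sketch of what such a check might look like: probe the EC2 instance metadata endpoint with a short timeout and only report Enabled when it answers. The raw HTTP probe and the 2-second timeout are assumptions for this sketch; an actual fix would more likely use the AWS SDK's ec2metadata client.

package awscredcheck

import (
	"net/http"
	"time"
)

// runningOnEC2 does a cheap, short-timeout probe of the EC2 instance
// metadata service. Sketch only: a real implementation would handle
// IMDSv2 tokens and reuse the SDK's metadata client.
func runningOnEC2() bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://169.254.169.254/latest/meta-data/")
	if err != nil {
		return false // endpoint unreachable: almost certainly not on EC2
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// Enabled mirrors the idea behind the Azure/GCP providers: only register
// the credential provider when the node actually looks like the platform.
func Enabled() bool {
	return runningOnEC2()
}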

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@matthyx I doubt I’ll find time to submit a proper PR for this, but I’ll try to have a look soon-ish and we’ll see how that goes.