kubernetes: CredentialProvider fails to read environment variable passed into the CredentialProviderConfig file

What happened:

I am attempting to pull an image from ECR using the new kubelet image credential provider feature, which is an alpha feature in Kubernetes v1.20. My CredentialProviderConfig file sets the environment variable AWS_PROFILE, which refers to a profile in my AWS config file in the default (~/.aws) directory. The profile references a credential_process that executes a binary returning a set of temporary AWS credentials. When I try running a pod with a container from ECR, I see the following errors in the kubelet logs:
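
For context, the profile in question has this general shape in ~/.aws/config (the credential_process path below is a placeholder; the actual binary is described later in this thread):

[default]
credential_process = /path/to/cognito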

Jun 09 21:47:05 ecr-kubelet-worker kubelet[225990]: E0609 21:47:05.229849  225990 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 09 21:47:05 ecr-kubelet-worker kubelet[225990]: E0609 21:47:05.597552  225990 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized

However, when I run the following on the command line directly on the kubelet worker node:

echo '{"kind": "CredentialProviderRequest", "apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1", "image": "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2"}' | AWS_PROFILE=default /usr/local/kubelet/bin/ecr-credential-provider get-credentials

I get the following response:

{"kind":"CredentialProviderResponse","apiVersion":"credentialprovider.kubelet.k8s.io/v1alpha1","cacheKeyType":"Registry","cacheDuration":"21.6µs","auth":{"820537372947.dkr.ecr.us-east-1.amazonaws.com":{"username":"AWS","password":"eyJwYXlsb2FkIjoiak1RV0pidXpzVnRDd28wNHExdmlScmlwaVRLd3BOc25hejFlUktxVDlBOWZES0FVTjlsbkozSGl0TGh4OHhnejViU0xrRncxNGtaU3htd2R6Rks3MHNCd2FRL0M3eWdtSlVueGNrZGI0eWFjZUhIQ1NSbVZyMjNHam9VOE1yTjJadDdNUnprK0c0RzVWdmYrMVhjODdKdHhFNEZOTWU1dytoSjFRZXYydTlaNXBCVExhTHdaRDNxT0t4dk1aTi9iK0E5Vys2ajI0L1kxUGxER2wybnpoSWxDdFV3aXBxTDhaTS9MdzFwT1RmYmY1eXp6OFIxVDd0MUZvbER3SWVKNUNzQkZBQWZJanNITmpES2tTL3dYdXVlMDBOZGZNeVo2UHlPMjlVanBLdGdqMU8vZTBwZ0J4SzRGelpRVldROHE5Wk1paWtFL3BxRXMvZzN0WXNJcDhTalR6RkZqTys4UVB4WVdFZ2ZrVDFRcjZnbmtrOXZLRkdOSGd6SEV3Z0JMNUJQeDFReXpTSGJyNmlPMDd2Sjh1QzE3eHpITEZwcmpuR3o3T3AyQ1U3SFYydjVwSjdlNFM1d0Qrenljbk5NZFVabjBsQ2dzdlBpUGtuYkV5Rk1lZ050Y3dmM3JKdnNkSm5FZ3IrVGowTTlsdktTWHJBL3hGVW9vZXFQRFhQY2t4SlN6eXM2dkYvZGdXbWJZcXJPTW1jdnBaZ3VpUVo3aTF0NnZncEVIcUQ1RzIzbyt2UnlzQ1FjOElTNVRHVk1peDhGdXp1UjkyT0djbnhIVm85U0gzdWhVL096clZKdWJQQTdFcGJHTjJCM2lFUTFsaHBGeXdvdGtyNit2dFhvU01GSFBFeXFKQ2syL0VWTmphWi9Vb05ZS3JrL3hZYmJPS29UQmlpNUFrTkZDcE9IU1BqM1JVTHpQcXhJVCtPM2tkZ3ZwZ0RMSTI1c3AvYVVUV0Fpd21KYm5BZTFTR1RrdTcyOUk1cVY0bkxVdWtzK2Y0OEJqUlNXUlI0YWlqZnIxbFpLczBLRnlFTk16WkJvbGdBYWZMV01NNnEwaXlMd1Nzc1pJK2xPYmgxU3JUa3IwRFpYRDR4N05QQlN4RTByU1FmSDdzR1M3Y3VLbG85cUUrVDBMSGhxbE0velU1eDU2aTN5MkhZYnV5RDRlSHVvd1F5dVh6UTFRSXRhZDF3Q0U1blJCWjlOU1o1VDE1R1ZheklDcVFCbmFIaFl4WnZqemZuVm1vU1JrL0d1LzFESFU1c2c5U3Z0SXBkL0hXeFkxNXBTQk9nUHNaK2c5WTdOSXAyYXRHNEg0Qmh0eU40T215bm5VaGFOQjZPZU0xV3l3VFY1dXh5cHhINXk3STJwYkE4MDVMR1U4cGNzMGNvTU5Kbmdpam5aRGhzVDRhMisrS3F5d0VPaVpkZ29JZ0ZPdnFwWnIycEVuZlo2MUlEN2pjbjgwR0tMUExvalEzVGV4bjhtVFNINjhzS1l5ZHlyZGJ0MDg0bXZUbjEvNE1PeW9yOUVYeU9qMzY4UzZ2Sno1YmQ4dSsxc1Fva2lBb3FkNkIyWndLYm1QSWx4T0h6bVl1a3N6UGxnWDA1NFNQVUNNSGJJTjJacncxTmJIb3V1UXU1WFVjeGZPR2diK0g1bnBIL1VvTzJFdmR2T29TOWNlczBjMkNDU2hKbHY4MVE4ZUhGRExGRS9tajJaVkF6aFN1RXhsbTcvazlHaVJ0bW9JS3VISFBzRUhDVTRsQTBWVk1HNG44SFpEY1A3bWNrTTRCTS9ibmZTVXNabUc0ZjFEK29WdmpGQS9oWTQ3R0JrQmpVNGJ3OFRoeXprOXFxL2MxbE9tWDc4QllwUDRuMmdKbGd5M0wzcnFTMFBMc2xkNHZzZUFFSWRML29XdjlqZ2FWYjM4bHJuaEdHeTFXc09hOVJDWFMvWTRrc25vRGtnaHVmd1hGcHQveWE0d1VMcUtRL1d3bTZyZ2R0TEdGb3dPbERPOHY1cnZZWDY4WVNOSzFrcEJTcXZoak5haG5ZQU1pNEFrOXVGVFVqaDRDQmFLVTg4OFVYdUFBNERNbktMWEk0a29HK3pGK2ZVYVljOW52Vmo2dDZVdzFaQjFKTm5vNG00b1ZIUlo4U1E5c2g0Z2U3bFd4b2FiZVZxWDE0N0k4T0orZDV1L2g1MHdmWkRQT3AzZXViKzJzR28wZEl3SE5jM1lxREpBd0szSmltdFozakVJYldodmJzWUZUZzlFMkk5OUJqQ2ZnSWx6c092eXlxMXV2RW9DZ0RMN3ZySWI4amFaQWtsSDJWelQ5RUNzU3REc2J2K3ZuaGg4K2VaK2hnRzVOQXV2ZTRWL2NMVjZOL01KbzVadU1vMEg0THZyTnVRVDNXVlIzZjBqYnZlVmMrT2tyZWQxdEFOczdCUGJFdUlCZE5VaURMSmNpWkFJb3J6aGkvaDZxU3EzUE9Jc1pSencvczVBOCt0NERNQlptYXRsbDFZekJYV2EvN3pxbGVpZWlod01XU2ZqWUpDOTRVa1l1c3hkbjZCclJWVzhKNDdscHNqUVlNbG96V01raWRPalpIWUJqc0hVRXpVSytFdTdyQjQ2eDF6U2RkYSs5a3hZY0tXS1UxOUtKWTVQUTVUd1dVRXZKbXJna3dwZjdjZWRhTzFtWUwvZEpMK2ZMOFA4M09jMDJxSXJ0MzFRQUZRNGRmOFNkcmxCQVp4MExCQVdtTU0vd0RPTTJMNXlFckI0cVNTRXEvbHo4Rnc5RWFFTll3UW93N2hDQzFIRXZJSzZUWTQ1YXBZZU55clZ6UzFUUTJjSkVRWi9ZUGFlUVNFZVE1UEVnRkxpQ3RmdEhnPT0iLCJkYXRha2V5IjoiQVFFQkFIaHdtMFlhSVNKZVJ0Sm01bjFHNnVxZWVrWHVvWFhQZTVVRmNlOVJxOC8xNHdBQUFINHdmQVlKS29aSWh2Y05BUWNHb0c4d2JRSUJBREJvQmdrcWhraUc5dzBCQndFd0hnWUpZSVpJQVdVREJBRXVNQkVFRExzbWJPRldpeXcxbkhEVEVnSUJFSUE3RFVyUmcwd1ZWRlB4OFpGbkdQdkNGTVUwM21zQjZNTVZESUg5TnJLejN3Uy84L0NFZFQydy9RbEo4S1piaG1YNmlwdTl6RmNZdEQvOGFhRT0iLCJ2ZXJzaW9uIjoiMiIsInR5cGUiOiJEQVRBX0tFWSIsImV4cGlyYXRpb24iOjE2MjMzMTg2MTl9"}}}

It doesn’t appear that the environment variables in my config are being passed to the ecr-credential-provider binary when the kubelet executes it.

Below is my config file:

kind: CredentialProviderConfig
apiVersion: kubelet.config.k8s.io/v1alpha1
providers:
  - name: ecr-credential-provider
    matchImages:
    - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
    args:
    - get-credentials
    env:
    - name: "AWS_PROFILE"
      value: "default"

What you expected to happen:

I expect the kubelet to pass the environment variables specified in the config file to the ecr-credential-provider binary when it executes it.

How to reproduce it (as minimally and precisely as possible):

I am using KinD to test this feature. Here is my config:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
runtimeConfig:
  "api/alpha": "true"
name: ecr-kubelet
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer: 
      extraArgs:
        feature-gates: 'KubeletCredentialProviders=true'
  image: kindest/node:v1.20.7@sha256:e645428988191fc824529fd0bb5c94244c12401cf5f5ea3bd875eb0a787f0fe9
- role: worker
  image: kindest/node:v1.20.7@sha256:e645428988191fc824529fd0bb5c94244c12401cf5f5ea3bd875eb0a787f0fe9
  extraMounts:
  - hostPath: /Users/jicowan/go/bin
    containerPath: /usr/local/kubelet/bin
  - hostPath: /Users/jicowan/conf
    containerPath: /usr/local/kubelet/config
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration: 
      kubeletExtraArgs: 
        feature-gates: 'KubeletCredentialProviders=true'
        image-credential-provider-config: '/usr/local/kubelet/config/ecr.conf'
        image-credential-provider-bin-dir: '/usr/local/kubelet/bin/'

Anything else we need to know?:

No

Environment:

  • Kubernetes version (use kubectl version): 1.20
  • Cloud provider or hardware configuration: None
  • OS (e.g: cat /etc/os-release): Ubuntu 20.10
  • Kernel (e.g. uname -a): 5.10
  • Install tools: KinD
  • Network plugin and version (if this is a network-related bug):
  • Others:

cc @ayberk @andrewsykim

About this issue

  • State: closed
  • Created: June 2021
  • Comments: 48 (28 by maintainers)

Most upvoted comments

@n4j: I am facing a similar issue on my bare-metal cluster.

This is my config:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1beta1
    env:
      - name: AWS_PROFILE
        value: "default"
      - name: HOME
        value: "/root"
      - name: AWS_CONFIG_FILE
        value: "/root/.aws/config"     
      - name: PATH
        value: "$PATH:/usr/local/kubelet/bin" 

Can you help? It works when I execute the credentials binary locally, but it doesn’t work in the pod. Can you please guide me?

@n4j Thanks for looking into the issue, you are right, the spec does say that. Feel free to fix it: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/types.go#L550-#L554

/triage accepted

@n4j sure thing.

@jicowan After several rounds of troubleshooting I was able to get ecr-credential-provider working in my KinD environment.

My credential loading setup is ecr-credential-provider <-> cognito (a local binary) <-> AWS Cognito, and below is my credential_process configuration:

[profile cognito]
credential_process = /usr/bin/cognito -u n4j-REDACTED -p __4C1751_REDACTED_523be3993

Initially, I got the same error as yours. I then added a couple of custom log lines to ecr-credential-provider to print its env vars to a file in the /tmp directory, and I could verify that the env vars specified in the credential provider spec were indeed being passed.
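
For anyone who wants to repeat that check, here is a minimal sketch of that kind of debug logging (hypothetical file name, not the actual patch):

package main

import (
    "fmt"
    "os"
)

// dumpEnv appends every KEY=VALUE pair this process received to a file
// under /tmp, so you can inspect exactly what the kubelet passed in.
func dumpEnv() {
    f, err := os.OpenFile("/tmp/ecr-credential-provider.env", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0600)
    if err != nil {
        return
    }
    defer f.Close()
    for _, kv := range os.Environ() {
        fmt.Fprintln(f, kv)
    }
}

func main() {
    dumpEnv()
    // ... the rest of the plugin would run here ...
}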

What I noticed in my KinD setup is that the AWS SDK was logging an error because it couldn’t find any “providers”, even though I could verify in the kind-worker container that the /root/.aws/config and /root/.aws/credentials files were present.

I altered the env section of my CredentialProviderConfig as below and it worked for me:

    env:
    - name: AWS_PROFILE
      value: "cognito"
    - name: AWS_SDK_LOAD_CONFIG
      value: "1"
    - name: HOME # this is what did the trick for me
      value: "/root"
    - name: PATH
      value: "$PATH:/usr/local/bin:/usr/bin:/usr/local/kubelet"

Apparently, ecr-credential-provider is being passed ONLY the env vars that are present in the spec, and everything else is missing! Or at least this is the behaviour in both my local cluster setup and my KinD cluster.
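
To make the distinction concrete, here is a minimal Go sketch, illustrative only and not the actual kubelet source (the plugin path is the one from the KinD mounts earlier in this thread). The first command reproduces the behaviour observed here; the second is the behaviour the types.go comment linked earlier describes:

package main

import (
    "os"
    "os/exec"
)

func main() {
    // Env vars taken from the CredentialProviderConfig spec.
    specEnv := []string{"AWS_PROFILE=default"}

    // Behaviour observed here: the child receives ONLY the configured
    // vars, so HOME, PATH, etc. are missing and the AWS SDK cannot
    // locate ~/.aws/config unless each one is added to the spec.
    onlySpec := exec.Command("/usr/local/kubelet/bin/ecr-credential-provider", "get-credentials")
    onlySpec.Env = specEnv

    // Behaviour the types.go comment describes: union the configured
    // vars with the host environment (for duplicate keys, later
    // entries win).
    unioned := exec.Command("/usr/local/kubelet/bin/ecr-credential-provider", "get-credentials")
    unioned.Env = append(os.Environ(), specEnv...)

    _ = onlySpec.Run() // fails: "NoCredentialProviders: no valid providers in chain"
    _ = unioned.Run()  // succeeds, provided the profile resolves credentials
}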

Can you please add HOME and the other env vars necessary for the correct execution of the AWS SDK and see if it works for you?

If this doesn’t work, I’ll send you a modified version of ecr-credential-provider that adds some extra logging so we can troubleshoot this issue further.

@n4j Adding the PATH did not work for me.

Jun 17 17:42:02 ecr-kubelet-worker systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Jun 17 17:42:02 ecr-kubelet-worker systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --provider-id has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --provider-id has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.572914  442992 server.go:416] Version: v1.20.7
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.574994  442992 server.go:837] Client rotation is on, will bootstrap in background
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.587696  442992 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.589504  442992 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: W0617 17:42:07.603815  442992 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: W0617 17:42:07.604901  442992 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu0/online: open /sys/devices/system/cpu/cpu0/online: no such file or directory
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609584  442992 container_manager_linux.go:274] container manager verified user specified cgroup-root exists: [kubelet]
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609656  442992 container_manager_linux.go:279] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/kubelet CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609720  442992 topology_manager.go:120] [topologymanager] Creating topology manager with none policy per container scope
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609746  442992 container_manager_linux.go:310] [topologymanager] Initializing Topology Manager with none policy and container-level scope
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609770  442992 container_manager_linux.go:315] Creating device plugin manager: true
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609909  442992 remote_runtime.go:62] parsed scheme: ""
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609946  442992 remote_runtime.go:62] scheme "" not registered, fallback to default scheme
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610010  442992 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610045  442992 clientconn.go:948] ClientConn switching balancer to "pick_first"
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610348  442992 remote_image.go:50] parsed scheme: ""
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610397  442992 remote_image.go:50] scheme "" not registered, fallback to default scheme
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610458  442992 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610488  442992 clientconn.go:948] ClientConn switching balancer to "pick_first"
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610600  442992 kubelet.go:394] Attempting to sync node with API server
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610649  442992 kubelet.go:262] Adding pod path: /etc/kubernetes/manifests
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610734  442992 kubelet.go:273] Adding apiserver pod source
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610783  442992 apiserver.go:43] Waiting for node sync before watching apiserver pods
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.615457  442992 kuberuntime_manager.go:216] Container runtime containerd initialized, version: v1.5.1, apiVersion: v1alpha2
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: E0617 17:42:08.348527  442992 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]:         For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.350418  442992 server.go:1176] Started kubelet
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.352360  442992 server.go:148] Starting to listen on 0.0.0.0:10250
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.357062  442992 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.362365  442992 volume_manager.go:271] Starting Kubelet Volume Manager
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.363289  442992 desired_state_of_world_populator.go:142] Desired state populator starts to run
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.365796  442992 server.go:410] Adding debug handlers to kubelet server.
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367698  442992 client.go:86] parsed scheme: "unix"
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367745  442992 client.go:86] scheme "unix" not registered, fallback to default scheme
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367863  442992 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367906  442992 clientconn.go:948] ClientConn switching balancer to "pick_first"
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.463987  442992 kuberuntime_manager.go:1006] updating runtime config through cri with podcidr 10.244.1.0/24
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.466961  442992 kubelet_node_status.go:71] Attempting to register node ecr-kubelet-worker
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.467682  442992 kubelet_network.go:77] Setting Pod CIDR:  -> 10.244.1.0/24
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.488026  442992 kubelet_node_status.go:109] Node ecr-kubelet-worker was previously registered
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.488308  442992 kubelet_node_status.go:74] Successfully registered node ecr-kubelet-worker
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.494247  442992 setters.go:577] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2021-06-17 17:42:08.4940374 +0000 UTC m=+6.019140201 LastTransitionTime:2021-06-17 17:42:08.4940374 +0000 UTC m=+6.019140201 Reason:KubeletNotReady Message:[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]}
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.597806  442992 cpu_manager.go:193] [cpumanager] starting with none policy
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.597861  442992 cpu_manager.go:194] [cpumanager] reconciling every 10s
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.597895  442992 state_mem.go:36] [cpumanager] initializing new in-memory state store
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.598034  442992 state_mem.go:88] [cpumanager] updated default cpuset: ""
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.598054  442992 state_mem.go:96] [cpumanager] updated cpuset assignments: "map[]"
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.598086  442992 policy_none.go:43] [cpumanager] none policy: Start
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: W0617 17:42:08.600866  442992 manager.go:594] Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.603698  442992 plugin_manager.go:114] Starting Kubelet Plugin Manager
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.612594  442992 apiserver.go:53] Watching apiserver
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.712753  442992 kubelet_network_linux.go:56] Initialized IPv4 iptables rules.
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.712858  442992 status_manager.go:158] Starting to sync pod status with apiserver
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.712921  442992 kubelet.go:1828] Starting kubelet main sync loop.
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: E0617 17:42:10.713055  442992 kubelet.go:1852] skipping pod synchronization - PLEG is not healthy: pleg has yet to be successful
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.813844  442992 topology_manager.go:187] [topologymanager] Topology Admit Handler
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.818993  442992 topology_manager.go:187] [topologymanager] Topology Admit Handler
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: W0617 17:42:10.823658  442992 pod_container_deletor.go:79] Container "7b37db22207fcc990241bf967e38cba672dff9cae490079fe0c3f8cafd4a04c5" not found in pod's containers
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.840626  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/8e759947-b33a-4e80-9a1e-6c5e39f078b3-lib-modules") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.841468  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kindnet-token-qk58g" (UniqueName: "kubernetes.io/secret/8e759947-b33a-4e80-9a1e-6c5e39f078b3-kindnet-token-qk58g") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.841859  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy" (UniqueName: "kubernetes.io/configmap/04093824-c6f7-42be-a9a8-b905cd2a6830-kube-proxy") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.842412  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/04093824-c6f7-42be-a9a8-b905cd2a6830-xtables-lock") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.842892  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/04093824-c6f7-42be-a9a8-b905cd2a6830-lib-modules") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.843316  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy-token-4ht2v" (UniqueName: "kubernetes.io/secret/04093824-c6f7-42be-a9a8-b905cd2a6830-kube-proxy-token-4ht2v") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.843627  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "cni-cfg" (UniqueName: "kubernetes.io/host-path/8e759947-b33a-4e80-9a1e-6c5e39f078b3-cni-cfg") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.845269  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/8e759947-b33a-4e80-9a1e-6c5e39f078b3-xtables-lock") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.845298  442992 reconciler.go:157] Reconciler: start to sync state
Jun 17 17:42:21 ecr-kubelet-worker kubelet[442992]: I0617 17:42:21.685653  442992 topology_manager.go:187] [topologymanager] Topology Admit Handler
Jun 17 17:42:21 ecr-kubelet-worker kubelet[442992]: I0617 17:42:21.728246  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk") pod "cats" (UID: "bb5278c8-fa3a-4b8c-a8af-e5aa57916008")
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.151045  442992 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.500764  442992 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.500940  442992 kuberuntime_image.go:51] Pull image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.501419  442992 kuberuntime_manager.go:829] container &Container{Name:cats,Image:820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:default-token-fmsqk,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:Always,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008): ErrImagePull: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.501573  442992 pod_workers.go:191] Error syncing pod bb5278c8-fa3a-4b8c-a8af-e5aa57916008 ("cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008)"), skipping: failed to "StartContainer" for "cats" with ErrImagePull: "rpc error: code = Unknown desc = failed to pull and unpack image \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": failed to resolve reference \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": unexpected status code [manifests v2]: 401 Unauthorized"
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.780791  442992 pod_workers.go:191] Error syncing pod bb5278c8-fa3a-4b8c-a8af-e5aa57916008 ("cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008)"), skipping: failed to "StartContainer" for "cats" with ImagePullBackOff: "Back-off pulling image \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\""
Jun 17 17:42:34 ecr-kubelet-worker kubelet[442992]: E0617 17:42:34.757420  442992 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.058457  442992 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.058819  442992 kuberuntime_image.go:51] Pull image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.059946  442992 kuberuntime_manager.go:829] container &Container{Name:cats,Image:820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:default-token-fmsqk,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:Always,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008): ErrImagePull: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.060284  442992 pod_workers.go:191] Error syncing pod bb5278c8-fa3a-4b8c-a8af-e5aa57916008 ("cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008)"), skipping: failed to "StartContainer" for "cats" with ErrImagePull: "rpc error: code = Unknown desc = failed to pull and unpack image \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": failed to resolve reference \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": unexpected status code [manifests v2]: 401 Unauthorized"
Jun 17 17:42:39 ecr-kubelet-worker kubelet[442992]: I0617 17:42:39.442993  442992 reconciler.go:196] operationExecutor.UnmountVolume started for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk") pod "bb5278c8-fa3a-4b8c-a8af-e5aa57916008" (UID: "bb5278c8-fa3a-4b8c-a8af-e5aa57916008")
Jun 17 17:42:39 ecr-kubelet-worker kubelet[442992]: I0617 17:42:39.445313  442992 operation_generator.go:797] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk" (OuterVolumeSpecName: "default-token-fmsqk") pod "bb5278c8-fa3a-4b8c-a8af-e5aa57916008" (UID: "bb5278c8-fa3a-4b8c-a8af-e5aa57916008"). InnerVolumeSpecName "default-token-fmsqk". PluginName "kubernetes.io/secret", VolumeGidValue ""
Jun 17 17:42:39 ecr-kubelet-worker kubelet[442992]: I0617 17:42:39.544298  442992 reconciler.go:319] Volume detached for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk") on node "ecr-kubelet-worker" DevicePath ""

I’m happy to send you the cognito binary with a username/password that you can use for testing. I have only tested this with KinD, but I don’t think that is the issue here, since it fails with exit status 1 when I try using exec.Env().

@jicowan Yes, I am planning to set up a Cognito Identity pool in my AWS account.

@jicowan I have already validated that the env vars are being passed to ecr-credential-provider, and there is no issue there. If that were not the case, generating AWS credentials via the profile config wouldn’t work for you.

I debugged the kubelet code myself and validated this, and I also added some logging to ecr-credential-provider to print out the env vars; it was logging the passed vars.

IMHO, what remains to be seen here is why it fails with Cognito. I am going to create a test bed tomorrow replicating your exact setup. In the meantime, if it’s possible, could you create a Linux VM and replicate your setup inside it?

My intention is to see whether the Cognito-based credential process goes through in a clean-slate environment. If it does, then something in your local setup is interfering with the passed env vars.

@adisky Thoughts?

I included my KinD configuration at the beginning of this thread, can you use that @n4j? The cognito binary has to be built from source, https://github.com/jicowan/cognito (or I can send it to you along with a temp username/password). There is an accompanying blog post for the binary at https://jicowan.medium.com/pulling-docker-images-from-ecr-private-registries-with-a-cognito-user-pool-identity-95f4c233fc4a.

The flow, @adisky, is as @n4j describes above. The cognito binary returns a JSON blob to the calling application that looks like this:

{
    "Version": 1,
    "AccessKeyId": "an AWS access key",
    "SecretAccessKey": "your AWS secret access key",
    "SessionToken": "the AWS session token for temporary credentials", 
    "Expiration": "ISO8601 timestamp for when the credentials expire"
}
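
Any credential helper just has to print that structure to stdout and exit 0. A minimal Go sketch of the contract (placeholder values, not the actual cognito source):

package main

import (
    "encoding/json"
    "os"
    "time"
)

func main() {
    // Shape of the credential_process contract shown above; the key
    // values here are placeholders.
    out := map[string]interface{}{
        "Version":         1,
        "AccessKeyId":     "ASIA-EXAMPLE",
        "SecretAccessKey": "REDACTED",
        "SessionToken":    "REDACTED",
        "Expiration":      time.Now().Add(time.Hour).UTC().Format(time.RFC3339),
    }
    json.NewEncoder(os.Stdout).Encode(out)
}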

During my testing (without the kubelet) I am able to get the proper response for the kubelet when referencing the AWS profile using os.Getenv, or by setting the exec.Cmd’s Env to os.Environ(). However, when I try to pass the AWS profile in by setting Env to just "AWS_PROFILE=default", I get exit status 1 and no further information. I have a feeling that exec doesn’t properly chain or pass information back to the calling application.
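
For anyone reproducing this outside the kubelet, here is a minimal sketch of that experiment (same request JSON and binary path as earlier in the thread); wiring up Stderr is what surfaces the underlying SDK error instead of a bare exit status 1:

package main

import (
    "bytes"
    "fmt"
    "os"
    "os/exec"
    "strings"
)

func main() {
    req := `{"kind": "CredentialProviderRequest", "apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1", "image": "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2"}`

    cmd := exec.Command("/usr/local/kubelet/bin/ecr-credential-provider", "get-credentials")
    cmd.Stdin = strings.NewReader(req)

    // A non-nil Env replaces the inherited environment entirely; this
    // is the failing case. Swap in the append(os.Environ(), ...) line
    // to reproduce the working case.
    cmd.Env = []string{"AWS_PROFILE=default"}
    // cmd.Env = append(os.Environ(), "AWS_PROFILE=default")

    var stdout, stderr bytes.Buffer
    cmd.Stdout = &stdout
    cmd.Stderr = &stderr

    if err := cmd.Run(); err != nil {
        // The kubelet only reports "exit status 1"; the plugin's
        // stderr usually contains the underlying AWS SDK error.
        fmt.Fprintf(os.Stderr, "%v\nplugin stderr: %s\n", err, stderr.String())
        os.Exit(1)
    }
    fmt.Print(stdout.String())
}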

@jicowan isn’t this an issue with ecr-credential-provider? It is ecr-credential-provider’s job to fetch credentials and return the result to the kubelet in the expected format.

When I switch to a profile with a credential_process I get the following error:

Jun 14 18:33:10 ecr-kubelet-worker kubelet[390834]: I0614 18:33:10.432086  390834 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/6dbfb203-81a2-4801-a80b-5e5905c75abd-default-token-fmsqk") pod "cats" (UID: "6dbfb203-81a2-4801-a80b-5e5905c75abd")
Jun 14 18:33:10 ecr-kubelet-worker kubelet[390834]: E0614 18:33:10.982200  390834 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 14 18:33:11 ecr-kubelet-worker kubelet[390834]: E0614 18:33:11.247141  390834 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized

@n4j the kubelet registers in-tree credential providers at startup, you do not need to explicitly add any! https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/plugins_providers.go#L23-#L25

I recently tried the GCP credential provider, and it does fall back to the in-tree GCP credential provider after getting a failed response from the one I set up.

@n4j can you check the kubelet logs for "Failed getting credential from external registry credential provider"? It is possible that it is falling back to the in-tree AWS credential provider.

This is the binary I’m calling with credential_process: https://github.com/jicowan/cognito.

@n4j I am using credential_process (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sourcing-external.html) in my profile. I am also using KinD for testing, but I don’t think that should matter. I don’t have a credentials file; I only have a config file.

Yes @n4j, it is not passing the envs to the spawned process. The code for ecr-credential-provider is in the AWS cloud provider repository, https://github.com/kubernetes/cloud-provider-aws/blob/master/cmd/ecr-credential-provider/main.go.
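
For anyone tracing this later, here is a minimal sketch of the stdin/stdout protocol the kubelet speaks with the plugin, reconstructed from the request/response JSON shown earlier in this thread (field set limited to what appears there; not the actual ecr-credential-provider source):

package main

import (
    "encoding/json"
    "os"
)

type request struct {
    Kind       string `json:"kind"`
    APIVersion string `json:"apiVersion"`
    Image      string `json:"image"`
}

type authEntry struct {
    Username string `json:"username"`
    Password string `json:"password"`
}

type response struct {
    Kind          string               `json:"kind"`
    APIVersion    string               `json:"apiVersion"`
    CacheKeyType  string               `json:"cacheKeyType"`
    CacheDuration string               `json:"cacheDuration"`
    Auth          map[string]authEntry `json:"auth"`
}

func main() {
    // The kubelet writes a CredentialProviderRequest to the plugin's
    // stdin and expects a CredentialProviderResponse on stdout.
    var req request
    if err := json.NewDecoder(os.Stdin).Decode(&req); err != nil {
        os.Exit(1) // surfaces in kubelet logs as "exit status 1"
    }

    // ... resolve credentials for req.Image here, e.g. via the AWS SDK;
    // this is the step where HOME/AWS_PROFILE/AWS_CONFIG_FILE matter ...

    resp := response{
        Kind:          "CredentialProviderResponse",
        APIVersion:    req.APIVersion,
        CacheKeyType:  "Registry",
        CacheDuration: "12h",
        Auth: map[string]authEntry{
            "820537372947.dkr.ecr.us-east-1.amazonaws.com": {Username: "AWS", Password: "REDACTED"},
        },
    }
    if err := json.NewEncoder(os.Stdout).Encode(resp); err != nil {
        os.Exit(1)
    }
}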