garden: In-cluster builds on EKS: credentials not found in native keychain

Bug

Current Behavior

In-cluster builds (both Kaniko and cluster-buildkit) fail on EKS with an ECR registry. Running garden build produces the following error:

[2022-11-17T15:24:37.385Z] Error: Unable to query registry for image status: time="2022-11-17T15:24:37Z" level=fatal msg="Error parsing image name \"docker://***********.dkr.ecr.us-east-1.amazonaws.com/****/repo/test:v-36b8e5e856\": getting username and password: 1 error occurred:\n\t* credentials not found in native keychain\n\n"

    at skopeoBuildStatus (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/plugins/kubernetes/container/build/common.ts:263:13)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at /snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/actions.ts:1303:24
    at ActionRouter.getBuildStatus (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/actions.ts:359:20)
    at wrapped.process (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/tasks/build.ts:132:22)
    at TaskNode.process (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/task-graph.ts:801:20)
    at wrapped.processNode (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/task-graph.ts:436:18)

Error Details:

command:
  - skopeo
  - '--command-timeout=30s'
  - inspect
  - '--raw'
  - '--authfile'
  - /.docker/config.json
  - >-
    docker://***********.dkr.ecr.us-east-1.amazonaws.com/****/repo/test:v-36b8e5e856
output: >
  time="2022-11-17T15:24:37Z" level=fatal msg="Error parsing image name
  \"docker://***********.dkr.ecr.us-east-1.amazonaws.com/****/repo/test:v-36b8e5e856\":
  getting username and password: 1 error occurred:\n\t* credentials not found in
  native keychain\n\n"


[2022-11-17T15:24:37.400Z] Error: 1 build action(s) failed!
    at handleProcessResults (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/commands/base.ts:532:19)
    at BuildCommand.action (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/commands/build.ts:148:32)
    at GardenCli.runCommand (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/cli/cli.ts:508:20)
    at GardenCli.run (/snapshot/project/tmp/pkg/cli/node_modules/@garden-io/core/src/cli/cli.ts:667:26)
    at Object.runCli (/snapshot/project/tmp/pkg/cli/src/cli.ts:41:14)

Error Details:

results:
  build.test-image:
    type: build
    description: building test-image
    key: build.test-image
    name: test-image
    error:
      detail:
        command:
          - skopeo
          - '--command-timeout=30s'
          - inspect
          - '--raw'
          - '--authfile'
          - /.docker/config.json
          - >-
            docker://***********.dkr.ecr.us-east-1.amazonaws.com/****/repo/test:v-36b8e5e856
        output: >
          time="2022-11-17T15:24:37Z" level=fatal msg="Error parsing image name
          \"docker://***********.dkr.ecr.us-east-1.amazonaws.com/****/repo/test:v-36b8e5e856\":
          getting username and password: 1 error occurred:\n\t* credentials not
          found in native keychain\n\n"
      type: runtime
    startedAt: '2022-11-17T15:24:35.473Z'
    completedAt: '2022-11-17T15:24:37.326Z'
    batchId: b84dedc4-5292-47e7-ab98-a26b0a8fc485
    version: v-36b8e5e856

My pod has IAM permissions to access ECR because it runs on an instance with the correct IAM role attached, and imagePullSecrets are set up according to the instructions.
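For context, a Docker-style config.json (the file skopeo reads via --authfile in the log above) can delegate credentials for a specific registry host to a credential helper through its credHelpers map. For ECR the helper name is ecr-login, which resolves to the docker-credential-ecr-login binary from amazon-ecr-credential-helper. The Python sketch below assumes the imagePullSecret's config.json follows that common ECR layout; the account ID and repository are placeholders, not the redacted values from the log, and the function names are hypothetical.

```python
import json
from typing import Optional


def ecr_registry_host(account_id: str, region: str) -> str:
    """Return the ECR registry hostname for an AWS account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def make_docker_config(registry_host: str) -> dict:
    """Build a config.json body that delegates one registry to the ECR helper."""
    return {"credHelpers": {registry_host: "ecr-login"}}


def helper_binary_for(config: dict, image_ref: str) -> Optional[str]:
    """Return the credential helper binary a Docker-compatible client would run.

    Per-registry credHelpers entries take precedence over a global credsStore.
    """
    # Drop the transport prefix skopeo uses, e.g. "docker://".
    if image_ref.startswith("docker://"):
        image_ref = image_ref[len("docker://"):]
    host = image_ref.split("/", 1)[0]
    helper = config.get("credHelpers", {}).get(host) or config.get("credsStore")
    return f"docker-credential-{helper}" if helper else None


# Placeholder account ID and repository (hypothetical).
host = ecr_registry_host("123456789012", "us-east-1")
config = make_docker_config(host)
print(json.dumps(config, indent=2))
print(helper_binary_for(config, f"docker://{host}/repo/test:v-36b8e5e856"))
# docker-credential-ecr-login
```

If that binary is missing, outdated, or cannot reach AWS credentials, the lookup fails even though the config.json itself is valid.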

After some investigation, I found that the error is thrown by skopeo in the garden-util pod, which currently ships amazon-ecr-credential-helper v0.4.0; for some reason, that version can’t load the instance credentials. I logged into the garden-util pod as root, updated amazon-ecr-credential-helper to 0.6.0, and skopeo then worked as expected.
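The "credentials not found in native keychain" text is the standard not-found message from the docker-credential-helpers library, which amazon-ecr-credential-helper appears to return whenever it cannot obtain AWS credentials. Under the credential helper protocol, the client runs "<helper> get", writes the registry host to stdin, and expects JSON with Username and Secret. A minimal Python sketch of that exchange follows; get_credentials, parse_helper_get and CredentialsNotFound are hypothetical names, not skopeo's or Garden's code.

```python
import json
import subprocess
from dataclasses import dataclass

# Standard "not found" message from the docker-credential-helpers library.
NOT_FOUND_MSG = "credentials not found in native keychain"


class CredentialsNotFound(Exception):
    """Raised when the helper reports that it has no credentials."""


@dataclass
class Credentials:
    username: str
    secret: str


def parse_helper_get(returncode: int, output: str) -> Credentials:
    """Interpret the result of `docker-credential-<name> get`."""
    text = output.strip()
    if returncode != 0:
        if text == NOT_FOUND_MSG:
            raise CredentialsNotFound(text)
        raise RuntimeError(f"credential helper failed: {text}")
    payload = json.loads(text)
    return Credentials(username=payload["Username"], secret=payload["Secret"])


def get_credentials(helper: str, registry_host: str) -> Credentials:
    """Ask a credential helper binary for a registry's credentials."""
    # Protocol: run "<helper> get" and write the server URL to stdin.
    proc = subprocess.run(
        [helper, "get"], input=registry_host, capture_output=True, text=True
    )
    # Helpers built on docker-credential-helpers print errors to stdout;
    # stderr is only a fallback here.
    return parse_helper_get(proc.returncode, proc.stdout or proc.stderr)


# Requires the helper binary to be installed (hypothetical host):
# get_credentials("docker-credential-ecr-login",
#                 "123456789012.dkr.ecr.us-east-1.amazonaws.com")
```

Because the helper collapses "could not load AWS credentials" into the generic not-found message, the error in the log does not by itself say why the lookup failed.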

Expected behavior

The in-cluster build would work without errors.

Reproducible example

You can reproduce this with the Kaniko example, set up on EKS with an ECR registry.

Workaround

There are no known workarounds.

Suggested solution(s)

Update garden-util’s (gardendev/k8s-util) image to use the latest version of amazon-ecr-credential-helper (0.6.0) so that it can load the credentials from the instance’s attached IAM role.

Additional context

Your environment

  • OS: macOS Ventura 13.0
  • How I’m running Kubernetes: EKS running Kubernetes 1.23
  • Garden version: 0.12.46

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

@Walther @stefreak all working now! Just had the first build using Kaniko with no issues!

Thank you for your efforts in making this work. Happy to contribute to this great tool in any other way you need.

@theoribeiro awesome 😃 Thanks for the offer, any collaboration+feedback is always appreciated! Feel free to reach out on our discord server and/or via github issues if you have any ideas https://discord.gg/gxeuDgp6Xt

I’m working on a more secure IAM setup using IRSA for in-cluster building. If you want, I can share the docs with you once it’s merged; I’d be very happy if you could try it out and give feedback. I hope to finish it within the next 7 days.
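IRSA (IAM Roles for Service Accounts) ties an IAM role to a Kubernetes ServiceAccount instead of the node’s instance role: the ServiceAccount carries an eks.amazonaws.com/role-arn annotation, and pods using it receive AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, which AWS SDKs use to obtain web-identity credentials. The Python sketch below is illustrative only; the ServiceAccount name, namespace, and role ARN are hypothetical, and it does not describe how the Garden change in #2931 is implemented.

```python
import os
from typing import Mapping

IRSA_ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_service_account(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated with an IAM role for IRSA."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {IRSA_ROLE_ANNOTATION: role_arn},
        },
    }


def credential_source(env: Mapping[str, str] = os.environ) -> str:
    """Roughly classify which AWS credential source a pod would use.

    Simplified: real SDK chains also consult shared config files and
    container credential endpoints before falling back to instance metadata.
    """
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "static-keys"
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return "irsa-web-identity"
    return "instance-metadata"


# Hypothetical names, not the ones Garden uses.
sa = irsa_service_account(
    "example-builder", "example-ns", "arn:aws:iam::123456789012:role/example-ecr-access"
)
print(sa["metadata"]["annotations"])
print(credential_source())
```

With IRSA, the builder pod no longer depends on reaching the node’s instance metadata, which is the path that failed in this issue.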

We’re using a simple port forward over ssh with -fN -L 5555:$CLUSTER:443 and setting up /etc/hosts to point $CLUSTER to 127.0.0.1.

That setup works for all our kubectl commands. We tried using a SOCKS proxy instead but could never get it working properly with kubectl.
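The tunnel described above can be sketched in Python as below. The /etc/hosts entry presumably keeps the real EKS endpoint hostname in the kubeconfig, so TLS hostname verification still passes, with the server URL pointing at local port 5555. The cluster endpoint and jump host are hypothetical, since the comment does not name them, and tunnel_is_up is an illustrative check rather than part of the original setup.

```python
import socket


def tunnel_command(cluster_host: str, jump_host: str, local_port: int = 5555) -> list[str]:
    """ssh argv for a background local forward to the cluster API on port 443."""
    # -f: go to the background after authentication; -N: run no remote command;
    # -L: forward local_port on this machine to cluster_host:443 via jump_host.
    return ["ssh", "-fN", "-L", f"{local_port}:{cluster_host}:443", jump_host]


def hosts_entry(cluster_host: str) -> str:
    """/etc/hosts line that resolves the cluster endpoint to the local tunnel end."""
    return f"127.0.0.1 {cluster_host}"


def tunnel_is_up(host: str = "127.0.0.1", port: int = 5555, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on the forwarded port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical endpoint and jump host names.
cluster = "EXAMPLE.gr7.us-east-1.eks.amazonaws.com"
print(" ".join(tunnel_command(cluster, "user@bastion.example.com")))
print(hosts_entry(cluster))
```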

@stefreak You nailed it! We’re using EKS Managed Node Groups for our nodes, and I just checked: they’re indeed using IMDSv2. Since we deploy our cluster with Terraform, I believe the modules now disable IMDSv1 by default, following AWS’s security best-practice recommendation.
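Once IMDSv1 is disabled (the EC2 metadata option HttpTokens set to required), the instance metadata service only answers requests that carry a session token obtained with a PUT. A client that issues only plain IMDSv1 GETs therefore cannot load the node role’s credentials. That is consistent with the v0.4.0 vs 0.6.0 difference reported in the issue, although this thread does not pinpoint the exact change in the helper. Below is a minimal standard-library Python sketch of the two-step exchange. It illustrates the protocol and is not what amazon-ecr-credential-helper or the AWS SDK does internally; fetch_role_credentials only works when run on an EC2 instance.

```python
import urllib.request
from typing import Optional

IMDS = "http://169.254.169.254"
ROLE_CREDS_PATH = "/latest/meta-data/iam/security-credentials/"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: PUT for a session token (TTL at most 21600 seconds)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: Optional[str]) -> urllib.request.Request:
    """IMDSv2 step 2: GET a metadata path with the token header.

    With token=None this is a plain IMDSv1 request, which an instance with
    HttpTokens=required answers with 401 Unauthorized.
    """
    headers = {"X-aws-ec2-metadata-token": token} if token else {}
    return urllib.request.Request(f"{IMDS}{path}", headers=headers)


def fetch_role_credentials(timeout: float = 2.0) -> bytes:
    """Return the node role's temporary credentials JSON (only works on EC2)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    # The listing endpoint returns the name of the attached role.
    with urllib.request.urlopen(metadata_request(ROLE_CREDS_PATH, token), timeout=timeout) as resp:
        role = resp.read().decode().strip()
    with urllib.request.urlopen(metadata_request(ROLE_CREDS_PATH + role, token), timeout=timeout) as resp:
        return resp.read()
```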

@theoribeiro thank you for taking the time to investigate and report this bug!

Since I’m currently working on enabling IRSA for in-cluster builders (#2931) and am already looking into the relevant code paths, I’ll look into this issue.