sigstore: Workload Identity Federation is not working with GCP KMS support

Description

Recently, we (w/ @dentrax @erkanzileli) added support for other key management systems to Kyverno for verifying image signatures.^1 Then I tried this feature on GCP, using GCP KMS and GKE. To achieve this I took advantage of Workload Identity Federation.^2 I ran the following commands to enable it:

🎗 Cross-ref: https://github.com/kyverno/website/pull/376

$ export PROJECT_ID=$(gcloud config get-value project)
$ export CLUSTER_NAME="gke-wif"
$ gcloud container clusters create $CLUSTER_NAME \
    --workload-pool=$PROJECT_ID.svc.id.goog --num-nodes=2
$ export GSA_NAME=kyverno-sa
$ gcloud iam service-accounts create $GSA_NAME
$ gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[kyverno/kyverno]" \
  ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
$ gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --role roles/cloudkms.admin \
  --member serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
$ kubectl annotate serviceaccount \
  --namespace kyverno \
  kyverno \
  iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com

Then I tried it with Kyverno, but it didn’t work as I expected. So I decided to run a small test with the google/cloud-sdk:slim image. When I ran a Pod with this image, everything worked fine.

kubectl run -it --rm \
  --image google/cloud-sdk:slim \
  --serviceaccount kyverno \
  --namespace kyverno \
  workload-identity-test
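For reference, here is how the "everything worked fine" result can be checked from inside the test Pod. This is a sketch based on the GKE Workload Identity documentation, not a command from the original report:

```shell
# Inside the workload-identity-test Pod: ask the GKE metadata server which
# identity it serves. With Workload Identity wired up correctly, this should
# print the Google service account email, e.g.
# kyverno-sa@<PROJECT_ID>.iam.gserviceaccount.com
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email

# gcloud should report the same active identity:
gcloud auth list
```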

(Screenshot: "Screen Shot 2021-11-11 at 15 53 53")

cc: @JimBugwadia @dlorenc @cpanato

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 28 (19 by maintainers)

Most upvoted comments

I found this, but I’m not sure whether it refers to our problem. This `--k8s-keychain` flag seems like it could help us: I found two blog posts, and both of them use this flag to enable the workload identity feature. Am I right @mattmore @dlorenc?

WOOOOOOOOOOOOOOOOOWW, it worked @JimBugwadia @ribbybibby @dlorenc 🤩

All the problems were related to the KMS roles I’d configured for the service account; thanks a ton to @ribbybibby for fixing my mistake. When I changed the role from roles/cloudkms.admin to roles/cloudkms.viewer and roles/cloudkms.verifier, everything worked properly.
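The role change described above can be sketched with gcloud as follows, reusing the `PROJECT_ID` and `GSA_NAME` variables from the setup commands in the issue body (a sketch of the fix, not the exact commands run):

```shell
# Drop the overly broad admin role granted during setup...
gcloud projects remove-iam-policy-binding ${PROJECT_ID} \
  --role roles/cloudkms.admin \
  --member serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com

# ...and grant the two read-only roles that made verification work.
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --role roles/cloudkms.viewer \
  --member serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --role roles/cloudkms.verifier \
  --member serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
```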

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: check-image
spec:
  validationFailureAction: enforce
  background: false
  webhookTimeoutSeconds: 30
  failurePolicy: Fail
  rules:
    - name: check-image
      match:
        resources:
          kinds:
            - Pod
      verifyImages:
      - image: "gcr.io/shaped-shuttle-342907/alpine:*"
        key: "gcpkms://projects/shaped-shuttle-342907/locations/global/keyRings/test/cryptoKeys/cosign/versions/1"

$ cosign sign --key gcpkms://projects/$PROJECT_ID/locations/global/keyRings/test/cryptoKeys/cosign/versions/1 gcr.io/$PROJECT_ID/alpine:3.15.0

$ kubectl run signed --image=gcr.io/$PROJECT_ID/alpine:3.15.0
pod/signed created

Okay, if it’s a public image then the keychain probably has nothing to do with it.

KMS definitely works for me with workload identity, so I suspect it might be a configuration issue in your environment.

What service account are the kyverno pods using? I notice from your original post that you were attaching workload identity to a service account called ‘kyverno-service-account’ but I think helm usually creates and uses a service account called just ‘kyverno’.

@developer-guy Yes, please 🙇

If it still doesn’t work, it would be nice to have some more details:

  • Any error messages from Kyverno’s logs
  • More information about the image you’re trying to verify. For instance, is it in a private registry, and is that Google Container Registry, Artifact Registry, or a non-GCP registry?

In 1.6, the cloud provider keychains are only set up when the ‘kubernetes keychain’ is initialized, and that only happens if there are image pull secrets specified via the -imagePullSecrets flag.

I think this PR needs to land in a Kyverno release before workload identity will work off the bat: https://github.com/kyverno/kyverno/pull/3116.

@developer-guy a workaround you can use in the meantime is to create an empty image pull secret and use that:

apiVersion: v1
kind: Secret
metadata:
  name: annoying-unnecessary-secret
data:
  # content is {}
  .dockerconfigjson: e30K
type: kubernetes.io/dockerconfigjson
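The `.dockerconfigjson` value in the workaround above is just an empty JSON object, base64-encoded. A quick shell check confirms it:

```shell
# The secret's data must be base64-encoded; 'e30K' is simply '{}' plus a newline.
echo '{}' | base64        # prints: e30K
echo 'e30K' | base64 -d   # prints: {}
```

With the secret in place, Kyverno can then be pointed at it via the -imagePullSecrets flag mentioned above (e.g. `-imagePullSecrets=annoying-unnecessary-secret`, using the hypothetical secret name from the example).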

Also, it’s worth noting that the Kyverno build currently uses the disable_gcp build tag based on prior guidance; could that have any impact? I don’t see any use of this tag in the Sigstore repos. See: kyverno/kyverno@main/Makefile#L28.

I’ve not been able to find the disable_gcp string in either cosign or sigstore, so I’m not sure what that tag is expected to do.

Can you please elaborate on “Kyverno is not selecting this provider when using cosign”? Is there an example of how to select the provider that we can reference? Currently, Kyverno (indirectly, via cosign’s signatures.LoadPublicKey) calls kms.Get to load the providers.

I looked into the cosign sources, while you are pointing to sigstore, so I’m not sure what I said makes sense; it is most probably wrong. I’ll dig into the sigstore GCP code later.

Hello @lukehinds, as I mentioned above, we recently added support for other key types to Kyverno, using cosign for verification. As you already know, KMS is one of them, so I wanted to test this by deploying Kyverno on GKE and using GCP KMS to store the keys. But to access my keys in GCP KMS, Kyverno needs to use Workload Identity. I did everything to make it work, but it didn’t. So I thought the reason is that Kyverno uses sigstore internally to authenticate to GCP services.

I took a look, my initial hypothesis was that the GCP SDK library in use by cosign is not up to date and lacks that functionality.

I say cosign as that is the package used in the linked PR.

The google/cloud-sdk:slim image uses SDK version 364.0.0 (extracted from its Dockerfile).

cosign is using the google.golang.org/api package at version v0.60.0, which as of today is the latest version.

So I would not expect this to be a version related issue.

Following the GKE documentation about using Workload Identity from code, I suspect there are issues in the authentication code; it may not be performing authentication in a way that supports metadata-based authentication, as per the GCP docs:

Existing code that authenticates using the instance metadata server (like code using the Google Cloud client libraries) should work without modification.