test-infra: error during gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json: exit status 1

SOLUTION: See https://github.com/kubernetes/test-infra/issues/27157#issuecomment-1318950082 - thanks @chaodaiG !

notes from @liggitt on Nov 17:

Previous Issue body: Example log: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/111859/pull-kubernetes-e2e-gce-storage-slow/1559899720485179392/build-log.txt

This is happening a lot across a variety of CI jobs. See chatter on #testing-ops as well ( https://kubernetes.slack.com/archives/C7J9RP96G/p1660676173294389 )

I0817 13:47:34.328] Call:  gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json
W0817 13:47:34.969] ERROR: (gcloud.auth.activate-service-account) There was a problem refreshing your current auth tokens: ('invalid_grant: Invalid JWT Signature.', {'error': 'invalid_grant', 'error_description': 'Invalid JWT Signature.'})
W0817 13:47:34.969] Please run:
W0817 13:47:34.969] 
W0817 13:47:34.969]   $ gcloud auth login
W0817 13:47:34.969] 
W0817 13:47:34.970] to obtain new credentials.
W0817 13:47:34.970] 
W0817 13:47:34.970] If you have already logged in with a different account:
W0817 13:47:34.970] 
W0817 13:47:34.970]     $ gcloud config set account ACCOUNT
W0817 13:47:34.970] 
W0817 13:47:34.970] to select an already authenticated account to use.
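
For anyone triaging similar failures, the signature to look for is the `invalid_grant: Invalid JWT Signature` line, which indicates the mounted key is no longer valid. The following is a minimal sketch, not part of test-infra, of scanning a build log for it; the function name and sample text are hypothetical and only mirror the excerpt above.

```python
import re

# Failure signature copied from the log excerpt above.
INVALID_JWT = re.compile(r"invalid_grant: Invalid JWT Signature")


def has_invalid_jwt_signature(log_text: str) -> bool:
    """Return True if any line of the build log reports the invalid-key error."""
    return any(INVALID_JWT.search(line) for line in log_text.splitlines())


# Two lines taken from the example log in this issue.
SAMPLE = (
    "I0817 13:47:34.328] Call:  gcloud auth activate-service-account "
    "--key-file=/etc/service-account/service-account.json\n"
    "W0817 13:47:34.969] ERROR: (gcloud.auth.activate-service-account) "
    "There was a problem refreshing your current auth tokens: "
    "('invalid_grant: Invalid JWT Signature.', {'error': 'invalid_grant', "
    "'error_description': 'Invalid JWT Signature.'})\n"
)

if __name__ == "__main__":
    print(has_invalid_jwt_signature(SAMPLE))
```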

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 40 (38 by maintainers)

Most upvoted comments

I’ve followed https://github.com/kubernetes/test-infra/issues/27157#issuecomment-1318950082 and rotated the keys again. For posterity the steps were:

  1. Go to the kubernetes-jenkins-pull project’s GCP console. Click on IAM & Admin and then Service Accounts. Find the pr-kubekins@kubernetes-jenkins-pull.iam.gserviceaccount.com entry in the list and create a new JSON key (private key) for it. This will download the key to your local disk.
  2. Go to the k8s-prow-builds project GCP console. Go to Security -> Secret Manager. Then find default-k8s-build-cluster-service-account-key in the list. Now upload the JSON key from step 1 here as a new version for this secret. Behind the scenes, the kubernetes-external-secrets pod in the k8s-prow cluster will update this secret in O(seconds).
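
The two console steps above could plausibly also be done with the gcloud CLI. The sketch below only builds and prints the commands (a dry run) rather than executing them. It assumes `gcloud iam service-accounts keys create` and `gcloud secrets versions add` are acceptable equivalents of the console actions; the comment itself describes the console only, and this is not the automated job from the PR mentioned below. The function name and the key path are hypothetical.

```python
import shlex

# Names taken from the steps above.
SERVICE_ACCOUNT = "pr-kubekins@kubernetes-jenkins-pull.iam.gserviceaccount.com"
KEY_PROJECT = "kubernetes-jenkins-pull"
SECRET_PROJECT = "k8s-prow-builds"
SECRET_NAME = "default-k8s-build-cluster-service-account-key"


def rotation_commands(key_path: str) -> list[list[str]]:
    """Build (but do not run) the two gcloud commands mirroring steps 1 and 2."""
    create_key = [
        "gcloud", "iam", "service-accounts", "keys", "create", key_path,
        f"--iam-account={SERVICE_ACCOUNT}",
        f"--project={KEY_PROJECT}",
    ]
    add_secret_version = [
        "gcloud", "secrets", "versions", "add", SECRET_NAME,
        f"--data-file={key_path}",
        f"--project={SECRET_PROJECT}",
    ]
    return [create_key, add_secret_version]


if __name__ == "__main__":
    # Hypothetical local path; the key file should be deleted after upload.
    for cmd in rotation_commands("/tmp/pr-kubekins-key.json"):
        print(shlex.join(cmd))
```

Printing the commands first leaves room for a human to review them before anything touches the shared CI key.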

Note that these same steps were automated in https://github.com/kubernetes/test-infra/pull/28053 but the job has been unhealthy: https://prow.k8s.io/job-history/gs/kubernetes-jenkins/logs/ci-test-infra-rotate-legacy-default-build-sa-json-key

I’ll see if I can at least get that job past the exec error.

UPDATE: See https://github.com/kubernetes/test-infra/pull/28786 for the exec error fix.

shadowing what you were doing was good experience @chaodaiG !! appreciate it.

THANK YOU @listx 🙏

I’ve enabled the API in the k8s-prow project and after retrying the job, it succeeded: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-test-infra-rotate-legacy-default-build-sa-json-key/1630751309038620672

I can see in the Cloud Console that a new key has been created for pr-kubekins@kubernetes-jenkins-pull.iam.gserviceaccount.com. I also see that this key has been loaded up in GCP Secret Manager (as expected).

So it appears that the job performed all of the manual steps I described in https://github.com/kubernetes/test-infra/issues/27157#issuecomment-1435088376.

/remove-lifecycle stale

We’re going to have this problem on a regular basis until we can migrate CI out of google.com, which won’t be anytime this year given the kubernetes.io budget issues.

This appears to be happening again.

See: https://github.com/kubernetes/test-infra/issues/27157#issuecomment-1220982143 for why moving to podutils / workload identity isn’t a workable answer.

[…] but I forget if the churn/noise of key creation is the reason a shared account key was used in the first place.

Yes, that’s the driving reason. Creating a lot of keys was causing issues. E.g., it meant the driver tests were attempting to clean up keys, and a bug caused the main CI key to be deleted, which was a fun day 🙃

https://github.com/kubernetes/test-infra/issues/27157#issuecomment-1318950082 has the hotfix approach, for someone with access.

@chaodaiG

I’m very curious to understand whether there is any job that has no choice but to use this physical service account key file.

https://cs.k8s.io/?q=E2E_GOOGLE_APPLICATION_CREDENTIALS&i=nope&files=&excludeFiles=&repos=

IIRC there are some number of e2e jobs that need to provide a service account key to a gce pd driver deployed to the cluster under test. The clusters these jobs stand up aren’t guaranteed to be GKE clusters, so I’m not sure changing the gce pd driver deployment to use workload identity is an option.

From https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/docs/kubernetes/user-guides/driver-install.md#install-driver:

The driver requires a service account that has the following permissions and roles to function properly:

compute.instances.get
compute.instances.attachDisk
compute.instances.detachDisk
roles/compute.storageAdmin
roles/iam.serviceAccountUser

Replacing use of a shared service account key would involve jobs having to run something like the driver’s setup-project.sh script prior to launching tests, which means permission to create a service account and service account keys in each project. I think it’s possible to provide jobs with this privilege via workload identity, but I forget if the churn/noise of key creation is the reason a shared account key was used in the first place.

cc @msau42 who I think is more familiar with this than I am

looks like there are tons of these jobs with that preset - https://cs.k8s.io/?q=preset-service-account&i=nope&files=&excludeFiles=&repos=kubernetes/test-infra

This is not surprising. Having someone remember to manually rotate this every 80 days doesn’t seem like a sustainable solution, so at this point I’m very curious to understand whether there is any job that has no choice but to use this physical service account key file.
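
To make the 80-day cadence concrete, here is a minimal sketch of a "rotation due" check. The 80-day interval is taken from the comment above and treated as an assumption rather than a documented limit. The function name is hypothetical, and how the key's creation time is obtained (for example from the key metadata in the Cloud Console) is left to the caller.

```python
from datetime import datetime, timedelta, timezone

# Interval quoted in the comment above; an assumption, not a documented limit.
ROTATION_INTERVAL = timedelta(days=80)


def rotation_due(created_at: datetime, now: datetime,
                 interval: timedelta = ROTATION_INTERVAL) -> bool:
    """Return True once the key is at least `interval` old."""
    return now - created_at >= interval


if __name__ == "__main__":
    # Hypothetical creation time, used only to exercise the function.
    created = datetime(2022, 8, 17, tzinfo=timezone.utc)
    print(rotation_due(created, datetime.now(timezone.utc)))
```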

The second goal is to figure out whether all these jobs are maintained or not.

@chaodaiG looks like there are tons of these jobs with that preset - https://cs.k8s.io/?q=preset-service-account&i=nope&files=&excludeFiles=&repos=kubernetes/test-infra

Let me start with just the ones in k8s-cri-containerd project used by containerd.