kubernetes: NPD jobs are failing: "failed to push gcr.io/node-problem-detector-staging/ci/node-problem-detector [...] 403 Forbidden"
Which jobs are failing?
Job name | Config source | Testgrid (or job history) link |
---|---|---|
ci-npd-build | Source | https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-build |
pull-npd-e2e-test | Source | https://prow.k8s.io/job-history/gs/kubernetes-jenkins/pr-logs/directory/pull-npd-e2e-test |
pull-npd-e2e-node | Source | https://prow.k8s.io/job-history/gs/kubernetes-jenkins/pr-logs/directory/pull-npd-e2e-node |
Which tests are failing?
Jobs fail to start. The container image builds successfully, but the push fails with the following error:
#33 pushing layers
#33 ...
#34 [auth] node-problem-detector-staging/ci/node-problem-detector:pull,push token for gcr.io
#34 DONE 0.0s
#33 exporting to image
#33 pushing layers 1.4s done
#33 ERROR: failed to push gcr.io/node-problem-detector-staging/ci/node-problem-detector:v0.8.13-44-g5558643-20230710.1614: failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden
------
> exporting to image:
------
ERROR: failed to solve: failed to push gcr.io/node-problem-detector-staging/ci/node-problem-detector:v0.8.13-44-g5558643-20230710.1614: failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden
make: *** [Makefile:270: push-container] Error 1
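For triage, the failing registry, repository, and tag can be pulled out of a log like the one above with a small script. This is a minimal sketch, not part of the NPD tooling: the `PushFailure` type, the `find_push_failure` helper, and the regular expression are all hypothetical and assume the error line keeps the "failed to push <image>: ... unexpected status: <status>" shape seen in this log.

```python
import re
from typing import NamedTuple, Optional


class PushFailure(NamedTuple):
    """Hypothetical container for the parts of a failed image push."""
    registry: str    # registry host, e.g. "gcr.io"
    repository: str  # project and image path
    tag: str         # image tag
    status: str      # HTTP status text reported in the log


# Assumed line shape (taken from the log above):
#   ... failed to push <registry>/<repository>:<tag>: ... unexpected status: <status>
_PUSH_ERROR = re.compile(
    r"failed to push "
    r"(?P<registry>[^/\s]+)/(?P<repository>[^:\s]+):(?P<tag>[^:\s]+): "
    r".*unexpected status: (?P<status>\d{3} [A-Za-z ]+)"
)


def find_push_failure(log: str) -> Optional[PushFailure]:
    """Return the first push failure found in a build log, or None."""
    for line in log.splitlines():
        match = _PUSH_ERROR.search(line)
        if match:
            return PushFailure(
                match["registry"],
                match["repository"],
                match["tag"],
                match["status"].strip(),
            )
    return None


# Sample line copied from the log in this issue.
sample = (
    "#33 ERROR: failed to push gcr.io/node-problem-detector-staging/ci/"
    "node-problem-detector:v0.8.13-44-g5558643-20230710.1614: failed to "
    "authorize: failed to fetch oauth token: unexpected status: 403 Forbidden"
)
failure = find_push_failure(sample)
print(failure)
```

A 403 status at the token-fetch step, as here, points at missing permissions for the pushing identity rather than at a problem with the image itself.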
Since when has it been failing?
2023-07-04 ~12:40 PDT (the CI job first failed on 2023-07-01 ~06:30 PDT)
Testgrid link
No response
Reason for failure (if possible)
Jobs were migrated to EKS in https://github.com/kubernetes/test-infra/pull/29751, which appears to be the culprit.
Anything else we need to know?
No response
Relevant SIG(s)
/sig node
About this issue
- State: open
- Created a year ago
- Comments: 15 (15 by maintainers)
Job pull-npd-e2e-node is still failing. It seems that the service account in cluster k8s-infra-prow-build doesn’t have permission to push to bucket gs://node-problem-detector-staging. We probably want to grant it permissions, since other jobs depend on it (like ci-npd-build).

@rjsadow We don’t want to migrate jobs that depend on any GCP resource. It doesn’t make much sense to, let’s say, push an image from EKS to GKE. Running the job on EKS would mean higher bandwidth/traffic charges for transferring data (in this case images) to GKE, and that’s usually much more expensive than running the job on GCP and uploading from there. That said, I believe those jobs should be moved back to the GKE cluster.