test-infra: release-blocking jobs must run in dedicated cluster: ci-kubernetes-build

What should be cleaned up or changed:

This is part of #18549

To properly monitor the outcome of this, you should be a member of k8s-infra-prow-viewers@kubernetes.io. PR yourself into https://github.com/kubernetes/k8s.io/blob/master/groups/groups.yaml#L603-L628 if you’re not a member.

NOTE: I am not tagging this as “help wanted” because it is blocked on https://github.com/kubernetes/k8s.io/issues/846. I would also recommend doing ci-kubernetes-build-fast first. Here is my guess at how we could do this:

  • create a duplicate job that pushes to the new bucket writable by k8s-infra-prow-build
  • ensure it’s building and pushing appropriately
  • update a release-blocking job to pull from the new bucket
  • if no problems, roll out changes progressively
    • a few more jobs in release-blocking
    • all jobs in release-blocking that use this job’s results
    • a job that still runs in the “default” cluster
    • all jobs that use this job’s results
  • rename jobs / get rid of the job that runs on the “default” cluster
  • do the same for release-branch variants, can probably do a faster rollout

It will be helpful to note the date/time that PRs merge. This will allow you to compare before/after behavior.

Things to watch for the job

Things to watch for the build cluster

Keep this open for at least 24h of weekday PR traffic. If everything continues to look good, then this can be closed.

/wg k8s-infra /sig testing /area jobs

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 24 (24 by maintainers)

Most upvoted comments

identify jobs that depend on gcr.io/kubernetes-ci-images

Based on https://cs.k8s.io/?q=kubernetes-ci-images&i=nope&files=&repos=

  • jobs that use kubeadm that don’t explicitly set ClusterConfiguration.ImageRepository will default to using gcr.io/kubernetes-ci-images if ClusterConfiguration.KubernetesVersion starts with ci/ or ci-cross/

    • could impact kubeadm-kinder jobs
    • could impact cluster-api jobs
    • other jobs that use kubeadm?
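To make the defaulting rule above concrete, here is a minimal sketch of the decision as described in this thread. It is not kubeadm's actual code: `resolveImageRepository` and `isCIVersion` are hypothetical names, and using `k8s.gcr.io` as the non-CI default is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Assumed default for non-CI versions; illustrative only.
	defaultImageRepository = "k8s.gcr.io"
	// CI repository named in the discussion above.
	ciImageRepository = "gcr.io/kubernetes-ci-images"
)

// isCIVersion reports whether the version string starts with one of the
// CI prefixes mentioned in the thread.
func isCIVersion(version string) bool {
	return strings.HasPrefix(version, "ci/") || strings.HasPrefix(version, "ci-cross/")
}

// resolveImageRepository is a hypothetical helper: an explicitly set
// repository always wins; otherwise CI versions fall back to the CI repo.
func resolveImageRepository(explicitRepo, version string) string {
	if explicitRepo != "" {
		return explicitRepo
	}
	if isCIVersion(version) {
		return ciImageRepository
	}
	return defaultImageRepository
}

func main() {
	fmt.Println(resolveImageRepository("", "ci/latest"))
	fmt.Println(resolveImageRepository("", "v1.20.0"))
	fmt.Println(resolveImageRepository("example.com/my-repo", "ci/latest"))
}
```

Under this model, only jobs that leave the repository unset and use a `ci/` or `ci-cross/` version would notice a change to the CI repository constant.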

kinder downloads image tarballs from e.g.:

https://storage.googleapis.com/kubernetes-release-dev/ci/v1.20.0-beta.2.88+e3de62298a7304/bin/linux/amd64/kube-apiserver.tar

and then mutates them to be k8s.gcr.io/....

so i don’t think kinder (or kubeadm CI) will be affected by the gcr.io/kubernetes-ci-images -> gcr.io/k8s-staging-ci-images change. however, reading https://github.com/kubernetes/k8s.io/issues/846, my understanding is that kinder needs to migrate from downloading from gs://kubernetes-release-dev to gs://k8s-release-dev.

should we do this now - i.e. is gs://k8s-release-dev ready for usage?
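The bucket migration mostly comes down to which bucket name goes into the download URL. The sketch below rebuilds the example URL from its parts, with the bucket as a parameter. `ciTarballURL` is a hypothetical helper, not kinder's code, and it assumes the new bucket keeps the same path layout as the old one.

```go
package main

import "fmt"

// ciTarballURL builds the download URL for a component image tarball,
// following the path layout of the example URL in this thread.
func ciTarballURL(bucket, version, arch, component string) string {
	return fmt.Sprintf(
		"https://storage.googleapis.com/%s/ci/%s/bin/linux/%s/%s.tar",
		bucket, version, arch, component,
	)
}

func main() {
	version := "v1.20.0-beta.2.88+e3de62298a7304"
	// Current bucket, as in the example URL.
	fmt.Println(ciTarballURL("kubernetes-release-dev", version, "amd64", "kube-apiserver"))
	// New bucket; the identical path layout is an assumption.
	fmt.Println(ciTarballURL("k8s-release-dev", version, "amd64", "kube-apiserver"))
}
```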

It is unclear to me whether kubeadm needs to be patched?

  • change hardcoded constant from gcr.io/kubernetes-ci-images to gcr.io/k8s-staging-ci-images
  • add flag / config option for CI image repo if we ever rename the repo again
  • make no changes and explicitly set --image-repository

the gcr.io/kubernetes-ci-images -> gcr.io/k8s-staging-ci-images change can be considered a breaking change for the kubeadm API:

  • the correct way would be to make this change as part of a new API version and older versions of the API would have to pass imageRepository (or the flag) explicitly, which is a breaking change to old API users.
  • the easier option is to just change the default in both the older and newer APIs, which is simpler for everyone but may annoy API purists a little. in any case, it feels like this is the better option.

EDIT, logged: https://github.com/kubernetes/kubeadm/issues/2355 https://github.com/kubernetes/kubeadm/issues/2356