kubernetes: [Failing Test] [sig-apps] ReplicaSet should serve a basic image on each replica with a private image, ReplicationController should serve a basic image on each replica with a private image
Which jobs are failing:
ci-kubernetes-e2e-gci-gce
ci-kubernetes-e2e-gce-cos-k8sbeta-default
Which test(s) are failing:
[sig-apps] ReplicaSet should serve a basic image on each replica with a private image
[sig-apps] ReplicationController should serve a basic image on each replica with a private image
Since when has it been failing: Started failing between 2:04 and 2:40 PM PST on Dec 1
Testgrid links:
- https://k8s-testgrid.appspot.com/sig-release-master-blocking#gce-cos-master-default
- https://k8s-testgrid.appspot.com/sig-release-1.20-blocking#gce-cos-k8sbeta-default
Reason for failure:
Pods never run; both tests appear to be timing out waiting for containers to become ready.
ReplicaSet should serve a basic image on each replica with a private image:
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apps/replica_set.go:98
Dec 1 22:54:13.321: Unexpected error:
<*errors.errorString | 0xc0036f8ef0>: {
s: "pod \"my-hostname-private-cd2ec0df-be38-465e-a00f-f868f9674320-rknrl\" never run (phase: Pending, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-cd2ec0df-be38-465e-a00f-f868f9674320]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-cd2ec0df-be38-465e-a00f-f868f9674320]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason: Message:}]): timed out waiting for the condition",
}
pod "my-hostname-private-cd2ec0df-be38-465e-a00f-f868f9674320-rknrl" never run (phase: Pending, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-cd2ec0df-be38-465e-a00f-f868f9674320]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-cd2ec0df-be38-465e-a00f-f868f9674320]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 22:49:07 +0000 UTC Reason: Message:}]): timed out waiting for the condition
occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apps/replica_set.go:156
ReplicationController should serve a basic image on each replica with a private image:
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apps/rc.go:68
Dec 1 23:07:02.794: Unexpected error:
<*errors.errorString | 0xc00348b1f0>: {
s: "pod \"my-hostname-private-3071f600-7524-41d9-b7ea-f7a5cf5011e7-xz94v\" never run (phase: Pending, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-3071f600-7524-41d9-b7ea-f7a5cf5011e7]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-3071f600-7524-41d9-b7ea-f7a5cf5011e7]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason: Message:}]): timed out waiting for the condition",
}
pod "my-hostname-private-3071f600-7524-41d9-b7ea-f7a5cf5011e7-xz94v" never run (phase: Pending, conditions: [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-3071f600-7524-41d9-b7ea-f7a5cf5011e7]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [my-hostname-private-3071f600-7524-41d9-b7ea-f7a5cf5011e7]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-12-01 23:02:02 +0000 UTC Reason: Message:}]): timed out waiting for the condition
occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apps/rc.go:459
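In both failures above the pod stays Pending with ContainersNotReady; when a private image cannot be pulled, the underlying reason usually shows up as an ErrImagePull/ImagePullBackOff waiting state on the container status. A minimal client-go sketch for pulling that detail out of a stuck pod (not part of the e2e suite; the namespace and pod name are placeholders):

```go
// Sketch: print why a Pending pod's containers are not ready.
// With a private image that cannot be pulled, the waiting reason is
// typically ErrImagePull or ImagePullBackOff.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Substitute the failing test pod's namespace and name (placeholders here).
	pod, err := cs.CoreV1().Pods("replicaset-1234").Get(
		context.TODO(), "my-hostname-private-xxxxx", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	fmt.Println("phase:", pod.Status.Phase)
	for _, st := range pod.Status.ContainerStatuses {
		if st.State.Waiting != nil {
			fmt.Printf("container %s waiting: %s (%s)\n",
				st.Name, st.State.Waiting.Reason, st.State.Waiting.Message)
		}
	}
}
```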
Anything else we need to know:
Example Spyglass links:
- https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/1333903762309255168
- https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-cos-k8sbeta-default/1333958121797718016
Having trouble finding a good Triage link; I'll drop one in if I can find one.
Wondering whether this is related to the Pod pending timeout errors currently happening on some of the jobs on the 1.20 boards?
/sig apps
/cc @kubernetes/ci-signal @kubernetes/sig-apps-test-failures
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 21 (21 by maintainers)
@spiffxp – CI Signal should continue to monitor.
I think since @krzyzacy restored the project, y’all are in the clear for the time being. 🙃
/assign @justaugustus @hasheddan
(Dan and I will be watching from the shadows. 😃)
I’m starting to see passes for presubmits that were affected by this, looking at https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&job=pull-kubernetes-e2e-gce-ubuntu-containerd
e.g. https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/97020/pull-kubernetes-e2e-gce-ubuntu-containerd/1334589100509892608/
@spiffxp – Opened one here: https://github.com/kubernetes/k8s.io/issues/1458
https://kubernetes.slack.com/archives/C09QZ4DQB/p1606896985218000
The project hosting the GCR repo was swept up by a security audit because it hadn’t been properly accounted for. That change has been reverted. Now waiting to see affected jobs go back to green.
We should create a community-owned equivalent project; I'll open a follow-up issue for that.
Now that I have access to the project, I'm working on restoring permissions. I had hoped this would be a 10-minute fix, but it's taking longer than I expected. I can currently list the backing bucket, but cannot list images.
@spiffxp I think we likely just need to add permissions to prow-build@k8s-infra-prow-build.iam.gserviceaccount.com to access the bucket in the restored project where the GCR images are hosted.

Hrm, I'm seeing this fail still in downstream repo tests. Are the tests injecting a secret (the only hardcoded GCR secret I see is in k8s.io/kubernetes/test/e2e/common/runtime.go, but that is not called by those referenced tests), or is the auth rule on this repo now limited to a set of projects vs. all projects on GCP before (since these tests passed in our GCP projects yesterday but not now, after access was supposedly restored)?

Who is able to access that repo? If it was previously "all projects" then I think that wasn't restored correctly. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/24887/pull-ci-openshift-origin-master-e2e-gcp/1334318300561149952 is a 1.19 codebase trying to run in the openshift-gce-devel-ci GCP project, but is getting access denied.
EDIT: This looks like it started passing again around midnight EST? Maybe some sort of weird perms propagation issue. DISREGARD
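For reference, a hedged sketch of the bucket-level grant suggested a couple of comments above. The bucket name follows GCR's usual artifacts.<project>.appspot.com convention and is an assumption, as is the roles/storage.objectViewer role; neither is taken from the issue itself.

```go
// Sketch only: grant a CI service account read access to the GCS bucket
// that backs a GCR repository. Bucket name and role are assumptions.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// gcr.io images for a project are stored in artifacts.<project>.appspot.com (assumed here).
	bucket := client.Bucket("artifacts.k8s-authenticated-test.appspot.com")

	policy, err := bucket.IAM().Policy(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// Object read access is enough to pull image manifests and layers from the bucket.
	policy.Add("serviceAccount:prow-build@k8s-infra-prow-build.iam.gserviceaccount.com",
		"roles/storage.objectViewer")
	if err := bucket.IAM().SetPolicy(ctx, policy); err != nil {
		log.Fatal(err)
	}
	log.Println("granted roles/storage.objectViewer on the backing bucket")
}
```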
It seems this is the problem:
failed to resolve reference "gcr.io/k8s-authenticated-test/agnhost:2.6": failed to authorize: failed to fetch oauth token: unexpected status: 403 Forbidden
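One quick way to reproduce this failure outside of containerd is to ask the registry for the image manifest directly: callers without read access to the repository get a 401/403, consistent with the error above. A small standalone sketch (credentials via the registry token flow are intentionally omitted):

```go
// Sketch: request the manifest of the image containerd failed to resolve.
// Without credentials the registry should answer 401/403 for a private repo.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Docker Registry v2 manifest endpoint for gcr.io/k8s-authenticated-test/agnhost:2.6.
	url := "https://gcr.io/v2/k8s-authenticated-test/agnhost/manifests/2.6"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A caller with read access to the restored project would get 200 instead.
	fmt.Println(resp.Status)
}
```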