kubernetes: [Failing tests] Multiple job failures on the BeforeSuite step while waiting for CoreDNS to be ready
Which jobs are failing:
Multiple jobs on master-informing.
Which test(s) are failing:
Before Suite on the following jobs:
- https://testgrid.k8s.io/sig-release-master-informing#kubeadm-kinder-upgrade-stable-master
- https://testgrid.k8s.io/sig-release-master-informing#gce-new-master-upgrade-master
- https://testgrid.k8s.io/sig-release-master-informing#gce-new-master-upgrade-master-parallel
- https://testgrid.k8s.io/sig-release-master-informing#gce-new-master-upgrade-cluster-parallel
- https://testgrid.k8s.io/sig-release-master-informing#gce-new-master-upgrade-cluster
- https://testgrid.k8s.io/sig-release-master-informing#gce-master-new-downgrade-cluster
- https://testgrid.k8s.io/sig-release-master-informing#gce-master-new-downgrade-cluster-parallel
- https://testgrid.k8s.io/sig-release-master-informing#gce-new-master-upgrade-cluster-new
- https://testgrid.k8s.io/sig-release-master-informing#gce-new-master-upgrade-cluster-new-parallel
Since when has it been failing: Since May 17, 2019.
Testgrid links: See above.
Reason for failure: All the failures listed above appear to have the same root cause: a failure while waiting for CoreDNS to become ready. Example: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-stable-master/1129638974428549121
May 18 07:09:00.075: Error waiting for all pods to be running and ready: 2 / 14 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
POD NODE PHASE GRACE CONDITIONS
coredns-65546fffc9-j9p6b kinder-upgrade-worker2 Running [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-05-18 06:52:59 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-05-18 06:52:59 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [coredns]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-05-18 06:52:59 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [coredns]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-05-18 06:52:59 +0000 UTC Reason: Message:}]
/sig testing
/priority critical-urgent
/kind failing-test
/milestone v1.15
/cc @jimangel @alejandrox1 @rarchk @alenkacz
About this issue
- State: closed
- Created 5 years ago
- Comments: 18 (16 by maintainers)
@smourapina thanks for the heads up. @rajansandeep let me know if you don't have time for the https://github.com/kubernetes/kubernetes/pull/78033 refactor that I requested, and I can try to take it over. The alternative is to roll back the CoreDNS version, which I assume is not ideal given the various fixes in the new version.
I have opened https://github.com/kubernetes/kubernetes/pull/78302 which aims to fix all the failing tests except https://testgrid.k8s.io/sig-release-master-informing#kubeadm-kinder-upgrade-stable-master, which will be fixed via https://github.com/kubernetes/kubernetes/pull/78033.
Some of this (possibly all of it) is due to #78030, which depends on #78033 being merged.