kubernetes: [Flaking Test] metrics-server not starting in BeforeSuite (ci-kubernetes-e2e-ubuntu-gce)
Which jobs are flaking:
ci-kubernetes-e2e-ubuntu-gce (gce-ubuntu-master-default)
Which test(s) are flaking:
- Kubernetes e2e suite: BeforeSuite
- (There are other flakes in this job, but they fail less often and hit a different test each time, so let's scope this issue to BeforeSuite only.)
Testgrid link:
https://testgrid.k8s.io/sig-release-master-informing#gce-ubuntu-master-default&width=20
Reason for failure:
metrics-server is not starting:
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:76
Jul 15 04:34:25.942: Error waiting for all pods to be running and ready: 1 / 31 pods in namespace "kube-system" are NOT in RUNNING and READY state in 10m0s
POD NODE PHASE GRACE CONDITIONS
metrics-server-v0.4.4-6c6b749986-v4wv9 bootstrap-e2e-minion-group-vdwd Running [{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-07-15 04:22:49 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-07-15 04:29:57 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [metrics-server]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-07-15 04:29:57 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [metrics-server]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2021-07-15 04:22:49 +0000 UTC Reason: Message:}]
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:79
Anything else we need to know:
Spyglass: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce/1415525375479386112
Triage: https://storage.googleapis.com/k8s-gubernator/triage/index.html?job=ci-kubernetes-e2e-ubuntu-gce
Mentioned in https://github.com/kubernetes/kubernetes/issues/102101#issuecomment-879622365
/cc @aojea
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (16 by maintainers)
As far as I can tell, metrics-server is never starting properly, even when the jobs succeed. Take this job, which succeeded: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce/1411948153548050432
From the serial logs, we can see that the metrics-server pod is crash looping: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce/1411948153548050432/artifacts/bootstrap-e2e-minion-group-cwh0/serial-1.log
But from the build logs, we can see that at some point the pod was deemed ready: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce/1411948153548050432/build-log.txt
To me it seems there are two issues: first, metrics-server doesn't have the capabilities it needs to run; second, still to be investigated, metrics-server is sometimes marked ready even though it is crash looping.
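On the first issue: if the binding failure is the missing-capability case (metrics-server running as non-root while trying to bind :443, a privileged port), one common mitigation is granting CAP_NET_BIND_SERVICE in the container's securityContext. This is only a sketch of that idea, not the actual GCE addon manifest; the image tag and the other fields are illustrative:

```yaml
# Sketch only -- fields other than the added capability are illustrative.
containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server/metrics-server:v0.4.4
    securityContext:
      runAsNonRoot: true
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]  # allows a non-root process to bind ports < 1024
```

The alternative mitigation would be moving metrics-server to an unprivileged port via --secure-port, which avoids the capability entirely.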
The metrics-server fails to bind ~because there is another process listening~ on port 443 https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce/1415525375479386112/build-log.txt
https://github.com/kubernetes/kubernetes/issues/102101#issuecomment-879622365
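For context on telling the bind failure modes apart in the logs: "address already in use" (EADDRINUSE, another listener holds the port) and "permission denied" (EACCES, a non-root process without CAP_NET_BIND_SERVICE binding a port below 1024) are distinct errnos. A minimal Python sketch, unrelated to metrics-server itself, reproducing the already-in-use case on an arbitrary ephemeral port:

```python
import errno
import socket

# First socket grabs an ephemeral port and listens on it.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen(1)
port = first.getsockname()[1]

# Second socket tries to bind the same port and fails with EADDRINUSE.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    conflict = None
except OSError as e:
    conflict = e.errno

print(conflict == errno.EADDRINUSE)  # True

first.close()
second.close()
```

If konnectivity (or anything else) really held :443 on the node, metrics-server would see exactly this errno; a missing capability would surface as EACCES instead.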
@cheftako are any of the konnectivity pods listening on port :443?
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-ubuntu-gce/1415525375479386112/artifacts/bootstrap-e2e-master/konnectivity-server.log
Yeah, maybe you can refer to the release notes for metrics-server v0.4.4. I think this problem has been resolved there.
https://github.com/kubernetes-sigs/metrics-server/releases
I don’t think so. /cc @dgrisonnet @yangjunmyfm192085