kubernetes: [Failing Test] timeouts in ci-kubernetes-e2e-gce-scale-performance

Which jobs are failing: ci-kubernetes-e2e-gce-scale-performance

Which test(s) are failing:

testing/density/config.yaml
testing/load/config.yaml
ClusterLoaderV2

Since when has it been failing: Because of earlier issues with prow, we cannot currently determine exactly when these tests started failing (we are working on resolving this in testgrid). They were passing on 5/30 and failed on 6/2.

Testgrid link: https://testgrid.k8s.io/sig-release-master-informing#gce-master-scale-performance

Reason for failure: These failures seem to be related to pods not reaching the desired state within the timeout period.

W0604 03:17:19.141] E0604 03:17:19.140882 12669 wait_for_controlled_pods.go:497] WaitForControlledPodsRunning: test-jfyird-5/saturation-rc-0 timed out
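When triaging a long build log, it can help to pull out only the controllers that hit this timeout. The following is a minimal sketch, not part of the test tooling: the regular expression is inferred from the single log line quoted above rather than from the clusterloader2 source, and the function name `find_timed_out_controllers` is hypothetical.

```python
import re

# Pattern inferred from the one sample line in this issue (an assumption,
# not taken from the clusterloader2 source): "<namespace>/<name> timed out".
TIMEOUT_RE = re.compile(
    r"WaitForControlledPodsRunning: (?P<namespace>[^/\s]+)/(?P<name>\S+) timed out"
)


def find_timed_out_controllers(log_lines):
    """Return (namespace, name) pairs for every matching timeout line."""
    hits = []
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("namespace"), match.group("name")))
    return hits


# Sample input: two log lines copied from this issue.
SAMPLE = [
    "W0604 03:17:19.141] E0604 03:17:19.140882 12669 "
    "wait_for_controlled_pods.go:497] WaitForControlledPodsRunning: "
    "test-jfyird-5/saturation-rc-0 timed out",
    "I0604 03:41:37.997] error dialing prow@35.227.71.123:22: "
    "'dial tcp 35.227.71.123:22: connect: connection timed out', retrying",
]

if __name__ == "__main__":
    for namespace, name in find_timed_out_controllers(SAMPLE):
        print(f"{namespace}/{name}")
```

Only the first sample line matches; the SSH dial error also contains "timed out" but is skipped because it lacks the `WaitForControlledPodsRunning` prefix.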

There were also issues with reaching prow:

I0604 03:41:37.997] error dialing prow@35.227.71.123:22: 'dial tcp 35.227.71.123:22: connect: connection timed out', retrying

and

W0604 03:43:44.069] E0604 03:43:44.069271 12669 profile.go:101] failed to gather profile for simple.profileConfig{componentName:"kube-scheduler", provider:"gce", host:"35.227.71.123", kind:"heap"}: failed to execute curl command on master through SSH: error getting SSH client to prow@35.227.71.123:22: 'dial tcp 35.227.71.123:22: connect: connection timed out'

It was mentioned in a previous issue, #76670, that #77127 may be a contributor to failures in this job.

/cc @kubernetes/sig-scalability-test-failures
/kind failing-test
/priority important-soon
/milestone v1.15

/cc @jimangel @smourapina @rarchk @alenkacz
/cc @wojtek-t

About this issue

State: closed
Created 5 years ago
Comments: 17 (16 by maintainers)

Most upvoted comments

Yesterday’s run, with https://github.com/kubernetes/kubernetes/pull/78465 reverted, passed. I believe we can close this one.


mm4tt on Jun 14, 2019