kubernetes: [Failing Test] (gce-master-scale-performance) Error while dumping cluster logs
Which jobs are failing?
master-informing:
- gce-master-scale-performance
Which tests are failing?
kubetest.DumpClusterLogs
Since when has it been failing?
~https://github.com/kubernetes/kubernetes/pull/121242~
Testgrid link
https://testgrid.k8s.io/sig-release-master-informing#gce-master-scale-performance
Reason for failure (if possible)
I1018 09:46:14.094967 21599 exec_service.go:123] Exec service: tearing down service
2023/10/18 09:46:29 process.go:155: Step '/home/prow/go/src/k8s.io/perf-tests/run-e2e.sh cluster-loader2 --experimental-gcp-snapshot-prometheus-disk=true --experimental-prometheus-disk-snapshot-name=ci-kubernetes-e2e-gce-scale-performance-1714540134923243520 --experimental-prometheus-snapshot-to-report-dir=true --nodes=5000 --prometheus-scrape-node-exporter --provider=gce --report-dir=/logs/artifacts --testconfig=testing/load/config.yaml --testconfig=testing/huge-service/config.yaml --testconfig=testing/access-tokens/config.yaml --testoverrides=./testing/experiments/enable_restart_count_check.yaml --testoverrides=./testing/experiments/ignore_known_gce_container_restarts.yaml --testoverrides=./testing/overrides/5000_nodes.yaml' finished in 1h57m49.081257202s
2023/10/18 09:46:29 e2e.go:569: Dumping logs from nodes to GCS directly at path: gs://k8s-infra-scalability-tests-logs/ci-kubernetes-e2e-gce-scale-performance/1714540134923243520
2023/10/18 09:46:29 process.go:153: Running: /workspace/log-dump.sh /logs/artifacts gs://k8s-infra-scalability-tests-logs/ci-kubernetes-e2e-gce-scale-performance/1714540134923243520
Checking for custom logdump instances, if any
Using gce provider, skipping check for LOG_DUMP_SSH_KEY and LOG_DUMP_SSH_USER
Project: k8s-infra-e2e-scale-5k-project
Network Project: k8s-infra-e2e-scale-5k-project
Zone: us-east1-b
Dumping logs temporarily to '/tmp/tmp.Lh8re5VkdY/logs'. Will upload to 'gs://k8s-infra-scalability-tests-logs/ci-kubernetes-e2e-gce-scale-performance/1714540134923243520' later.
Dumping logs from master locally to '/tmp/tmp.Lh8re5VkdY/logs'
Trying to find master named 'gce-scale-cluster-master'
Looking for address 'gce-scale-cluster-master-ip'
Looking for address 'gce-scale-cluster-master-internal-ip'
Using master: gce-scale-cluster-master (external IP: 104.196.48.60; internal IP: 10.40.0.2)
Changing logfiles to be world-readable for download
Copying 'kube-apiserver.log kube-apiserver-audit.log kube-scheduler.log cloud-controller-manager.log kube-controller-manager.log etcd.log etcd-events.log glbc.log cluster-autoscaler.log kube-addon-manager.log konnectivity-server.log fluentd.log kubelet.cov cl2-* startupscript.log' from gce-scale-cluster-master
Specify --start=117505 in the next get-serial-port-output invocation to get only the new output starting from here.
client_loop: send disconnect: Broken pipe
/usr/bin/scp: Connection closed
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [255].
ERROR: (gcloud.compute.scp) Could not fetch resource:
- The resource 'projects/k8s-infra-e2e-scale-5k-project/zones/us-east1-b/instances/gce-scale-cluster-master' was not found
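When triaging runs like this one, it can help to tell a dropped SCP connection apart from a master VM that no longer exists. The other scale job mentioned in the comments fails without the 404, so the two cases may have different causes. The following is a minimal Go sketch and is not part of log-dump.sh or kubetest. It assumes the log-dump output is available as plain text, and it matches the two error strings that appear at the end of the log above. The names failureKind and classify are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// failureKind is a hypothetical label for triaging log-dump.sh output.
type failureKind string

const (
	// gcloud reported the master instance as missing ("Could not fetch resource ... was not found").
	kindInstanceNotFound failureKind = "instance-not-found"
	// scp exited with code 255 after the connection dropped ("Broken pipe" / "Connection closed").
	kindSCPConnection failureKind = "scp-connection-lost"
	// neither signature was found.
	kindUnknown failureKind = "unknown"
)

var (
	scpExit255 = regexp.MustCompile(`\(gcloud\.compute\.scp\) \[/usr/bin/scp\] exited with return code \[255\]`)
	notFound   = regexp.MustCompile(`The resource '([^']+)' was not found`)
)

// classify returns the most specific failure kind found in the log text,
// plus the resource path when gcloud reported a missing resource.
func classify(log string) (failureKind, string) {
	if m := notFound.FindStringSubmatch(log); m != nil {
		return kindInstanceNotFound, m[1]
	}
	if scpExit255.MatchString(log) {
		return kindSCPConnection, ""
	}
	return kindUnknown, ""
}

func main() {
	// Tail of the log-dump output quoted in this issue.
	sample := strings.Join([]string{
		"client_loop: send disconnect: Broken pipe",
		"/usr/bin/scp: Connection closed",
		"ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [255].",
		"ERROR: (gcloud.compute.scp) Could not fetch resource:",
		"- The resource 'projects/k8s-infra-e2e-scale-5k-project/zones/us-east1-b/instances/gce-scale-cluster-master' was not found",
	}, "\n")
	kind, res := classify(sample)
	fmt.Println(kind)
	fmt.Println(res)
}
```

Checking the not-found case first means a run that hit both errors, as this one did, is reported as a missing instance rather than as a plain connection drop.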
Anything else we need to know?
No response
Relevant SIG(s)
/sig testing cc @kubernetes/ci-signal
About this issue
- State: closed
- Created 8 months ago
- Reactions: 1
- Comments: 31 (29 by maintainers)
/milestone v1.29
CI
gce-cos-master-scalability-100 is fixed by https://github.com/kubernetes/test-infra/pull/31197 (3 of the 4 most recent runs passed). Thanks @aojea for the comment below.
A similar issue was observed with another test, https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-correctness/1718311964112850944, but its logs look a bit different: this one does not fail with a 404 error.
@pacoxu the job still uses an old version of the kubekins-e2e image, which contains the log-dump.sh file from test-infra: gcr.io/k8s-staging-test-infra/kubekins-e2e:v20231015-d38ebb23ab-master. We need to expedite the submission of https://github.com/kubernetes/test-infra/pull/31061 to get more debug logs.