test-infra: gke tests are missing log artifacts
Some upgrade tests are missing logs, which makes them practically impossible to debug and fix, e.g. https://github.com/kubernetes/kubernetes/issues/50793#issuecomment-326300909
Apparently we’re using the wrong scp command.
/assign @shyamjvs
Slack conversation:
[2:36 PM]
fejta
Yes, because (a) upgrading reboots the master and (b) we collect logs at the end of the run.
[2:37 PM]
@shyamjvs have you given this any thought? Does your log collector help here?
[2:40 PM]
One idea would be to make the test that does the upgrade dump logs first, if requested, using a pattern similar to https://github.com/kubernetes/kubernetes/blob/0596891e424b4d9e1858117641cf93c7cf266450/test/e2e/e2e.go#L193
[2:41 PM]
Alternatively, add a dump-cluster-logs step:
https://github.com/kubernetes/test-infra/blob/master/kubetest/e2e.go#L217
Before running upgrades:
https://github.com/kubernetes/test-infra/blob/master/kubetest/e2e.go#L164
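A minimal sketch of the second suggestion, assuming a test driver that runs named steps in order: dump cluster logs once before any upgrade step (while the pre-upgrade master is still up) and once more at the end of the run. The step names, `run_steps`, and the `run`/`dump` callbacks are hypothetical stand-ins; this is not kubetest's actual code.

```python
from typing import AbstractSet, Callable, Iterable, List


def run_steps(steps: Iterable[str],
              run: Callable[[str], None],
              dump: Callable[[str], None],
              upgrade_steps: AbstractSet[str] = frozenset({"upgrade"})) -> None:
    """Run test steps in order, dumping cluster logs before each upgrade step.

    `run` executes one step and `dump` collects cluster logs under a label;
    both are hypothetical stand-ins for the real test driver's functions.
    """
    try:
        for step in steps:
            if step in upgrade_steps:
                # Capture logs while the pre-upgrade master is still running.
                dump(f"before-{step}")
            run(step)
    finally:
        # The existing end-of-run dump, kept even if a step raises.
        dump("final")


# Usage: record the order in which steps and dumps happen.
events: List[str] = []
run_steps(["setup", "upgrade", "test"],
          run=events.append,
          dump=lambda label: events.append("dump:" + label))
# events == ["setup", "dump:before-upgrade", "upgrade", "test", "dump:final"]
```

Wrapping the loop in `try`/`finally` is a design choice of this sketch: a failed upgrade still leaves behind both the pre-upgrade and the final dump.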
[3:05 PM]
----- September 1st, 2017 -----
[5:50 AM]
shyamjvs @fejta @ericchiang Upgrading the master (and the reboot it causes) is not the reason the logs are failing to be collected.
[5:52 AM]
The log-dump script has little to do with the master reboot; it collects logs from the nodes by SCP'ing to each one individually
[5:54 AM]
(which works even if the master is down, except for the master's own logs)
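To illustrate why per-node collection does not depend on the master, here is a sketch assuming each node is reachable directly: every node gets its own copy call, and a failure on one node does not stop collection from the others. `dump_nodes`, the `copy` callback, and the log paths in `NODE_LOGS` are hypothetical placeholders, not the exact files or functions in log-dump.sh, and the continue-on-failure behavior is a choice of this sketch rather than a claim about the script.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical list of node log files to fetch.
NODE_LOGS: List[str] = ["/var/log/kubelet.log", "/var/log/kube-proxy.log"]


def dump_nodes(nodes: Sequence[str],
               copy: Callable[[str, List[str], str], None],
               report_dir: str) -> Dict[str, str]:
    """Copy logs from each node separately and return a per-node status.

    `copy(node, files, dest)` is a hypothetical transport (e.g. an scp call)
    that raises on failure. The master is never contacted here.
    """
    status: Dict[str, str] = {}
    for node in nodes:
        dest = f"{report_dir}/{node}"
        try:
            copy(node, NODE_LOGS, dest)
            status[node] = "ok"
        except Exception as err:
            # Record the failure and move on: one bad node must not hide the rest.
            status[node] = f"failed: {err}"
    return status
```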
[6:09 AM]
shyamjvs The issue is in this line: https://github.com/kubernetes/kubernetes/blob/master/cluster/log-dump/log-dump.sh#L109
[6:11 AM]
We're setting the 'use_custom_instances_list' function for GKE, which makes the script use plain scp. But scp expects an IP address, while we're actually passing the VM name there.
[6:13 AM]
I'll send a PR to use 'gcloud compute scp' instead of 'scp' when a node name is provided instead of an IP. That should fix it.
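A sketch of the proposed fix, assuming the caller knows the zone and project: if the target parses as an IP address, plain `scp` can connect to it directly; otherwise treat it as a GCE instance name and go through `gcloud compute scp`, which looks the instance up itself. `copy_command` and its parameters are hypothetical, and the flag set is illustrative rather than copied from the PR that eventually landed.

```python
import ipaddress
from typing import List, Optional, Union

IPAddr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_ip(host: str) -> Optional[IPAddr]:
    """Return the parsed address, or None if `host` is not an IP literal."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        return None


def copy_command(host: str, src: str, dest: str,
                 zone: str, project: str, user: str = "") -> List[str]:
    """Build the argv that copies remote `src` on `host` to local `dest`."""
    prefix = f"{user}@" if user else ""
    ip = parse_ip(host)
    if ip is not None:
        # An IP literal: plain scp works. IPv6 must be bracketed for scp.
        target = f"[{host}]" if ip.version == 6 else host
        return ["scp", f"{prefix}{target}:{src}", dest]
    # A bare VM name: let gcloud resolve the instance in the given zone/project.
    return ["gcloud", "compute", "scp",
            f"--zone={zone}", f"--project={project}",
            f"{prefix}{host}:{src}", dest]
```

Branching on whether the string parses as an address keeps the existing IP path untouched, so only the GKE case (where the script ends up holding VM names) changes behavior.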
About this issue
- State: closed
- Created 7 years ago
- Comments: 41 (41 by maintainers)
Commits related to this issue
- Merge pull request #51834 from shyamjvs/logdump-for-kubemark Automatic merge from submit-queue Make logdump support kubemark and support gke with 'use_custom_instance_list' Fixes https://github.com... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Merge pull request #52360 from shyamjvs/add-debug-statements Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301) Make log-dump use 'gcloud ssh' for GKE also ... — committed to kubernetes/kubernetes by deleted user 7 years ago
I’ve got a working fix for this that I tested locally; I’ll send out a PR for it.
GitHub needs a "For Science!" emoji thing 😃