test-infra: gke tests are missing log artifacts

Some upgrade tests are missing logs, making them practically impossible to debug and fix, e.g. https://github.com/kubernetes/kubernetes/issues/50793#issuecomment-326300909

Apparently we’re using the wrong scp command.

/assign @shyamjvs

slack conversation:

[2:36 PM] 
fejta
yes because a) upgrading will reboot the master and b) we collect logs at the end of the run


[2:37 PM] 
@shyamjvs have you given any thoughts to this? Does your log collector help here?


[2:40 PM] 
One idea would be to make the test that does the upgrade dump logs first, if requested, using a pattern similar to https://github.com/kubernetes/kubernetes/blob/0596891e424b4d9e1858117641cf93c7cf266450/test/e2e/e2e.go#L193


[2:41 PM] 
Alternatively add a dump cluster logs piece:
https://github.com/kubernetes/test-infra/blob/master/kubetest/e2e.go#L217

Before running upgrades: 
https://github.com/kubernetes/test-infra/blob/master/kubetest/e2e.go#L164





----- Today September 1st, 2017 -----
[5:50 AM] 
shyamjvs
@fejta @ericchiang The master rebooting during the upgrade is not the reason the logs are failing to be collected


[5:52 AM] 
The log-dump script has nothing much to do with master reboot... it's collecting logs from nodes by individually SCP'ing to them


[5:54 AM] 
(which can happen even if the master is down (except for logs of the master itself))


[6:09 AM] 
shyamjvs
The issue is in this line - https://github.com/kubernetes/kubernetes/blob/master/cluster/log-dump/log-dump.sh#L109


[6:11 AM] 
We're setting the 'use_custom_instance_list' function for gke, which makes the script use scp. And scp expects an IP address (while we're actually providing the VM name there)


[6:13 AM] 
I'll send a PR to use 'gcloud compute scp' instead of 'scp' when node name is provided instead of IP. That should fix it.
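
A minimal sketch of the idea in the last message, not the code from the actual PR: pick plain `scp` when the node is given as an IP address, and `gcloud compute scp` when it is given as a VM name. The function names `is_ipv4` and `build_copy_cmd`, the `ZONE` variable, and its default value are assumptions made up for illustration; they are not taken from log-dump.sh. The helper only prints the command it would run, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from log-dump.sh): succeeds if the argument
# looks like a dotted-quad IPv4 address. It does not range-check octets.
is_ipv4() {
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  [[ "$1" =~ $re ]]
}

# Hypothetical helper: prints the copy command for one node without
# executing it.
# Arguments: node (IP or VM name), remote path, local directory.
# ZONE is an assumed variable name; the default zone is a placeholder.
build_copy_cmd() {
  local node="$1" remote_path="$2" local_dir="$3"
  if is_ipv4 "${node}"; then
    # An IP address can be handed to plain scp directly.
    echo "scp ${node}:${remote_path} ${local_dir}"
  else
    # A VM name has to be resolved by gcloud, which needs the zone.
    echo "gcloud compute scp --zone ${ZONE:-us-central1-b} ${node}:${remote_path} ${local_dir}"
  fi
}
```

For example, `build_copy_cmd gke-node-1 /var/log/kubelet.log /tmp/logs` prints a `gcloud compute scp` command, while passing `10.0.0.5` as the node prints a plain `scp` command.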

About this issue

State: closed
Created 7 years ago
Reactions: 1
Comments: 41 (41 by maintainers)

Commits related to this issue

Merge pull request #51834 from shyamjvs/logdump-for-kubemark Automatic merge from submit-queue Make logdump support kubemark and support gke with 'use_custom_instance_list' Fixes https://github.com... — committed to kubernetes/kubernetes by deleted user 7 years ago
Merge pull request #52360 from shyamjvs/add-debug-statements Automatic merge from submit-queue (batch tested with PRs 52339, 52343, 52125, 52360, 52301) Make log-dump use 'gcloud ssh' for GKE also ... — committed to kubernetes/kubernetes by deleted user 7 years ago

Most upvoted comments

I’ve got a working fix for this that I tested locally - I’ll send out a PR for it.

shyamjvs on Sep 12, 2017

Github needs a For Science ! emoji thing 😃

abgworrall on Sep 12, 2017