kubernetes: gce-1.9-1.8-downgrade fails due to etcd crashlooping

/kind bug
/priority failing-test
/sig cluster-lifecycle
/sig testing

This test fails: https://k8s-testgrid.appspot.com/sig-release-1.9-all#gce-1.9-1.8-downgrade

Some clues:

== Waiting for new master to respond to API requests ==
W1204 16:55:41.879] 2017/12/04 16:54:57 util.go:196: Interrupt after 15h0m0s timeout during kubetest --test --test_args=--ginkgo.focus=\[Feature:ClusterDowngrade\] --upgrade-target=ci/k8s-stable1 --upgrade-image=gci --report-dir=/workspace/_artifacts --disable-log-dump=true --report-prefix=upgrade --v=true --check-version-skew=false. Will terminate in another 15m
logging error output: "[+]ping ok
[-]etcd failed: reason withheld
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-informers ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[-]autoregister-completion failed: reason withheld
healthz check failed
"
 [[curl/7.38.0] 35.184.43.102:54282]
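(The [+]/[-] lines above are the apiserver's verbose healthz output. For anyone trying to reproduce this, the same listing can be fetched directly on the master; assuming the local insecure port is still enabled there, something like

curl http://127.0.0.1:8080/healthz?verbose

should show which individual checks are failing.)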
2017-12-04 02:24:01.495921 I | etcdserver: starting server... [version: 3.0.17, cluster version: to_be_decided]
2017-12-04 02:24:01.506968 I | membership: added member a97d80ddc090126a [https://bootstrap-e2e-master:2380] to cluster 91078bdbe2ed539b
2017-12-04 02:24:01.507103 N | membership: set the initial cluster version to 3.1
2017-12-04 02:24:01.507116 C | membership: cluster cannot be downgraded (current version: 3.0.17 is lower than determined cluster version: 3.1).

I suspect the cause is etcd version compatibility: the downgraded master comes back with etcd 3.0.17, but the cluster version in the data directory was already raised to 3.1 during the upgrade, so etcd refuses to start (the fatal "cluster cannot be downgraded" line above). @xiang90 @hongchaodeng any thoughts or suggestions? Thanks!
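A quick way to confirm which versions are in play: the etcd member on the master exposes a /version endpoint. Assuming etcd is still listening on the usual local client port on the master, something like

curl -s http://127.0.0.1:2379/version

should report both the server binary version ("etcdserver") and the persisted cluster version ("etcdcluster"); in the failing case the former would be 3.0.17 while the latter has already been bumped to 3.1, matching the fatal log line above.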

This is also posted in https://groups.google.com/forum/#!topic/kubernetes-sig-cluster-lifecycle/c4MW3R5v4v0

/cc @enisoc @luxas @krzyzacy @kubernetes/sig-cluster-lifecycle-test-failures @kubernetes/sig-release-test-failures

Most upvoted comments

I manually verified that the following works as expected to downgrade from 1.9 to 1.8:

ETCD_IMAGE=3.1.10 ETCD_VERSION=3.1.10 gce/upgrade.sh v1.8.5
  • Before upgrading, my 1.8 cluster had etcd 3.0.17.
  • After upgrading, my 1.9 cluster had etcd 3.1.10.
  • After downgrading, my 1.8 cluster had etcd 3.1.10 still.

I’ll work on adding a prompt to gce/upgrade.sh for the case where ETCD_IMAGE/ETCD_VERSION are unset.
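A minimal sketch of what such a prompt could look like (purely illustrative; the variable names are the ones upgrade.sh already consumes, but the exact wording and placement are hypothetical):

if [[ -z "${ETCD_IMAGE:-}" || -z "${ETCD_VERSION:-}" ]]; then
  # Without an explicit pin, the downgraded master falls back to the default
  # etcd version for the target release, which may be older than the cluster
  # version already recorded in etcd's data directory.
  echo "ETCD_IMAGE/ETCD_VERSION are unset; etcd may fail to start after the downgrade." >&2
  read -r -p "Continue anyway? [y/N] " reply
  [[ "${reply}" == "y" ]] || exit 1
fi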

@xiangpengzhao great question. We would probably need to set up tests to actually verify it, but I really believe it should work fine (k8s 1.8 already ships the etcd client at 3.1.10). And we don't really have much choice here - that's the only thing we can do/recommend doing.