cluster-api: CI failure: capi-e2e-release-1-2-1-22-1-23 and capi-e2e-release-1-2-1-23-1-24 failing consistently

The capi-e2e-release-1-2-1-22-1-23 and capi-e2e-release-1-2-1-23-1-24 tests, which run on the release-1.2 branch, have been failing consistently since December 8th and 9th, respectively.

Both tests (prow logs: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-workload-upgrade-1-22-1-23-release-1-2/1602416045765693440 & https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-workload-upgrade-1-23-1-24-release-1-2/1602550432566087680) are failing with the exact same error messages:

When upgrading a workload cluster using ClusterClass and testing K8S conformance [Conformance] [K8s-Upgrade] [ClusterClass]
/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/cluster_upgrade_test.go:29
  Should create and upgrade a workload cluster and eventually run kubetest [It]
  /home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/cluster_upgrade.go:118
  Timed out after 1200.003s.
  Expected
      <bool>: false
  to be true
  /home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/daemonset_helpers.go:66
  Full Stack Trace
  sigs.k8s.io/cluster-api/test/framework.WaitForKubeProxyUpgrade({0x252c150?, 0xc0004b2f40}, {{0x7f4de407bce0?, 0xc00040c7e0?}, {0xc00005810e?, 0xc0018d79e0?}}, {0xc0019aa1c0, 0x2, 0x2})
  	/home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/daemonset_helpers.go:66 +0x4ca
  sigs.k8s.io/cluster-api/test/framework.UpgradeClusterTopologyAndWaitForUpgrade({0x252c150?, 0xc0004b2f40}, {{0x2537c58, 0xc00196a980}, 0xc0020f2700, 0xc000744c00, {0xc000054158, 0x7}, {0xc00005840b, 0x6}, ...})
  	/home/prow/go/src/sigs.k8s.io/cluster-api/test/framework/cluster_topology_helpers.go:126 +0x918
  sigs.k8s.io/cluster-api/test/e2e.ClusterUpgradeConformanceSpec.func2()
  	/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/cluster_upgrade.go:145 +0x9ac
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0x7f4dd4fbcd98?)
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/leafnodes/runner.go:113 +0xb1
  github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0x0?)
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/leafnodes/runner.go:64 +0x125
  github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0x0?)
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/leafnodes/it_node.go:26 +0x7b
  github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0000290e0, 0xc001d49998?, {0x250e4c0, 0xc000066900})
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/spec/spec.go:215 +0x28a
  github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0000290e0, {0x250e4c0, 0xc000066900})
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/spec/spec.go:138 +0xe7
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0003329a0, 0xc0000290e0)
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/specrunner/spec_runner.go:200 +0xe8
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0003329a0)
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/specrunner/spec_runner.go:170 +0x1a5
  github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0003329a0)
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/specrunner/spec_runner.go:66 +0xc5
  github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc000198af0, {0x7f4dd46899b8, 0xc000682680}, {0x21c41fb, 0x8}, {0xc0005265c0, 0x2, 0x2}, {0x252db18, 0xc000066900}, ...)
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/internal/suite/suite.go:79 +0x4d2
  github.com/onsi/ginkgo.runSpecsWithCustomReporters({0x25111a0?, 0xc000682680}, {0x21c41fb, 0x8}, {0xc0005265a0, 0x2, 0x21e2821?})
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/ginkgo_dsl.go:245 +0x189
  github.com/onsi/ginkgo.RunSpecsWithDefaultAndCustomReporters({0x25111a0, 0xc000682680}, {0x21c41fb, 0x8}, {0xc00009df10, 0x1, 0x1})
  	/home/prow/go/pkg/mod/github.com/onsi/ginkgo@v1.16.5/ginkgo_dsl.go:228 +0x1b6
  sigs.k8s.io/cluster-api/test/e2e.TestE2E(0x0?)
  	/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/e2e_suite_test.go:109 +0x232
  testing.tRunner(0xc000682680, 0x22de800)
  	/usr/local/go/src/testing/testing.go:1439 +0x102
  created by testing.(*T).Run
  	/usr/local/go/src/testing/testing.go:1486 +0x35f
------------------------------
STEP: Dumping logs from the bootstrap cluster
Failed to get logs for the bootstrap cluster node test-xbf4hn-control-plane: exit status 2
STEP: Tearing down the management cluster

This could be related to the recent changes to the e2e framework.


/kind bug

About this issue

State: closed
Created 2 years ago
Comments: 15 (15 by maintainers)

Most upvoted comments

https://github.com/kubernetes/test-infra/pull/28243 landed and the CI signal looks good. I think this issue can be closed.

/close

furkatgofurov7 on Dec 16, 2022

Let’s keep this open until we get a signal with the new kubekins images after the revert https://github.com/kubernetes/test-infra/pull/28243 lands.

furkatgofurov7 on Dec 14, 2022

This issue is now resolved. TL;DR: a fix for the registry change was missing from the release-1.2 branch. The issue was fixed by merging this cherry-pick: https://github.com/kubernetes-sigs/cluster-api/pull/7505.

More details can be found in the CAPI slack discussion.
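
As a rough illustration of how a registry mismatch could surface as the timeout seen in the trace, here is a minimal Python sketch. It is not the framework's Go code. It assumes the wait helper polls until an observed kube-proxy image string equals an expected string built from a registry prefix and a version. The helper names, registry names, and version tag below are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable


def expected_image(registry: str, version: str) -> str:
    """Build the image reference the check expects (hypothetical helper)."""
    return f"{registry}/kube-proxy:{version}"


def wait_until(condition: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout, mirroring the
    "Expected <bool>: false to be true" shape of the failure.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Assumed scenario: the cluster reports an image from one registry while the
# check was written against another. All values are illustrative.
observed = "registry.k8s.io/kube-proxy:v1.23.0"

matching = wait_until(
    lambda: observed == expected_image("registry.k8s.io", "v1.23.0"),
    timeout=0.05,
    interval=0.01,
)
stale = wait_until(
    lambda: observed == expected_image("k8s.gcr.io", "v1.23.0"),
    timeout=0.05,
    interval=0.01,
)
print(matching, stale)
```

Under these assumptions, a stale expectation never matches no matter how long the poll runs, which would be consistent with a test that times out on every run rather than flaking.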

ykakarap on Dec 14, 2022

I’ve not been able to replicate this failure locally: both upgrade jobs run successfully on a local CAPD testbed.

Looking again at the failures, it seems clear this isn’t caused directly by CAPI code. The code didn’t change, but the test started failing consistently 5 days ago.

However, this change https://github.com/kubernetes/test-infra/commit/64d7bee707522937a12673c173fb6048cb2aa802, which updated the kubekins image used in the CAPI test jobs, was also merged 5 days ago, so a change in that image seems the most likely cause.

Issue opened at the test-infra repo here: https://github.com/kubernetes/test-infra/issues/28233

killianmuldoon on Dec 13, 2022