kubeadm: kubeadm upgrade from v1.28.0 to v1.28.3 fails
What happened?
The following command
kubeadm upgrade apply v1.28.3 -f --certificate-renewal=true --ignore-preflight-errors='CoreDNSUnsupportedPlugins,Port-6443' --patches=/etc/kubernetes/patches
fails with the error:
[upgrade/apply] FATAL: fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node qa-fullha-master1 did not change after 5m0s: timed out waiting for the condition
when run with kubeadm v1.28.3.
kubeadm v1.28.0 upgrades the cluster successfully.
What did you expect to happen?
kubeadm v1.28.3 upgrades the cluster successfully.
How can we reproduce it (as minimally and precisely as possible)?
Download kubeadm v1.28.3 and use it to upgrade a Kubernetes v1.28.0 cluster.
Anything else we need to know?
The issue might be worked around with the --etcd-upgrade flag.
Kubernetes version
$ kubectl version
Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3
Cloud provider
Not applicable
OS version
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux ubuntu 5.15.0-43-generic #46-Ubuntu SMP Tue Jul 12 10:30:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- State: open
- Created 8 months ago
- Comments: 26 (22 by maintainers)
Commits related to this issue
- Update trouble shooting to include the issue of etcd upgrade For the isse which is reported recently: https://github.com/kubernetes/kubeadm/issues/2957 We'd better to provide some tips to workaround ... — committed to chendave/website by chendave 8 months ago
It applies defaults it knows about, which are defaulting functions registered into the codec. Whether defaulting functions are registered or not depends on which packages are linked into the binary. The defaulting functions for core APIs are defined in k8s.io/kubernetes/… API packages and only intended for use by kube-apiserver
we have exactly the same binary, but i’m getting a different etcd.yaml (minus the IP diff). mine does not have the problematic defaults like successThreshold: 1 and dnsPolicy: ClusterFirst.
we saw similar strange behavior when the bug was found.
this means the problem might happen for some 1.28.0 users, but not for others… either way, the workarounds should be applied and there isn’t much we can do like @chendave said.
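A quick way to see whether a given control-plane node has the defaulted fields mentioned above is to grep its etcd manifest. This is only a sketch: the function name is made up for illustration, the default path assumes the usual kubeadm static Pod directory, and a match suggests (but does not prove) that the manifest was written with the extra defaults.

```shell
#!/bin/sh
# Sketch: report whether an etcd static Pod manifest contains the two
# defaulted fields named in this thread.
# Assumptions: the function name is hypothetical, and the default path
# is the usual kubeadm location for static Pod manifests.
has_problematic_defaults() {
    manifest="${1:-/etc/kubernetes/manifests/etcd.yaml}"
    # -E: extended regex; -q: no output, exit status 0 when a line matches.
    grep -Eq 'successThreshold:[[:space:]]*1[[:space:]]*$|dnsPolicy:[[:space:]]*ClusterFirst[[:space:]]*$' "$manifest"
}
```

For example, `has_problematic_defaults && echo "defaults present"` run on the node checks the default path; pass another file as the first argument to check a copy of the manifest instead.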
thanks for the details, let’s keep this ticket open until more users upgrade to 1.28.3. we might have to add an entry about it in: https://k8s-docs.netlify.app/en/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/
Hi @neolit123 Here it is. kubeadm-config:
init-config:
we patched it for 1.29 here: https://github.com/kubernetes/kubernetes/pull/120561
then we backported it for 1.28 here: https://github.com/kubernetes/kubernetes/pull/120605/commits/0c6a0c3f69bc20bc5422f554475fc260fb700e3b
that was on 14 of Sept and it should be part of 1.28.3, but not in 1.28.2 if i’m reading the history of the branch 1.28 correctly: https://github.com/kubernetes/kubernetes/commits/release-1.28?before=197e7579adb1bf180617bd3becc2aa4dcceb5291+35&branch=release-1.28&qualified_name=refs%2Fheads%2Frelease-1.28
so in theory there should be no problem for the .3 upgrade. but if .4 includes an actual etcd upgrade then there is no way for the hash issue to surface.
thanks for the logs, i will try to reproduce this locally. the workaround like you mentioned is to just skip the etcd upgrade. between 1.28.0 and 1.28.3 there is nothing to upgrade in etcd.
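The workaround described here, skipping the etcd upgrade, could look roughly like the following. It is a sketch under assumptions: the TARGET_VERSION variable and the DRY_RUN guard are illustrative and not from the report, and the reporter's other flags (-f, --certificate-renewal, --patches and so on) are left out and would need to be appended as required.

```shell
#!/bin/sh
# Sketch: re-run the upgrade with the etcd step skipped.
# Assumptions: TARGET_VERSION and the DRY_RUN guard are illustrative only.
TARGET_VERSION="v1.28.3"
upgrade_cmd="kubeadm upgrade apply ${TARGET_VERSION} --etcd-upgrade=false"

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command so it can be reviewed first.
    echo "${upgrade_cmd}"
else
    # Run the upgrade for real (needs root on a control-plane node).
    ${upgrade_cmd}
fi
```

Run it once as-is to check the printed command, then with DRY_RUN=0 on the control-plane node to perform the upgrade.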
can you share full logs here or in a github Gist maybe? what version are you upgrading from?
if the flag is set to false, you mean?
we have our 1.27.latest -> 1.28.latest upgrade tests working fine: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-27-1-28
but etcd is not upgraded, because there is the same version between 1.27 and 1.28:
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-27-1-28/1721862425331896320/build-log.txt
our 1.28.latest -> 1.29.latest upgrade works as well, where actual etcd upgrade happens: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1721875511665233920/build-log.txt