kubeadm: upgrade-1-28-latest and upgrade-addons-before-controlplane-1-28-latest failed
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-addons-before-controlplane-1-28-latest
keeps failing after https://github.com/kubernetes/release/pull/3254.
/assign
ref https://github.com/kubernetes/kubeadm/issues/2925
See https://github.com/kubernetes/kubeadm/issues/2927#issuecomment-1713870411 for the conclusion.
- add log and change to use universal deserializer:
- add diff log to expose the problem: https://github.com/kubernetes/kubernetes/pull/120541
- cleanup https://github.com/kubernetes/kubernetes/pull/120549 This also changed the unmarshal to use the universal deserializer
-
Option1: if we have to revert(importthe kubernetes/pkg import is not allowed in kubeadm!
)-
revert https://github.com/kubernetes/kubernetes/pull/120554 in v1.29 (master) -
cherry-pick to v1.28 if needed
-
- Option2: if there is another solution to avoid the defaulting logic
- https://github.com/kubernetes/kubernetes/pull/120561
- backport to v1.28 as a bug fix. https://github.com/kubernetes/kubernetes/pull/120605
- try to fix the possible problem for users who have upgraded to v1.28.0-2(need to double-check)
- See https://github.com/kubernetes/kubeadm/issues/2957#issuecomment-1801482442 for more details
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 71 (71 by maintainers)
The latest diff result(https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1701751770478284800/build-log.txt) shows that the default values are no longer injected, this is what we expect.
Now we should fix the v1.28 version, and then the CI will be green. Waiting https://github.com/kubernetes/kubernetes/pull/120605 to be merged.
I think after we remove the reference of
k8s.io/kubernetes/pkg/apis/core/v1
from the v1.28 branch, everything will be back to normal. Because the Pod defaulter is registered into Scheme by: https://github.com/kubernetes/kubernetes/blob/160fe010f32fd1896917fecad680769ad0e40ca0/pkg/apis/core/v1/register.go#L29-L34That’s why the default values are injected…
some testing from me confirms this is really go version related. defaults will always generated with golang 1.20, so that whatever the
k8s.io/kubernetes/pkg/apis/core/v1
is imported or not, diff will always empty as both of them are generated with defaults. defaults will not be generated by golang 1.21 by default, and I suspectkind
update the golang recently, so this issue is hit.some testing in kinder shows the new manifest will not have the defaults generated, but old manifest for each of the pod has the defaults created.
1 and 2 merged. we also updated the diff tool. https://github.com/kubernetes/kubernetes/commits/master
the e2e tests are still failing: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1701661172480086016/build-log.txt
this is before the diff tool update, but we are still getting the strange defaulter diff. our deserializer is now not a converting one and we removed the dep on the internal packages…
so we are preparing some patches for 1. and 2. (noted above) and we are going to apply them to master and then backport 1. to 1.28.
but our upgrade CI is only failing from 1.28 -> master upgrades. the master upgrade is using the 1.29-pre kubeadm binary. this makes the whole problem very confusing.
logically it should have also failed in the 1.27 -> 1.28 upgrades, because 1.28 is when we added the problem code (extra envs). there could be other factors at play here and we might not understand why this is happening exactly…
I added the change of point
2
: using the universal deserializer to decode in https://github.com/kubernetes/kubernetes/pull/120549.I took most of the day to investigate this issue, still have some question mark of my mind,
there is only one change in the
apimachinary
during the period, I am not sure is that related, anyway, I will follow up and see if we can root cause it.No worry. This is an import-cause problem for types registers. It is hard to find out the root cause at first.
I still stuck to reproduce the problem with the test https://github.com/kubernetes/kubernetes/pull/120549/files#diff-95eb7d0b2fcbe4b0c4be43f325c40d36b8c7d1b21d9f83a4330f2aa799f47846R75. https://github.com/kubernetes/kubernetes/pull/120549/files#diff-95eb7d0b2fcbe4b0c4be43f325c40d36b8c7d1b21d9f83a4330f2aa799f47846R264-R266
dnsPolicy
.After I move the test TestCreateLocalEtcdStaticPodManifestFileWithPatches to
cmd/kubeadm/
, it will fail like the TestFunc by @neolit123 in the above comments.https://github.com/kubernetes/kubernetes/compare/master...pacoxu:pacoxu-double-check?expand=1
Add the below import
cmd/kubeadm/app/apis/kubeadm/v1beta4/zz_generated.defaults.go
will fail the test that was written by @neolit123reproduced it minimally, will post it on slack:
commit fd8f2c7fc65b467b1856dc9a83c06d12bd92e586 (HEAD -> master, origin/master, origin/HEAD)
commit 3b874af3878ec7769af3c37afba22fc4d232e57e (HEAD -> release-1.27)
I posted this to #api-machinery slack channel https://kubernetes.slack.com/archives/C0EG7JC6T/p1694264129815039. Let me do some generating test locally.