kubernetes: kubelet: Race condition in nodeshutdown unit test
What happened?
There seems to be a race condition in the unit test Test_managerImpl_processShutdownEvent.
The race happens between the read at https://github.com/kubernetes/kubernetes/blob/40c2d049465417f510e4182b05953a49fc5693d4/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux_test.go#L707 and a write made from a goroutine spawned by the node shutdown manager.
What did you expect to happen?
No race condition
How can we reproduce it (as minimally and precisely as possible)?
cd $KUBE_ROOT/pkg/kubelet/nodeshutdown
go test -c -race
stress ./nodeshutdown.test -test.run ^Test_managerImpl_processShutdownEvent$
Anything else we need to know?
CI logs where race was detected: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/108039/pull-kubernetes-unit/1491676749694504960
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (15 by maintainers)
I don’t see how the klog bump could have made it worse. Perhaps the change around the flush daemon changed some timing conditions, but that’s a rather wild guess. These tests have been faulty all along and need to be fixed.
What klog can do is support unit tests like this better. I’ve opened two issues:
@liggitt I think it’s safe to remove the release-blocker here. The root cause is largely as explained in https://github.com/kubernetes/kubernetes/issues/108040#issuecomment-1040122707, and it appears to be test-only.
A related PR was merged: https://github.com/kubernetes/kubernetes/pull/107774 @MadhavJivrajani
I think he’s still working on it. The race condition doesn’t happen there because the code at https://github.com/kubernetes/kubernetes/blob/d899c39ca3025362033b5a71e4f27c32690dc78b/staging/src/k8s.io/client-go/transport/round_trippers_test.go#L534 doesn’t start another goroutine. In the kubelet’s case, however, the write side of the race happens in a separate goroutine, spawned at https://github.com/kubernetes/kubernetes/blob/40c2d049465417f510e4182b05953a49fc5693d4/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go#L303; the write itself is at https://github.com/kubernetes/kubernetes/blob/40c2d049465417f510e4182b05953a49fc5693d4/pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go#L328.
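The distinction drawn here, a write on the caller’s goroutine versus a write in a spawned goroutine, can be illustrated with a small Go sketch. `recorder`, `processSync`, and `processAsync` are hypothetical names, not kubelet code. The point is that a read following a spawned goroutine’s write is only race-free once something orders the two; here that is a channel the goroutine closes after writing.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for state that a manager mutates and a
// test later inspects.
type recorder struct {
	events []string
}

// processSync writes on the caller's goroutine, so a later read by that same
// goroutine is ordered after the write and cannot race with it.
func processSync(r *recorder) {
	r.events = append(r.events, "sync")
}

// processAsync writes from a spawned goroutine. The returned channel is
// closed after the write; receiving from it orders the write before any
// read that follows the receive.
func processAsync(r *recorder) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		r.events = append(r.events, "async")
	}()
	return done
}

func main() {
	r := &recorder{}

	processSync(r)
	fmt.Println(r.events) // prints [sync]; same goroutine, no race

	<-processAsync(r) // wait for the spawned goroutine before reading
	fmt.Println(r.events) // prints [sync async]
}
```

Dropping the `<-` receive would let the read in main run concurrently with the append in the spawned goroutine; that unsynchronized pair is what `go test -race` (or `go run -race`) would typically flag, matching the situation described above for the kubelet test.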