kubernetes: Bind API will bind an unexpected pod to a node in some situations
What happened:
The API CoreV1().Pods(...).Bind(ctx context.Context, binding *v1.Binding, opts metav1.CreateOptions) binds a pod to a node. However, the pod is identified by name (not UID), and a name is not a unique identifier for a pod: a pod that is deleted and recreated keeps its name but gets a new UID. So the caller of Bind cannot be sure that the pod being bound is exactly the one intended.
What you expected to happen:
The Bind API should be able to bind a pod with a specified UID to a node. If the UID does not match, it should return an error.
How to reproduce it (as minimally and precisely as possible):
- Create a pending pod via a StatefulSet named busybox, and make sure it stays pending so that the scheduler will not bind it (use a node selector that matches no node, or another method). This gives us a pending pod busybox-0.
- Run the following code:
// Aliases used below: v1 is k8s.io/apimachinery/pkg/apis/meta/v1, corev1 is k8s.io/api/core/v1.
client, err := kubernetes.NewForConfig(cfg)
if err != nil {
	klog.Fatalf("NewForConfig err: %s", err)
}
// Fetch the pending pod and remember its UID.
pod, err := client.CoreV1().Pods("default").Get(context.TODO(), "busybox-0", v1.GetOptions{})
if err != nil {
	klog.Fatalf("get pod: %s", err)
}
time.Sleep(30 * time.Second) // !!!! notice here
err = client.CoreV1().Pods(pod.Namespace).Bind(context.TODO(), &corev1.Binding{
	ObjectMeta: v1.ObjectMeta{
		Namespace: pod.Namespace,
		Name:      pod.Name,
		UID:       pod.UID, // UID of the pod fetched above
	},
	Target: corev1.ObjectReference{
		Kind: "Node",
		Name: "some-node",
	},
}, v1.CreateOptions{})
- Note the time.Sleep in the code: delete the pod busybox-0 manually during the sleep (kubectl delete pod busybox-0, or another method). The StatefulSet busybox will then create another pod busybox-0 with a different UID.
- Finally, the Bind API call in the code above will bind an unexpected pod to the node.
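The race in the steps above can be simulated without a cluster. The following is a rough sketch under stated assumptions: cluster, create and bindByName are hypothetical names, and the in-memory map only mimics a store that looks pods up by name and ignores the UID.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for a Kubernetes pod.
type pod struct {
	Name, UID, NodeName string
}

// cluster is a hypothetical in-memory store keyed by pod name only.
type cluster struct {
	pods map[string]*pod
	next int // counter used to mint a fresh UID for each created pod
}

// create stores a new pod under name with a fresh UID,
// as a StatefulSet controller would after the old pod is deleted.
func (c *cluster) create(name string) *pod {
	c.next++
	p := &pod{Name: name, UID: fmt.Sprintf("uid-%d", c.next)}
	c.pods[name] = p
	return p
}

// bindByName binds whichever pod currently holds name, ignoring the UID,
// and returns the UID of the pod that was actually bound.
func (c *cluster) bindByName(name, node string) (string, bool) {
	p, found := c.pods[name]
	if !found {
		return "", false
	}
	p.NodeName = node
	return p.UID, true
}

func main() {
	c := &cluster{pods: map[string]*pod{}}
	seen := c.create("busybox-0") // the caller Gets the pending pod
	delete(c.pods, "busybox-0")   // the pod is deleted during the sleep
	c.create("busybox-0")         // the controller recreates it under the same name
	bound, _ := c.bindByName("busybox-0", "some-node")
	fmt.Printf("fetched %s, bound %s\n", seen.UID, bound) // prints "fetched uid-1, bound uid-2"
}
```

The caller decided to bind the pod with uid-1, but the pod that ends up on some-node is the one with uid-2.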
Anything else we need to know?:
The logic in the scheduler is the same. If the pod is recreated with the same name but different resource requirements or selectors between the scheduler's Get and Bind, the scheduler will perform a wrong bind for the new pod. I think the root cause is that the Bind API does not check the UID.
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.0-alpha.3", GitCommit:"6d3ccd8e6c409a8524145b38517637218cf4b228", GitTreeState:"clean", BuildDate:"2020-10-20T09:01:41Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.9-aliyunedge.1", GitCommit:"687f4f4", GitTreeState:"", BuildDate:"2021-03-08T09:56:01Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (15 by maintainers)
Either the scheduling or the node SIG owns the pod binding API. Either of those would be a good place to coordinate adding a fix to honor the UID as a precondition.
API machinery owns the generic API plumbing and API server behavior, not the implementations of most specific APIs like the pod or pod binding APIs.
/remove-sig api-machinery