kubernetes: Bind API can bind an unexpected pod to a node in some situations

What happened:

The API CoreV1().Pods(...).Bind(ctx context.Context, binding *v1.Binding, opts metav1.CreateOptions) binds a pod to a node. However, the pod is identified by name rather than by UID, and a name does not uniquely identify a pod over time. The caller of Bind therefore can’t be sure that the pod being bound is exactly the one intended.

What you expected to happen:

The Bind API should be able to bind a pod with a specified UID to a node. If the UID does not match, it should return an error.
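The expected behavior can be illustrated with a minimal in-memory sketch. It does not use the real API server or client-go; the Pod type, the Store type, and the Bind method below are hypothetical names introduced only to show how a UID could act as a precondition on a name-keyed bind.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a simplified, hypothetical stand-in for a Kubernetes pod.
type Pod struct {
	Name     string
	UID      string
	NodeName string
}

// ErrUIDMismatch is returned when the caller's UID precondition fails.
var ErrUIDMismatch = errors.New("pod UID does not match precondition")

// Store maps pod names to pods, mimicking a name-keyed lookup in one namespace.
type Store map[string]*Pod

// Bind assigns the named pod to node. A non-empty uid is treated as a
// precondition: the bind is rejected unless the stored pod has that UID.
func (s Store) Bind(name, uid, node string) error {
	pod, ok := s[name]
	if !ok {
		return fmt.Errorf("pod %q not found", name)
	}
	if uid != "" && pod.UID != uid {
		return fmt.Errorf("bind %q: %w (want %s, have %s)", name, ErrUIDMismatch, uid, pod.UID)
	}
	if pod.NodeName != "" {
		return fmt.Errorf("pod %q is already bound to %q", name, pod.NodeName)
	}
	pod.NodeName = node
	return nil
}
```

In this sketch an empty uid keeps the name-only behavior, so callers that do not set a UID are unaffected. Whether the real API should behave that way is a design decision for the API owners.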

How to reproduce it (as minimally and precisely as possible):

Create a pending pod from a StatefulSet named busybox, and make sure it stays pending so that the scheduler will not bind it (for example, use a node selector that matches no node). This gives us a pending pod, busybox-0.
Run the following code:

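// Import aliases assumed here:
//   metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
//   corev1 "k8s.io/api/core/v1"
client, err := kubernetes.NewForConfig(cfg)
if err != nil {
	klog.Fatalf("NewForConfig err: %s", err) // Fatalf exits, so no return is needed
}

// Read the pod once; pod.UID identifies this particular instance.
pod, err := client.CoreV1().Pods("default").Get(context.TODO(), "busybox-0", metav1.GetOptions{})
if err != nil {
	klog.Fatalf("get pod: %s", err)
}

time.Sleep(30 * time.Second) // !!!! notice here

// Bind using the name and UID captured before the sleep.
err = client.CoreV1().Pods(pod.Namespace).Bind(context.TODO(), &corev1.Binding{
	ObjectMeta: metav1.ObjectMeta{
		Namespace: pod.Namespace,
		Name:      pod.Name,
		UID:       pod.UID,
	},
	Target: corev1.ObjectReference{
		Kind: "Node",
		Name: "some-node",
	},
}, metav1.CreateOptions{})
if err != nil {
	klog.Fatalf("bind pod: %s", err)
}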

Note the time.Sleep in the code. During the sleep, delete the pod busybox-0 manually (kubectl delete pod busybox-0 or any other method). The StatefulSet busybox will create another pod named busybox-0 with a different UID.
Finally, the Bind call in the code above binds this new, unexpected pod to the node.
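The sequence of steps above can be modeled without a cluster. The following is only an in-memory simulation under stated assumptions: the cluster type and its create, remove, and bindByName methods are hypothetical and stand in for the API server, the StatefulSet controller, and a Bind that looks pods up by name only.

```go
package main

import "fmt"

// podRecord is a simplified, hypothetical pod.
type podRecord struct {
	name     string
	uid      string
	nodeName string
}

// cluster keeps pods keyed by name and hands out a fresh UID on every create.
type cluster struct {
	pods    map[string]*podRecord
	nextUID int
}

func newCluster() *cluster {
	return &cluster{pods: make(map[string]*podRecord)}
}

// create stores a new pod under name and returns a copy, like a Get would.
func (c *cluster) create(name string) podRecord {
	c.nextUID++
	p := &podRecord{name: name, uid: fmt.Sprintf("uid-%d", c.nextUID)}
	c.pods[name] = p
	return *p
}

// remove deletes the pod stored under name.
func (c *cluster) remove(name string) {
	delete(c.pods, name)
}

// bindByName binds whatever pod currently has this name and reports its UID.
// It deliberately ignores any UID the caller may have observed earlier.
func (c *cluster) bindByName(name, node string) (string, error) {
	p, ok := c.pods[name]
	if !ok {
		return "", fmt.Errorf("pod %q not found", name)
	}
	p.nodeName = node
	return p.uid, nil
}

// reproduce replays the steps: observe, delete, recreate, then bind by name.
func reproduce() (observedUID, boundUID string) {
	c := newCluster()
	observed := c.create("busybox-0") // the pod the caller read
	c.remove("busybox-0")             // manual delete during the sleep
	c.create("busybox-0")             // the StatefulSet recreates the pod
	boundUID, _ = c.bindByName(observed.name, "some-node")
	return observed.uid, boundUID
}
```

Calling reproduce() returns two different UIDs: the one the caller observed and the one that was actually bound, which is the mismatch this issue describes.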

Anything else we need to know?:

The scheduler follows the same logic. If a pod is recreated with the same name but different resource requirements or selectors between the scheduler’s Get and Bind, the scheduler will bind the wrong pod. I think the root cause is that the Bind API does not check the UID.

Environment:

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.0-alpha.3", GitCommit:"6d3ccd8e6c409a8524145b38517637218cf4b228", GitTreeState:"clean", BuildDate:"2020-10-20T09:01:41Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.9-aliyunedge.1", GitCommit:"687f4f4", GitTreeState:"", BuildDate:"2021-03-08T09:56:01Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration:
OS (e.g: cat /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Network plugin and version (if this is a network-related bug):
Others:

About this issue

State: closed
Created 3 years ago
Comments: 15 (15 by maintainers)

Most upvoted comments

Either scheduling or node sigs own the pod binding API. Either of those would be a good place to coordinate adding a fix to honor uid as a precondition

API machinery owns the generic API plumbing and API server behavior, not implementations of most specific APIs like the pod or pod binding APIs

/remove-sig api-machinery

liggitt on Oct 23, 2021