kubernetes: Pod that failed to bind, stuck in Pending state forever

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

Pod got stuck in the Pending state forever after it failed to bind to a host.

What you expected to happen:

For the pod to be rescheduled on another host.

How to reproduce it (as minimally and precisely as possible):

Not always reproducible. I simply created an RC with replicas=2 on a 3-host setup, and one of the Pods got stuck in the Pending state. The following error messages were found in the scheduler logs:

E0720 15:30:40.435703       1 scheduler.go:282] Internal error binding pod: (scheduler cache ForgetPod failed: pod test-677550-rc-edit-namespace/nginx-jvn09 state was assumed on a different node)
W0720 15:31:11.119885       1 cache.go:371] Pod test-677550-rc-edit-namespace/nginx-jvn09 expired
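
For context on the first message: the "state was assumed on a different node" error appears to come from a consistency check in the scheduler cache when it is asked to forget an assumed pod whose recorded node does not match the pod it was handed. A minimal sketch of that kind of check, with simplified, illustrative names rather than the actual kubernetes source:

package schedulercache

import (
	"fmt"
	"sync"

	v1 "k8s.io/api/core/v1"
)

// Cache is a hypothetical, stripped-down version of the scheduler
// cache: it remembers pods that have been "assumed" onto a node
// before the bind is confirmed by the API server.
type Cache struct {
	mu          sync.Mutex
	assumedPods map[string]bool    // pod keys currently assumed
	podStates   map[string]*v1.Pod // cached pod object per key
}

// ForgetPod removes an assumed pod so it can be scheduled again. If
// the cached copy was assumed on a different node than the pod being
// forgotten, the cache refuses -- producing an error like the one in
// the scheduler log above.
func (c *Cache) ForgetPod(pod *v1.Pod) error {
	key := pod.Namespace + "/" + pod.Name

	c.mu.Lock()
	defer c.mu.Unlock()

	cached, ok := c.podStates[key]
	if !ok || !c.assumedPods[key] {
		return fmt.Errorf("pod %v is not assumed in the cache", key)
	}
	if cached.Spec.NodeName != pod.Spec.NodeName {
		return fmt.Errorf("pod %v state was assumed on a different node", key)
	}
	delete(c.assumedPods, key)
	delete(c.podStates, key)
	return nil
}

If the cache and the pod disagree about the assumed node, the forget is refused, which is the error the scheduler logged above.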

Describe pod output:

nginx-jvn09   0/1       Pending   0          2h
> kubectl describe pod nginx-jvn09 --namespace=test-677550-rc-edit-namespace
Name:           nginx-jvn09
Namespace:      test-677550-rc-edit-namespace
Node:           /
Labels:         name=nginx
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"test-677550-rc-edit-namespace","name":"nginx","uid":"6313354d-6d60-11e...
Status:         Pending
IP:
Controllers:    ReplicationController/nginx
Containers:
  nginx:
    Image:              nginx
    Port:               80/TCP
    Environment:        <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vz8x6 (ro)
Volumes:
  default-token-vz8x6:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-vz8x6
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:         <none>

RC YAML:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    name: nginx
  template:
    metadata:
      labels:
        name: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.7.1
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): RHEL7, Docker native 1.12.6
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Most upvoted comments

@julia-stripe @dchen1107 I just ran a validation test against the patched branch - no pods stuck in Pending any more, so the fix did the job! Thank you @julia-stripe

Seeing this as well.

Hypothesis: https://github.com/kubernetes/kubernetes/commit/ecb962e6585#diff-67f2b61521299ca8d8687b0933bbfb19R223 broke the error handling when ForgetPod fails. Before that commit, when ForgetPod failed, the scheduler would log an error and pass the pod to the error handler (sched.config.Error(pod, err)), which is responsible for retrying scheduling of the pod.

After that commit, when ForgetPod fails the scheduler skips the error handling, so scheduling is never retried.

I'm experimenting with a patch for that, and I think there's more to this bug than just the missing error handling (for example: why is ForgetPod failing in the first place? Is the scheduler cache corrupted? Why?), but that's what I've got so far.
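
To make the difference concrete, here is a minimal sketch of the two shapes of the bind-failure path described above. The names (podCache, Scheduler, onBindFailure*) are illustrative stand-ins, not the actual scheduler source; the only point is whether the error handler that requeues the pod still runs when ForgetPod fails:

package scheduler

import "log"

// podCache is a hypothetical stand-in for the scheduler cache.
type podCache interface {
	ForgetPod(pod string) error
}

// Scheduler holds the cache and the error handler that requeues a pod
// for another scheduling attempt (the role played by sched.config.Error).
type Scheduler struct {
	cache        podCache
	errorHandler func(pod string, err error)
}

// Old behavior (as described above): a ForgetPod failure is logged,
// and the pod is still handed to the error handler, so scheduling is
// retried.
func (s *Scheduler) onBindFailureOld(pod string, bindErr error) {
	if err := s.cache.ForgetPod(pod); err != nil {
		log.Printf("scheduler cache ForgetPod failed: %v", err)
	}
	s.errorHandler(pod, bindErr) // pod is requeued either way
}

// New behavior (the hypothesized regression): a ForgetPod failure
// returns early, the error handler is never invoked, and the pod is
// never retried -- it stays Pending.
func (s *Scheduler) onBindFailureNew(pod string, bindErr error) {
	if err := s.cache.ForgetPod(pod); err != nil {
		log.Printf("scheduler cache ForgetPod failed: %v", err)
		return // scheduling retry skipped
	}
	s.errorHandler(pod, bindErr)
}

With the second shape, a single ForgetPod failure leaves the pod permanently Pending, which matches the behavior reported above.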