kubernetes: Pet sets stuck in Init state though the pods are running

Hit an issue where pet set pods switch from the Running state back to the Init state. Initially I thought the pods weren't running, but I later figured out that the containers are actually running; the container state is just not consistent with the pod state reported by the Kubernetes API.

Issue

Container State: Running
Pod State: Pending
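
The mismatch is visible directly in the pod's status object: the phase stays Pending while the container status reports running and ready. A minimal check, assuming the pod name zoo-0 from the output below (standard v1 Pod status fields queried via kubectl jsonpath):

kubectl get pod zoo-0 -o jsonpath='{.status.phase}'
kubectl get pod zoo-0 -o jsonpath='{.status.containerStatuses[0].state}'
kubectl get pod zoo-0 -o jsonpath='{.status.containerStatuses[0].ready}'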

Kubernetes Cluster Version

kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.3", GitCommit:"c6411395e09da356c608896d3d9725acab821418", GitTreeState:"clean", BuildDate:"2016-07-22T20:29:38Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.3", GitCommit:"c6411395e09da356c608896d3d9725acab821418", GitTreeState:"clean", BuildDate:"2016-07-22T20:22:25Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes Get Pod State

kubectl get po -l app=zk
NAME      READY     STATUS     RESTARTS   AGE
zoo-0     0/1       Init:0/2   0          13h
zoo-1     0/1       Init:0/2   0          13h
zoo-2     0/1       Init:0/2   0          13h

Kubernetes Describe Pod State

kubectl describe po zoo-0
Name:       zoo-0
Namespace:  default
Node:       ip-1-2-3-4.region.compute.internal/10.1.2.3
Start Time: Tue, 02 Aug 2016 20:33:34 +0400
Labels:     app=zk
        name=zoo
Status:     Pending
IP:     10.24.1.2
Controllers:    PetSet/zoo
Init Containers:
  install:
    Container ID:   
    Image:      gcr.io/google_containers/zookeeper-install:0.1
    Image ID:       
    Port:       
    Args:
      --version=3.5.0-alpha
      --install-into=/opt
      --work-dir=/work-dir
    State:          Waiting
      Reason:           PodInitializing
    Ready:          False
    Restart Count:      0
    Environment Variables:  <none>
  bootstrap:
    Container ID:   
    Image:      java:openjdk-8-jre
    Image ID:       
    Port:       
    Command:
      /work-dir/peer-finder
    Args:
      -on-start="/work-dir/on-start.sh"
      -service=zk
    State:      Waiting
      Reason:       PodInitializing
    Ready:      False
    Restart Count:  0
    Environment Variables:
      POD_NAMESPACE:    default (v1:metadata.namespace)
Containers:
  zk:
    Container ID:   docker://4a2e193fa6c86559c9387a0ee473596be59cff2fb7dcd10ca9bb08c9918e6d13
    Image:      java:openjdk-8-jre
    Image ID:       docker://sha256:372859dd1c695759fe765be375346390ddd393f76fa84319630d1d64b85b9806
    Ports:      2888/TCP, 3888/TCP
    Command:
      /opt/zookeeper/bin/zkServer.sh
    Args:
      start-foreground
    State:          Running
      Started:          Tue, 02 Aug 2016 20:33:58 +0400
    Ready:          True
    Restart Count:      0
    Readiness:          exec [sh -c /opt/zookeeper/bin/zkCli.sh ls /] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment Variables:  <none>
Conditions:
  Type      Status
  Initialized   False 
  Ready     True 
  PodScheduled  True 
Volumes:
  datadir:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-zoo-0
    ReadOnly:   false
  workdir:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  opt:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
QoS Tier:   BestEffort
No events.

Container State

core@ip-1-2-3-4 ~ $ docker ps | grep zoo-0
4a2e193fa6c8        java:openjdk-8-jre                         "/opt/zookeeper/bin/z"   13 hours ago        Up 13 hours                                                    k8s_zk.484fe5b9_zoo-0_default_db0ce747-58ce-11e6-8261-02548dfad2e5_4424c4ea
1b74ed7351fd        gcr.io/google_containers/pause-amd64:3.0   "/pause"                 13 hours ago        Up 13 hours                                                    k8s_POD.85382e2a_zoo-0_default_db0ce747-58ce-11e6-8261-02548dfad2e5_18dc3c54
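
Only the main container and the pause container are running. To check whether the exited init containers (install and bootstrap) are still present on the node or have been garbage-collected, as discussed in the comments below, docker ps -a can be grepped using the k8s_<container>.<hash>_<pod>_... naming convention visible above (a hedged check, not part of the original report):

# List all containers, including exited ones, that belonged to zoo-0's init
# containers; empty output means they have been removed from the node.
docker ps -a | grep 'k8s_install.*zoo-0'
docker ps -a | grep 'k8s_bootstrap.*zoo-0'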

Node State

System Info:
 Machine ID:            ID
 System UUID:           UUID
 Boot ID:           BOOT_ID
 Kernel Version:        4.6.3-coreos
 OS Image:          CoreOS 1068.8.0 (MoreOS)
 Operating System:      linux
 Architecture:          amd64
 Container Runtime Version: docker://1.10.3
 Kubelet Version:       v1.3.3
 Kube-Proxy Version:        v1.3.3

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 12
  • Comments: 46 (23 by maintainers)

Most upvoted comments

Just did a cluster upgrade to v1.7.2 and this is still happening. I can confirm we have a number of pods created by a DaemonSet whose init containers ran their commands successfully and whose main containers are running, yet kubectl reports them as Init:0/1.

Here’s what I’ve been using to fix up pods in our system, YMMV:

#!/bin/bash
if [[ $# -ne 2 ]]; then
  echo 'Usage: ./fix-pod.sh POD_NAME INIT_CONTAINER_NAME'
  exit 1
fi
POD_NAME=$1
INIT_CONTAINER_NAME=$2
# Take the name of the pod's running container (last column of docker ps) and
# rewrite it to the name the kubelet expects for the init container.
NEW_CONTAINER_NAME=$(docker ps | grep "$POD_NAME" | head -n 1 | tr -s ' ' | sed 's/ /\n/g' | tail -n 1 | sed "s/k8s_[^.]\+\./k8s_${INIT_CONTAINER_NAME}\./")
# Create a dummy container under that name; it exits immediately with status 0,
# which is enough for the kubelet to consider the init container completed.
docker run --entrypoint /bin/bash --name "$NEW_CONTAINER_NAME" ubuntu:16.04

The first arg is the pod name (e.g. web-server-551139053-x7cmb). The second arg is the name of the init container in the pod that is supposed to have started (e.g. web-config-fetch). Seems to have fixed up everything so far; hope that helps someone.
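
For example (the pod and init-container names are the poster's illustrative ones; the sweep over all Init: pods is an added sketch and assumes every affected pod uses the same init-container name):

# Single pod; run on the node hosting it, since the script needs local docker access:
./fix-pod.sh web-server-551139053-x7cmb web-config-fetch

# Sweep every pod currently reported in an Init:* status (assumes they all share
# the init container name "web-config-fetch"; each fix still has to run on the
# node that hosts that pod):
kubectl get pods --no-headers | awk '$3 ~ /^Init:/ {print $1}' | \
  while read pod; do ./fix-pod.sh "$pod" web-config-fetch; done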

Hi,

I’ve got the exact same issue with a Deployment and an initContainer that exits correctly.

NAME                                      READY     STATUS     RESTARTS   AGE
azerty-master-webserver-959416056-dgqqf   0/2       Init:0/1   0          7m

The impact could be cosmetic for some (pods still running and functioning properly) or functional for others 😃 (scripts / automations that trust the reported status).

Regards,

We are having the same problem in one of our clusters. Steps to reproduce (a minimal sketch follows below):

  1. Create a new Deployment with one init container.
  2. Wait for it to reach the Running state (the init container has completed successfully and the main container is up and running).
  3. Delete the init container manually using the docker rm command. (When we first encountered this problem, the init container had been deleted by the Kubernetes garbage collector.)
  4. The main pod will switch to the Init state and stay there indefinitely, while the Docker container keeps running.

We are running kubernetes 1.6.3.
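
A minimal sketch of those reproduction steps, assuming kubectl access to a 1.6 cluster and docker access on the node; the deployment name init-demo, the init container name wait, and the busybox images are illustrative, not from the original report:

#!/bin/bash
# 1. Create a Deployment whose pod has one init container.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: init-demo
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: init-demo
    spec:
      initContainers:
      - name: wait
        image: busybox
        command: ["sh", "-c", "sleep 5"]
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "sleep 3600"]
EOF

# 2. Wait for the pod to reach Running (init container completed, main container up).
kubectl get pods -l app=init-demo -w

# 3. On the node, remove the exited init container to mimic the garbage collector:
#      docker ps -a | grep k8s_wait
#      docker rm <container-id>
# 4. The pod now reports Init:0/1 indefinitely even though the main container keeps running.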

What’s the status on this? It seems like a major bug; I’m not sure how it has gone on for so long. When impacted by this bug, scaling, restarting, etc. of pods with higher ordinals does not work on a StatefulSet. That is, if you have 3 pods in your StatefulSet and they go into the Init:0/1 state, and you then lose mypod-2, Kubernetes will never replace it until you manually kill/restart mypod-0 and mypod-1 to clear the Init:0/1 state.
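
A hedged illustration of that manual remediation, using the comment's example pod names; deleting the stuck lower-ordinal pods clears their Init:0/1 state, after which the StatefulSet controller resumes creating mypod-2:

kubectl delete pod mypod-0
kubectl get pod mypod-0 -w    # wait until it is Running and Ready again
kubectl delete pod mypod-1
kubectl get pod mypod-1 -w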

While a fix is being worked on, the above workarounds have worked for us for 1.5 months. Look into anything that might be deleting or garbage-collecting the containers after the init containers have executed. Disable that if possible, and you’ll be unlikely to see this again.
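
If it is the kubelet's own container garbage collection that removes the exited init containers, its thresholds can be relaxed so exited containers survive longer. The flags below are standard kubelet options; the values are an illustrative sketch, not a recommendation from this thread:

# Keep exited containers around longer (example values; adjust to your disk budget):
#   --minimum-container-ttl-duration=24h
#   --maximum-dead-containers-per-container=2
#   --maximum-dead-containers=240
# On CoreOS, add them to the kubelet systemd unit via a drop-in, then restart:
sudo systemctl edit kubelet
sudo systemctl daemon-reload && sudo systemctl restart kubelet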

@jad007 what do you mean by “when the coreOS cluster kubernetes is running on updates itself”? Do you mean an in-place upgrade of the kubelet and docker binaries on the node? I noticed that all of the following failure cases are reported on CoreOS. That might be the cause here.

As an update to my previous comment, the pods start up in the expected state, so they will be in the “Running” phase with “ready=true”. However, when the pod is restarted (for example when the CoreOS cluster that Kubernetes is running on updates itself), the pods come back up in the bad state with the “Pending” phase, even though the container is running and ready.

We have also seen this problem with pods in 1.3.9.

The pod has an init container. It looks like the init container actually completes successfully, and later the main container starts and runs correctly (with the output of the init container). However, the pod state is stuck at Pending.

I will follow up with more information and try to isolate the problem in a way that I can share in this thread.