kubernetes: Pet sets stuck in Init state though the pods are running
Hit an issue where the pet set pods switch from the Running state to the Init state. At first I thought the pods weren’t running, but it turned out the containers are actually running; the container state is just not consistent with the pod state reported by the Kubernetes API.
Issue
Container State: Running
Pod State: Pending
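The mismatch can also be seen directly in the pod status via jsonpath; a quick check (a sketch against the zoo-0 pod shown below, using standard pod status fields, not part of the original report’s commands):
kubectl get po zoo-0 -o jsonpath='{.status.phase}'
# -> Pending
kubectl get po zoo-0 -o jsonpath='{.status.containerStatuses[0].state}' 2>/dev/null
# -> shows a running state (with a startedAt timestamp) despite the Pending phase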
Kubernetes Cluster Version
kubectl version
Client Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.3", GitCommit:"c6411395e09da356c608896d3d9725acab821418", GitTreeState:"clean", BuildDate:"2016-07-22T20:29:38Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"3", GitVersion:"v1.3.3", GitCommit:"c6411395e09da356c608896d3d9725acab821418", GitTreeState:"clean", BuildDate:"2016-07-22T20:22:25Z", GoVersion:"go1.6.2", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes Get Pod State
kubectl get po -l app=zk
NAME      READY     STATUS      RESTARTS   AGE
zoo-0     0/1       Init:0/2    0          13h
zoo-1     0/1       Init:0/2    0          13h
zoo-2     0/1       Init:0/2    0          13h
Kubernetes Describe Pod State
kubectl describe po zoo-0
Name:           zoo-0
Namespace:      default
Node:           ip-1-2-3-4.region.compute.internal/10.1.2.3
Start Time:     Tue, 02 Aug 2016 20:33:34 +0400
Labels:         app=zk
                name=zoo
Status:         Pending
IP:             10.24.1.2
Controllers:    PetSet/zoo
Init Containers:
  install:
    Container ID:
    Image:              gcr.io/google_containers/zookeeper-install:0.1
    Image ID:
    Port:
    Args:
      --version=3.5.0-alpha
      --install-into=/opt
      --work-dir=/work-dir
    State:              Waiting
      Reason:           PodInitializing
    Ready:              False
    Restart Count:      0
    Environment Variables:      <none>
  bootstrap:
    Container ID:
    Image:              java:openjdk-8-jre
    Image ID:
    Port:
    Command:
      /work-dir/peer-finder
    Args:
      -on-start="/work-dir/on-start.sh"
      -service=zk
    State:              Waiting
      Reason:           PodInitializing
    Ready:              False
    Restart Count:      0
    Environment Variables:
      POD_NAMESPACE:    default (v1:metadata.namespace)
Containers:
  zk:
    Container ID:       docker://4a2e193fa6c86559c9387a0ee473596be59cff2fb7dcd10ca9bb08c9918e6d13
    Image:              java:openjdk-8-jre
    Image ID:           docker://sha256:372859dd1c695759fe765be375346390ddd393f76fa84319630d1d64b85b9806
    Ports:              2888/TCP, 3888/TCP
    Command:
      /opt/zookeeper/bin/zkServer.sh
    Args:
      start-foreground
    State:              Running
      Started:          Tue, 02 Aug 2016 20:33:58 +0400
    Ready:              True
    Restart Count:      0
    Readiness:          exec [sh -c /opt/zookeeper/bin/zkCli.sh ls /] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   False
  Ready         True
  PodScheduled  True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-zoo-0
    ReadOnly:   false
  workdir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  opt:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
QoS Tier:       BestEffort
No events.
Container State
core@ip-1-2-3-4 ~ $ docker ps | grep zoo-0
4a2e193fa6c8 java:openjdk-8-jre "/opt/zookeeper/bin/z" 13 hours ago Up 13 hours k8s_zk.484fe5b9_zoo-0_default_db0ce747-58ce-11e6-8261-02548dfad2e5_4424c4ea
1b74ed7351fd gcr.io/google_containers/pause-amd64:3.0 "/pause" 13 hours ago Up 13 hours k8s_POD.85382e2a_zoo-0_default_db0ce747-58ce-11e6-8261-02548dfad2e5_18dc3c54
Node State
System Info:
Machine ID: ID
System UUID: UUID
Boot ID: BOOT_ID
Kernel Version: 4.6.3-coreos
OS Image: CoreOS 1068.8.0 (MoreOS)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.10.3
Kubelet Version: v1.3.3
Kube-Proxy Version: v1.3.3
About this issue
- State: closed
- Created 8 years ago
- Reactions: 12
- Comments: 46 (23 by maintainers)
Just did a cluster upgrade to v1.7.2 and this is still happening. I can confirm we have a number of pods with init containers, created by a DaemonSet, that ran their init container command successfully and are running their main container command, yet kubectl reports them as Init:0/1.
Here’s what I’ve been using to fix up pods in our system, YMMV (the general idea is sketched below). The first arg is the pod name (e.g. web-server-551139053-x7cmb); the second arg is the name of the init container in the pod that is supposed to be started (e.g. web-config-fetch). Seems to have fixed up everything so far, hope that helps someone.
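The script itself isn’t reproduced in the thread; below is a minimal sketch of one way such a fix-up can work, assuming the workaround is simply to delete the stuck pod so its controller recreates it with a clean status. The name fix-stuck-init.sh is hypothetical, and this is not necessarily what the commenter’s original script did:
#!/bin/bash
# fix-stuck-init.sh -- hypothetical sketch of a fix-up for pods stuck in Init despite running containers.
POD="$1"             # e.g. web-server-551139053-x7cmb
INIT_CONTAINER="$2"  # e.g. web-config-fetch (used only for logging in this sketch)

# Pod phase as reported by the API (Pending while the pod is "stuck in Init").
PHASE=$(kubectl get pod "$POD" -o jsonpath='{.status.phase}')

# startedAt of the first app container; non-empty means the main container is actually running.
STARTED=$(kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[0].state.running.startedAt}' 2>/dev/null)

if [ "$PHASE" = "Pending" ] && [ -n "$STARTED" ]; then
  echo "Pod $POD is running but still reported as initializing ($INIT_CONTAINER); deleting so its controller recreates it"
  kubectl delete pod "$POD"
fi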
Hi,
I’ve got the exact same issue with a Deployment and an initContainer that exits correctly.
The impact can be cosmetic for some (pods still running and functioning properly) or functional for others 😃 (scripts / automations that trust the reported status).
Regards,
We are having the same problem in one of our clusters. Steps to reproduce: after the init container has finished, delete its exited Docker container on the node with a docker rm command. (When we first encountered this problem, the init container had been deleted by the kubernetes garbage collector.) We are running kubernetes 1.6.3.
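Roughly, that reproduction step looks like this on the node (a sketch; the k8s_install prefix assumes the install init container from the describe output above and the kubelet’s k8s_<container>_<pod>_... container naming convention):
docker ps -a --filter status=exited | grep k8s_install    # locate the exited init container
docker rm <exited-init-container-id>                      # remove it; the pod then drops back to Init:0/2 in kubectl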
What’s the status on this? It seems like a major bug; I’m not sure how it has gone on for so long. When impacted by this bug, scaling, restarting, etc. of pods with higher ordinals does not work on a StatefulSet. That is, if you have 3 pods in your StatefulSet, they go into the Init:0/1 state, and you lose mypod-2, kubernetes will never replace it until you manually kill/restart mypod-0 and mypod-1 to remove the Init:0/1 state.
While a fix is being worked on, the above workarounds have worked for us for 1.5 months. Look into anything that might be deleting/garbage-collecting the containers after the init containers have executed. Disable that if possible, and you’ll be unlikely to see this again.
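For reference, the kubelet’s own container garbage collection is controlled by flags such as the following (the values here are only illustrative, not recommendations; any external cleanup jobs running docker rm would need to be handled separately):
kubelet <existing flags...> \
  --minimum-container-ttl-duration=24h \
  --maximum-dead-containers-per-container=2 \
  --maximum-dead-containers=-1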
@jad007 what do you mean by “when the coreOS cluster kubernetes is running on updates itself”? Do you mean an in-place upgrade of the kubelet and docker binaries on the node? I noticed that all of the following failure cases are reported on CoreOS. That might be the cause here.
As an update to my previous comment: the pods initially start up in the expected state, i.e. in the “Running” phase with “ready=true”. However, when the pod is restarted (for example when the CoreOS cluster Kubernetes is running on updates itself), the pods come back up in the bad state, with the “Pending” phase even though the container is running and ready.
We have also seen this problem with pods in 1.3.9.
The pod has an init container. It looks like the init container actually completes successfully, and later on the main container starts and runs correctly (with the output of the init container). However, the pod state is stuck at Pending.
I will follow up with more information and try to isolate the problem in a way that I can share in this thread.