kubernetes: Containers from init-containers section do not execute during node reboots
BUG REPORT
Kubernetes version: v1.4.3+coreos.0
Environment:
- Cloud provider or hardware configuration: Bare metal CoreOS installation. 3 management nodes + 2 worker nodes.
- OS (e.g. from /etc/os-release): CoreOS stable (1185.3.0)
- Kernel: 4.7.3-coreos-r2
What happened:
I added an init container to the kube-scheduler.yaml file in the /etc/kubernetes/manifests directory. It executed once after the YAML file was modified. When I rebooted the node running this pod, the pod was restarted but the init container did not execute again:
The kube-scheduler.yaml file:
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
  annotations:
    pod.beta.kubernetes.io/init-containers: '[
      {
        "image": "quay.io/coreos/hyperkube:v1.4.3_coreos.0",
        "name": "wait-for-master",
        "command": [ "/bin/bash", "-c", "while ! timeout 1 bash -c \"/kubectl --server=http://127.0.0.1:8080/ cluster-info\"; do sleep 1; done" ]
      }
    ]'
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /hyperkube
    - scheduler
    - --master=http://127.0.0.1:8080
    - --leader-elect=true
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
      initialDelaySeconds: 15
      timeoutSeconds: 1
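For reference, on newer Kubernetes releases (1.6 and later) the same init container can be declared directly under spec.initContainers instead of the beta annotation; a minimal sketch of the equivalent spec, using the same image and command as above:

spec:
  hostNetwork: true
  initContainers:                # replaces the pod.beta.kubernetes.io/init-containers annotation
  - name: wait-for-master
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    command:
    - /bin/bash
    - -c
    - while ! timeout 1 bash -c "/kubectl --server=http://127.0.0.1:8080/ cluster-info"; do sleep 1; done
  containers:
  - name: kube-scheduler
    image: quay.io/coreos/hyperkube:v1.4.3_coreos.0
    # remaining fields unchanged from the manifest above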
The pod description after the node reboot:
Name:           kube-scheduler-10.54.147.6
Namespace:      kube-system
Node:           10.54.147.6/10.54.147.6
Start Time:     Wed, 09 Nov 2016 07:51:35 +0100
Labels:         <none>
Status:         Running
IP:             10.54.147.6
Controllers:    <none>
Init Containers:
  wait-for-master:
    Container ID:       docker://ff5ef0e759eb332c50fcc117a5758c41080835addbd76e33eb60cdde10d9587b
    Image:              quay.io/coreos/hyperkube:v1.4.3_coreos.0
    Image ID:           docker://sha256:5d3bc50f81574bc41952e13862b04b98f310ab60601ff990e7267a4ba5227e8b
    Port:
    Command:
      /bin/bash
      -c
      while ! timeout 1 bash -c "/kubectl --server=http://127.0.0.1:8080/ cluster-info"; do sleep 1; done
    State:              Terminated
      Reason:           Completed
      Exit Code:        0
      Started:          Tue, 08 Nov 2016 14:17:27 +0100
      Finished:         Tue, 08 Nov 2016 14:17:35 +0100
    Ready:              True
    Restart Count:      0
    Volume Mounts:      <none>
    Environment Variables:      <none>
Containers:
  kube-scheduler:
    Container ID:       docker://03e40bb84d477e6a284c4bfdf677d15135fe78785b17ae62d09cd04063f81aec
    Image:              quay.io/coreos/hyperkube:v1.4.3_coreos.0
    Image ID:           docker://sha256:5d3bc50f81574bc41952e13862b04b98f310ab60601ff990e7267a4ba5227e8b
    Port:
    Command:
      /hyperkube
      scheduler
      --master=http://127.0.0.1:8080
      --leader-elect=true
    State:              Running
      Started:          Wed, 09 Nov 2016 07:51:36 +0100
    Last State:         Terminated
      Reason:           Error
      Exit Code:        2
      Started:          Tue, 08 Nov 2016 20:50:42 +0100
      Finished:         Wed, 09 Nov 2016 07:48:44 +0100
    Ready:              True
    Restart Count:      4
    Liveness:           http-get http://127.0.0.1:10251/healthz delay=15s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:      <none>
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
No volumes.
QoS Class:      BestEffort
Tolerations:    <none>
How to reproduce it
Create a pod with an init-containers section and pin it to one node. Reboot the node several times and check whether the init container was executed after each reboot.
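For example, a minimal static-pod manifest along these lines (the pod name, image, and commands are illustrative, not taken from the report) can be dropped into /etc/kubernetes/manifests on the node that will be rebooted; after each reboot, kubectl describe should show fresh Started/Finished times for the init container if it re-ran:

apiVersion: v1
kind: Pod
metadata:
  name: init-restart-check
  annotations:
    pod.beta.kubernetes.io/init-containers: '[
      {
        "name": "record-init-run",
        "image": "busybox",
        "command": [ "sh", "-c", "echo init ran at $(date)" ]
      }
    ]'
spec:
  containers:
  - name: main
    image: busybox
    command: [ "sh", "-c", "sleep 3600" ]

In the describe output above, the wait-for-master init container still shows Started/Finished timestamps from before the reboot (Tue, 08 Nov), while the main container was restarted on Wed, 09 Nov, which is the reported problem.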
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 21 (16 by maintainers)
Commits related to this issue
- Merge pull request #47599 from yujuhong/restart-init Automatic merge from submit-queue (batch tested with PRs 46317, 48922, 50651, 50230, 47599) Rerun init containers when the pod needs to be restar... — committed to kubernetes/kubernetes by deleted user 7 years ago
- Merge pull request #53157 from MrHohn/revert-kubelet-touch-lock Automatic merge from submit-queue (batch tested with PRs 53157, 52628). If you want to cherry-pick this change to another branch, pleas... — committed to kubernetes/kubernetes by deleted user 7 years ago
Any node restart should re-run init containers, so we should fix that if it no longer happens. The same goes for infra container recreation.
I have the same issue: when a node reboots because of a CoreOS update, my Elasticsearch pods, which set vm.max_map_count=262144 via an init container, are down because the value wasn't set after the reboot. When I kick the pods, the containers come up.
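The pattern being described is presumably something like the following sketch (not the commenter's exact manifest): a privileged init container raises the sysctl before Elasticsearch starts. Because sysctl -w is not persistent, the setting is lost on every node reboot, so if the init container does not re-run after the reboot, the main container comes up on a node with the default vm.max_map_count:

spec:
  initContainers:
  - name: sysctl
    image: busybox
    securityContext:
      privileged: true                       # required to change a node-level sysctl
    command: [ "sysctl", "-w", "vm.max_map_count=262144" ]
  containers:
  - name: elasticsearch
    image: elasticsearch                     # illustrative image name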
What are the chances of this being classed as a bug and backported to supported Kubernetes versions? This is quite a problematic behaviour that will require me to either:
a) build my health check to look for the existence of a particular file or some other state, and trigger a pod recreation when it's not present (thereby preventing the pod from ever merely restarting, and forcing it to only ever be recreated), or
b) promote my init container to a sidecar that sleeps, and make my main process wait for these sidecars to finish their processing, potentially handshaking through a tmpfs/in-memory emptyDir (a rough sketch of this follows below)
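Option (b) might look roughly like this sketch (do-init-work, my-app, and the volume name are placeholders): the former init container runs as a sidecar, signals completion by touching a file on an in-memory emptyDir, then sleeps; the main container waits for that file before starting. Because the tmpfs volume is recreated empty after a node reboot and ordinary containers are restarted after a reboot, the initialization is repeated each time:

spec:
  volumes:
  - name: handshake
    emptyDir:
      medium: Memory                 # tmpfs; starts empty again after a node reboot
  containers:
  - name: init-as-sidecar
    image: busybox
    command: [ "sh", "-c", "do-init-work && touch /handshake/ready && while true; do sleep 3600; done" ]
    volumeMounts:
    - name: handshake
      mountPath: /handshake
  - name: main
    image: my-app                    # placeholder for the real application image
    command: [ "sh", "-c", "until [ -f /handshake/ready ]; do sleep 1; done; exec my-app-binary" ]
    volumeMounts:
    - name: handshake
      mountPath: /handshake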
Oops… I fixed that. Thanks.
I am not sure about backporting such a huge change and risking the possibility of destabilizing the previous releases. For 1.7, this may still be viable, though not very desirable. For 1.6, there were two different implementations (cri and pre-cri) and quite a lot of change has happened since then.
Init containers have been around for many releases and I don't think this feature has ever worked as intended. Even though this is a bug, it's not technically a regression. I would definitely not recommend patching 1.6, and I'm leaning towards not patching 1.7 either.
@smarterclayton @dchen1107 thoughts?