kubernetes: Re-run initContainers in a Deployment when containers exit on error
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened: Container in a Deployment exits on error, container is restarted without first re-running the initContainer.
What you expected to happen: Container in a Deployment exits on error, initContainer is re-run before restarting the container.
How to reproduce it (as minimally and precisely as possible):
Sample spec:
kind: "Deployment"
apiVersion: "extensions/v1beta1"
metadata:
name: "test"
labels:
name: "test"
spec:
replicas: 1
selector:
matchLabels:
name: "test"
template:
metadata:
name: "test"
labels:
name: "test"
spec:
initContainers:
- name: sleep
image: debian:stretch
imagePullPolicy: IfNotPresent
command:
- sleep
- 1s
containers:
- name: test
image: debian:stretch
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- exit 1
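To observe the behaviour (assuming the spec above is saved as test.yaml; the name: test label comes from the manifest):

# Apply the Deployment and watch the pod: the RESTARTS count keeps
# increasing, but the "sleep" initContainer only runs once per pod.
kubectl apply -f test.yaml
kubectl get pods -l name=test -w
kubectl describe pod -l name=test   # Init Containers section shows the init container completed only once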
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"db809c0eb7d33fac8f54d8735211f2f3a8fc4214", GitTreeState:"clean", BuildDate:"2017-09-11T19:46:47Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"db809c0eb7d33fac8f54d8735211f2f3a8fc4214", GitTreeState:"clean", BuildDate:"2017-09-11T19:46:47Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
- Kernel (e.g. uname -a): Linux aleinung 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64 GNU/Linux
Implementation Context:
I have an initContainer that waits for a service running in Kubernetes to detect its existence via pod annotations and send it an HTTP request, at which point the initContainer writes the received value to disk. The main container reads this value on startup, "unwraps" it via another service, and keeps the unwrapped value in memory.
The value written to disk by the initContainer is a one-time-read value: once it has been used, it expires. The problem is that if the main container ever restarts due to a fatal error, it loses the unwrapped value held in memory and, on startup, tries to unwrap the already-expired value again, leading to an infinite crash loop until I manually delete the pod. Only then is a new pod created, the initContainer runs again, and all is well.
I would like a feature that restarts the entire pod when a container fails, so that this workflow can work correctly.
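A minimal, hypothetical sketch of this pattern, assuming an emptyDir handoff volume; the image names, paths, and commands below are placeholders and not from the actual setup:

# Hypothetical sketch of the pattern described above; image names,
# paths, and commands are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrapped-secret-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wrapped-secret-demo
  template:
    metadata:
      labels:
        app: wrapped-secret-demo
    spec:
      volumes:
        - name: secret-handoff
          emptyDir:
            medium: Memory        # keep the one-time token off disk
      initContainers:
        - name: fetch-wrapped-token
          image: example/token-fetcher:latest   # placeholder image
          # Waits until the external service delivers the one-time
          # wrapped token, then writes it to the shared volume.
          command: ["/bin/sh", "-c", "fetch-wrapped-token > /handoff/wrapped-token"]
          volumeMounts:
            - name: secret-handoff
              mountPath: /handoff
      containers:
        - name: app
          image: example/app:latest             # placeholder image
          # Reads and unwraps the token on startup; if this container
          # crashes, the initContainer is NOT re-run, so the now-expired
          # token can never be unwrapped again.
          command: ["/bin/sh", "-c", "unwrap-and-run /handoff/wrapped-token"]
          volumeMounts:
            - name: secret-handoff
              mountPath: /handoff

Because the wrapped token can only be unwrapped once, this pattern only works if the initContainer runs again whenever the app container restarts.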
About this issue
- State: open
- Created 7 years ago
- Reactions: 20
- Comments: 43 (14 by maintainers)
@aisengard did you ever find a solution to this? We hit exactly the same issue today. We have an initContainer that reads some secret data from Vault and writes it to an emptyDir volume shared between the initContainer and the first container in the pod. The first container reads this file when executing its command and then deletes it so no one can enter the pod and read the file; but if the container restarts, the initContainer isn't run, so the file doesn't exist.
@majgis AFAIK, the only possible work-around is to bake in some coordination between the containers of the pod.
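One hedged sketch of such coordination, assuming the emptyDir handoff volume from the sketch above and a service account with RBAC permission to delete its own pod (image name and entrypoint are placeholders): the main container's entrypoint deletes its own pod when the handoff file is missing, so the ReplicaSet creates a fresh pod and the initContainers run again. This snippet drops into the containers: list of the pod template sketched earlier.

# Hypothetical workaround sketch, not an official recommendation.
# Assumes curl is available in the image and the service account may
# delete pods in its own namespace.
containers:
  - name: app
    image: example/app:latest            # placeholder image
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
    command:
      - /bin/sh
      - -c
      - |
        # If the one-time file from the initContainer is gone, delete this
        # pod so the ReplicaSet replaces it and the initContainers re-run.
        if [ ! -f /handoff/wrapped-token ]; then
          TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
          curl -sS -X DELETE \
            -H "Authorization: Bearer ${TOKEN}" \
            --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
            "https://kubernetes.default.svc/api/v1/namespaces/${POD_NAMESPACE}/pods/${POD_NAME}"
          sleep 60    # wait to be terminated
          exit 1
        fi
        exec unwrap-and-run /handoff/wrapped-token   # placeholder entrypoint
    volumeMounts:
      - name: secret-handoff
        mountPath: /handoff

The trade-off is that the workload needs delete permission on its own pods, which many clusters will not want to grant.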
I am working on a PR implementing the AlwaysPod restartPolicy, which will address this problem by restarting the pod on container failure. I plan to raise the PR next week.
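Purely as an illustration of what that proposal might look like: restartPolicy: AlwaysPod is only the value named in the comment above, it does not exist in any released Kubernetes API, and this manifest will not validate against a real cluster.

# Illustrative only; AlwaysPod is a proposed, non-existent field value.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: test
  template:
    metadata:
      labels:
        name: test
    spec:
      restartPolicy: AlwaysPod   # hypothetical: restart the whole pod,
                                 # including initContainers, when any container fails
      initContainers:
        - name: sleep
          image: debian:stretch
          command: ["sleep", "1s"]
      containers:
        - name: test
          image: debian:stretch
          command: ["/bin/sh", "-c", "exit 1"]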
Are there any plans to fix this? It's 2023 and init containers still don't re-run on a rollout restart or when a container restarts on error.
Good catch. When an init container is used to acquire a certificate or token, the main containers may delete it after reading it into their cache. A single panic then leaves the containers restarting repeatedly.
/reopen
This is such unexpected behaviour for a long-time k8s user. The proposed solution of adding another value to RestartPolicy that forces all containers to restart seems reasonable - any thoughts from sig-node?