kubernetes: Pod.Status.PodIP not updated during postStart lifecycle hook

What happened:

When using a postStart lifecycle hook, the Pod.Status.PodIP is not updated while the hook is running, even though the pod has been assigned an IP via the CNI plugin and networking is set up.

What you expected to happen:

The Pod.Status.PodIP should be set as soon as it’s available.

How to reproduce it (as minimally and precisely as possible):

The following manifest reproduces the issue. Apply it, then watch the pod status.

apiVersion: v1
kind: Pod
metadata:
  name: test-calico-post-start
spec:
  containers:
  - name: test-calico
    image: byrnedo/alpine-curl:latest
    command:
    - sh
    - -c
    - "sleep 400000"
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "sleep 120 && result=$(curl http://www.google.com 1> /var/log/output.log 2> /var/log/output-error.log); echo $result"]

The postStart lifecycle hook here sleeps for 120 seconds. During those first 120 seconds, the pod status remains ContainerCreating and the podIP is not set; it is set only after the hook completes.

Anything else we need to know?:

Why is this important? It prevents Calico network policy from operating correctly during the postStart hook, since we cannot learn the pod’s IP address. (When Calico is also the CNI plugin we do learn it, but Calico is designed to run in “policy-only mode” on top of other CNI plugins, such as the AWS VPC CNI plugin.) See https://github.com/projectcalico/libcalico-go/issues/1125 and https://github.com/projectcalico/felix/issues/2008 for users hitting this problem.

Environment:

  • Kubernetes version (use kubectl version): v1.16.3
  • Cloud provider or hardware configuration: AWS
  • OS (e.g: cat /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a): 4.4.0-1098-aws
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug): amazon-vpc-cni-k8s v1.5 with Calico v3.8.1
  • Others:

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 8
  • Comments: 65 (36 by maintainers)

Most upvoted comments

To prevent init container start from hanging, we can put the call to the postStart hook (for the init container) in a goroutine whose execution time is bounded (e.g. 2 minutes). If the execution time exceeds the limit, return a timeout error for the init container start.

Does SyncPod() still block waiting for the hook to complete or timeout to fire? If so, then this doesn’t fully solve our problem, because we will still be blocked from getting the podIP while that hook is executing.

If we need to keep the init container hooks synchronous, then I would prefer the solution where we trigger a pod status update, including the pod IP, immediately after sandbox creation and before entering any blocking code that starts containers.
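That ordering can be sketched as below. This is an illustrative Go sketch only: syncPod, createSandbox, startContainers, and the statusUpdates channel stand in for kubelet’s sync loop, CNI result, container start path, and status manager, and are not kubelet’s real API.

```go
package main

import "fmt"

// syncPod sketches the proposed ordering: publish the pod IP as soon as
// the sandbox exists, then run the (possibly long-blocking) container
// starts and postStart hooks.
func syncPod(createSandbox func() (podIP string), startContainers func(), statusUpdates chan<- string) {
	podIP := createSandbox()

	// Report the IP immediately, before any blocking hook execution, so
	// Pod.Status.PodIP is populated while the postStart hook runs.
	statusUpdates <- podIP

	// Only now enter the blocking path (container start + postStart hook).
	startContainers()
}

func main() {
	updates := make(chan string, 1)
	syncPod(
		func() string { return "10.0.0.42" }, // fake CNI result
		func() {},                            // fake blocking container start
		updates,
	)
	fmt.Println(<-updates) // 10.0.0.42
}
```

The point of the sketch is purely the ordering: the status update is sent before startContainers is entered, so a slow hook no longer delays the IP becoming visible.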