istio: Pods with injected sidecars keep hitting Init:CrashLoopBackOff on the second day!

Bug description

I installed Istio 1.4.2 and enabled sidecar injection for the default namespace:

kubectl label namespace default istio-injection=enabled

Everything worked perfectly on the first day, but at 10:30 am on the second day I found that all services in the default namespace had gone into Init:CrashLoopBackOff.

All of the application containers are in Ready status, but the init containers (istio-init) keep restarting.
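
A quick way to list the affected pods (a rough sketch of the check, nothing Istio-specific; the grep simply matches the STATUS column of kubectl get pods, and the second command reads the restart count of the first init container):

kubectl get pods -n default --no-headers | grep 'Init:CrashLoopBackOff'
kubectl get pod http-server-fd797d47-rgk49 -n default -o jsonpath='{.status.initContainerStatuses[0].restartCount}'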

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[x] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior

I expect init containers to run only once in the pod lifecycle; the istio-init container should not restart on its own.

Steps to reproduce the bug

Output of kubectl describe pod for one of the affected pods:

Name:         http-server-fd797d47-rgk49
Namespace:    default
Priority:     0
Node:         ni-k8s-node1/10.60.150.238
Start Time:   Thu, 19 Dec 2019 11:48:40 +0800
Labels:       app=http-server
              pod-template-hash=fd797d47
              security.istio.io/tlsMode=istio
Annotations:  sidecar.istio.io/status:
                {"version":"8d80e9685defcc00b0d8c9274b60071ba8810537e0ed310ea96c1de0785272c7","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status:       Running
IP:           10.244.6.64
IPs:
  IP:           10.244.6.64
Controlled By:  ReplicaSet/http-server-fd797d47
Init Containers:
  istio-init:
    Container ID:  docker://78928bc5ef52ee9e3884e7208ebe9113c874014de76adb9169810eaea2acc2e8
    Image:         docker.io/istio/proxyv2:1.4.2
    Image ID:      docker-pullable://istio/proxyv2@sha256:c98b4d277b724a2ad0c6bf22008bd02ddeb2525f19e35cdefe8a3181313716e7
    Port:          <none>
    Host Port:     <none>
    Command:
      istio-iptables
      -p
      15001
      -z
      15006
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      
      -b
      *
      -d
      15020
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 20 Dec 2019 16:24:46 +0800
      Finished:     Fri, 20 Dec 2019 16:24:46 +0800
    Ready:          False
    Restart Count:  74
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:        10m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p9skx (ro)
Containers:
  http-server:
    Container ID:   docker://8260f1208351e2d55f051ada95159ea5589b32e5f7ed5eb9dbc077b525bcf0ee
    Image:          docker.navicore.cn/test/http_server:latest
    Image ID:       docker-pullable://docker.navicore.cn/test/http_server@sha256:fa4824b9124f62a1545c4747091b045338ed41cdb28d72abc159c3337a37d091
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 19 Dec 2019 11:48:43 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:        50m
      memory:     100Mi
    Environment:  <none>
    Mounts:
      /tmp from zhaogq (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p9skx (ro)
  istio-proxy:
    Container ID:  docker://56060c5986f9c4e81930c85c73880f420586900f028b8ab54049f1e6f50ef2e2
    Image:         docker.io/istio/proxyv2:1.4.2
    Image ID:      docker-pullable://istio/proxyv2@sha256:c98b4d277b724a2ad0c6bf22008bd02ddeb2525f19e35cdefe8a3181313716e7
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --configPath
      /etc/istio/proxy
      --binaryPath
      /usr/local/bin/envoy
      --serviceCluster
      http-server.$(POD_NAMESPACE)
      --drainDuration
      45s
      --parentShutdownDuration
      1m0s
      --discoveryAddress
      istio-pilot.istio-system:15010
      --zipkinAddress
      zipkin.istio-system:9411
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --connectTimeout
      10s
      --proxyAdminPort
      15000
      --concurrency
      2
      --controlPlaneAuthPolicy
      NONE
      --dnsRefreshRate
      300s
      --statusPort
      15020
      --applicationPorts
      
      --trust-domain=cluster.local
    State:          Running
      Started:      Thu, 19 Dec 2019 11:48:44 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      10m
      memory:   40Mi
    Readiness:  http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
    Environment:
      POD_NAME:                          http-server-fd797d47-rgk49 (v1:metadata.name)
      POD_NAMESPACE:                     default (v1:metadata.namespace)
      INSTANCE_IP:                        (v1:status.podIP)
      SERVICE_ACCOUNT:                    (v1:spec.serviceAccountName)
      HOST_IP:                            (v1:status.hostIP)
      ISTIO_META_POD_PORTS:              [
                                         ]
      ISTIO_META_CLUSTER_ID:             Kubernetes
      ISTIO_META_POD_NAME:               http-server-fd797d47-rgk49 (v1:metadata.name)
      ISTIO_META_CONFIG_NAMESPACE:       default (v1:metadata.namespace)
      SDS_ENABLED:                       false
      ISTIO_META_INTERCEPTION_MODE:      REDIRECT
      ISTIO_META_INCLUDE_INBOUND_PORTS:  
      ISTIO_METAJSON_LABELS:             {"app":"http-server","pod-template-hash":"fd797d47"}
                                         
      ISTIO_META_WORKLOAD_NAME:          http-server
      ISTIO_META_OWNER:                  kubernetes://api/apps/v1/namespaces/default/deployments/http-server
      ISTIO_META_MESH_ID:                cluster.local
    Mounts:
      /etc/certs/ from istio-certs (ro)
      /etc/istio/proxy from istio-envoy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p9skx (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  zhaogq:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  zhaogq-pvc
    ReadOnly:   false
  default-token-p9skx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p9skx
    Optional:    false
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  istio.default
    Optional:    true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm)

./bin/istioctl version --remote

client version: 1.4.2
control plane version: 1.4.2
data plane version: 1.4.2 (26 proxies)
kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

How was Istio installed?

./bin/istioctl manifest generate --set profile=demo --set values.global.proxy.privileged=true | kubectl apply -f -
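
Basic checks to confirm the install and the injection label (just standard kubectl commands, shown here as a sketch):

kubectl get pods -n istio-system
kubectl get namespace default -L istio-injection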

Environment where bug was observed (cloud vendor, OS, etc)

cat /proc/version
Linux version 4.4.0-87-generic (buildd@lcy01-31) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017

kubelet logs on the node

Dec 19 10:30:16 ni-k8s-node2 kubelet[116707]: W1219 10:30:16.022592  116707 docker_container.go:224] Deleted previously existing symlink file: "/var/log/pods/default_te-tile-server-5664d48dcf-ztbfz_022beff9-7d1d-4f9b-bb94-5729b89a4a8d/istio-init/0.log"
Dec 19 10:30:16 ni-k8s-node2 kubelet[116707]: E1219 10:30:16.255400  116707 file_linux.go:60] Unable to read config path "/etc/kubernetes/manifests": path does not exist, ignoring
Dec 19 10:30:17 ni-k8s-node2 kubelet[116707]: E1219 10:30:17.255634  116707 file_linux.go:60] Unable to read config path "/etc/kubernetes/manifests": path does not exist, ignoring
Dec 19 10:30:17 ni-k8s-node2 kubelet[116707]: E1219 10:30:17.795655  116707 remote_runtime.go:261] RemoveContainer "5e9f5be64f69e9ec7ac079443dbfe22272b75726917ae59e64f3a988c7421f16" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "5e9f5be64f69e9ec7ac079443dbfe22272b75726917ae59e64f3a988c7421f16": Error response from daemon: removal of container 5e9f5be64f69e9ec7ac079443dbfe22272b75726917ae59e64f3a988c7421f16 is already in progress 
Dec 19 10:30:17 ni-k8s-node2 kubelet[116707]: E1219 10:30:17.801681  116707 pod_workers.go:191] Error syncing pod 022beff9-7d1d-4f9b-bb94-5729b89a4a8d ("te-tile-server-5664d48dcf-ztbfz_default(022beff9-7d1d-4f9b-bb94-5729b89a4a8d)"), skipping: failed to "StartContainer" for "istio-init" with CrashLoopBackOff: "back-off 10s restarting failed container=istio-init pod=te-tile-server-5664d48dcf-ztbfz_default(022beff9-7d1d-4f9b-bb94-5729b89a4a8d)"
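
To correlate what removed the exited istio-init container at that moment, checks along these lines on the node can help (a sketch only; it assumes docker and the kubelet are managed by systemd, so the unit names may differ on your distro):

journalctl -u docker --since '2019-12-19 10:25' --until '2019-12-19 10:35'
journalctl -u kubelet --since '2019-12-19 10:25' --until '2019-12-19 10:35' | grep istio-init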

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 18 (8 by maintainers)

Most upvoted comments

If you are using the docker system prune command, you need a filter that excludes the istio-init container:

docker system prune -af --volumes --filter "label!=io.kubernetes.container.name=istio-init"
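
If the cleanup runs from cron on each node, the adjusted entry would look roughly like this (a sketch only; the schedule and file path are made-up examples, not from this thread):

# /etc/cron.d/docker-cleanup (hypothetical example)
30 10 * * * root docker system prune -af --volumes --filter "label!=io.kubernetes.container.name=istio-init"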

@zhaoguangqiang Not sure if it helps your case, but I had something similar with Istio 1.4.2. Basically, we had a cron job running outside of the cluster that would clean up docker images (docker prune … ); as soon as we disabled it, istio-init stopped restarting. Do you have some sort of cleanup jobs running against your nodes?
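
A quick way to look for such cleanup jobs on a worker node (a rough sketch; adjust for however your nodes schedule maintenance):

crontab -l 2>/dev/null | grep -i 'docker.*prune'
grep -ri 'docker.*prune' /etc/cron.d /etc/cron.daily 2>/dev/null
systemctl list-timers --all 2>/dev/null | grep -i -e prune -e cleanup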

There is also this one: https://github.com/kubernetes/kubernetes/issues/67261, but it was closed without a resolution. Basically, the idea is to trust the kubelet to do the right thing.