calico: Calico readiness and liveness probes fail

It seems that Calico is trying to start the worker node process on the same IPv4 address as the one on the master node, so it fails and exits. How can I force the worker node process to use a different IPv4 address?

Kubernetes version: 1.10.4

Describe pod

kubectl describe pods calico-node-9kftd -n kube-system

Namespace:      kube-system
Node:           worker1.k8s/192.168.99.7
Start Time:     Sun, 17 Jun 2018 18:49:10 +0530
Labels:         controller-revision-hash=1808776410
                k8s-app=calico-node
                pod-template-generation=1
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             192.168.99.7
Controlled By:  DaemonSet/calico-node
Containers:
  calico-node:
    Container ID:   docker://2d88c0d7f10601aef1229e8c79023ce06743fbe5507b39d8b964e7d909ec78c9
    Image:          quay.io/calico/node:v3.1.3
    Image ID:       docker-pullable://quay.io/calico/node@sha256:a35541153f7695b38afada46843c64a2c546548cd8c171f402621736c6cf3f0b
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 18 Jun 2018 10:00:18 +0530
      Finished:     Mon, 18 Jun 2018 10:00:18 +0530
    Ready:          False
    Restart Count:  23
    Requests:
      cpu:      250m
    Liveness:   http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DATASTORE_TYPE:                     kubernetes
      FELIX_LOGSEVERITYSCREEN:            info
      CLUSTER_TYPE:                       k8s,bgp
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_IPINIPMTU:                    1440
      WAIT_FOR_DATASTORE:                 true
      CALICO_IPV4POOL_CIDR:               192.168.0.0/16
      CALICO_IPV4POOL_IPIP:               Always
      FELIX_IPINIPENABLED:                true
      FELIX_TYPHAK8SSERVICENAME:          <set to the key 'typha_service_name' of config map 'calico-config'>  Optional: false
      NODENAME:                            (v1:spec.nodeName)
      IP:                                 autodetect
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /var/lib/calico from var-lib-calico (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-zggt6 (ro)
  install-cni:
    Container ID:  docker://76a0c72b569b99bcb4ad0c82a7b899c4034f258c907befee4dee5154fd6713f8
    Image:         quay.io/calico/cni:v3.1.3
    Image ID:      docker-pullable://quay.io/calico/cni@sha256:ed172c28bc193bb09bce6be6ed7dc6bfc85118d55e61d263cee8bbb0fd464a9d
    Port:          <none>
    Host Port:     <none>
    Command:
      /install-cni.sh
    State:          Running
      Started:      Mon, 18 Jun 2018 09:48:52 +0530
    Ready:          True
    Restart Count:  2
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-zggt6 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  calico-node-token-zggt6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-node-token-zggt6
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 :NoExecute
                 :NoSchedule
                 :NoExecute
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason                 Age                  From                  Message
  ----     ------                 ----                 ----                  -------
  Normal   SuccessfulMountVolume  15h                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "cni-net-dir"
  Normal   SuccessfulMountVolume  15h                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "var-run-calico"
  Normal   SuccessfulMountVolume  15h                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "var-lib-calico"
  Normal   SuccessfulMountVolume  15h                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "lib-modules"
  Normal   SuccessfulMountVolume  15h                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "cni-bin-dir"
  Normal   SuccessfulMountVolume  15h                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "calico-node-token-zggt6"
  Warning  Failed                 15h                  kubelet, worker1.k8s  Failed to pull image "quay.io/calico/cni:v3.1.3": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/calico/cni/manifests/v3.1.3: Get https://quay.io/v2/auth?scope=repository%3Acalico%2Fcni%3Apull&service=quay.io: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed                 15h                  kubelet, worker1.k8s  Error: ErrImagePull
  Warning  Failed                 15h                  kubelet, worker1.k8s  Failed to pull image "quay.io/calico/node:v3.1.3": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/calico/node/manifests/v3.1.3: dial tcp 50.17.235.205:443: i/o timeout
  Normal   Pulling                15h (x2 over 15h)    kubelet, worker1.k8s  pulling image "quay.io/calico/cni:v3.1.3"
  Normal   Pulled                 15h                  kubelet, worker1.k8s  Successfully pulled image "quay.io/calico/cni:v3.1.3"
  Normal   Created                15h                  kubelet, worker1.k8s  Created container
  Normal   Started                15h                  kubelet, worker1.k8s  Started container
  Normal   Pulling                15h (x3 over 15h)    kubelet, worker1.k8s  pulling image "quay.io/calico/node:v3.1.3"
  Warning  Failed                 15h (x3 over 15h)    kubelet, worker1.k8s  Error: ErrImagePull
  Warning  Failed                 15h (x2 over 15h)    kubelet, worker1.k8s  Failed to pull image "quay.io/calico/node:v3.1.3": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed                 15h (x2 over 15h)    kubelet, worker1.k8s  Error: ImagePullBackOff
  Normal   BackOff                15h (x16 over 15h)   kubelet, worker1.k8s  Back-off pulling image "quay.io/calico/node:v3.1.3"
  Normal   Pulled                 14h (x12 over 14h)   kubelet, worker1.k8s  Container image "quay.io/calico/node:v3.1.3" already present on machine
  Warning  BackOff                14h (x121 over 14h)  kubelet, worker1.k8s  Back-off restarting failed container
  Normal   SuccessfulMountVolume  3h                   kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "var-lib-calico"
  Normal   SuccessfulMountVolume  3h                   kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "var-run-calico"
  Normal   SuccessfulMountVolume  3h                   kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "cni-bin-dir"
  Normal   SuccessfulMountVolume  3h                   kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "cni-net-dir"
  Normal   SuccessfulMountVolume  3h                   kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "lib-modules"
  Normal   SuccessfulMountVolume  3h                   kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "calico-node-token-zggt6"
  Normal   SandboxChanged         3h                   kubelet, worker1.k8s  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                 3h                   kubelet, worker1.k8s  Container image "quay.io/calico/cni:v3.1.3" already present on machine
  Normal   Created                3h                   kubelet, worker1.k8s  Created container
  Normal   Started                3h                   kubelet, worker1.k8s  Started container
  Warning  Unhealthy              3h (x2 over 3h)      kubelet, worker1.k8s  Readiness probe failed: Get http://192.168.99.7:9099/readiness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
  Warning  Unhealthy              3h (x2 over 3h)      kubelet, worker1.k8s  Liveness probe failed: Get http://192.168.99.7:9099/liveness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
  Normal   Started                3h (x2 over 3h)      kubelet, worker1.k8s  Started container
  Normal   Pulled                 3h (x2 over 3h)      kubelet, worker1.k8s  Container image "quay.io/calico/node:v3.1.3" already present on machine
  Normal   Created                3h (x2 over 3h)      kubelet, worker1.k8s  Created container
  Warning  BackOff                3h (x47 over 3h)     kubelet, worker1.k8s  Back-off restarting failed container
  Normal   SuccessfulMountVolume  12m                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "cni-net-dir"
  Normal   SuccessfulMountVolume  12m                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "var-lib-calico"
  Normal   SuccessfulMountVolume  12m                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "cni-bin-dir"
  Normal   SuccessfulMountVolume  12m                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "var-run-calico"
  Normal   SuccessfulMountVolume  12m                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "lib-modules"
  Normal   SuccessfulMountVolume  12m                  kubelet, worker1.k8s  MountVolume.SetUp succeeded for volume "calico-node-token-zggt6"
  Normal   SandboxChanged         12m                  kubelet, worker1.k8s  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                 12m                  kubelet, worker1.k8s  Container image "quay.io/calico/cni:v3.1.3" already present on machine
  Normal   Created                12m                  kubelet, worker1.k8s  Created container
  Normal   Started                12m                  kubelet, worker1.k8s  Started container
  Warning  Unhealthy              12m (x2 over 12m)    kubelet, worker1.k8s  Liveness probe failed: Get http://192.168.99.7:9099/liveness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
  Warning  Unhealthy              11m (x3 over 12m)    kubelet, worker1.k8s  Readiness probe failed: Get http://192.168.99.7:9099/readiness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
  Normal   Started                11m (x2 over 12m)    kubelet, worker1.k8s  Started container
  Normal   Created                11m (x2 over 12m)    kubelet, worker1.k8s  Created container
  Normal   Pulled                 11m (x2 over 12m)    kubelet, worker1.k8s  Container image "quay.io/calico/node:v3.1.3" already present on machine
  Warning  BackOff                2m (x47 over 11m)    kubelet, worker1.k8s  Back-off restarting failed container

Container Log:

kubectl logs calico-node-9kftd -n kube-system -c calico-node

2018-06-18 04:45:36.720 [INFO][9] startup.go 267: Using NODENAME environment for node name
2018-06-18 04:45:36.720 [INFO][9] startup.go 279: Determined node name: worker1.k8s
2018-06-18 04:45:36.724 [INFO][9] startup.go 302: Checking datastore connection
2018-06-18 04:45:36.754 [INFO][9] startup.go 326: Datastore connection verified
2018-06-18 04:45:36.755 [INFO][9] startup.go 99: Datastore is ready
2018-06-18 04:45:36.783 [INFO][9] startup.go 564: Using autodetected IPv4 address on interface enp0s8: 10.0.3.15/24
2018-06-18 04:45:36.783 [INFO][9] startup.go 432: Node IPv4 changed, will check for conflicts
2018-06-18 04:45:36.798 [WARNING][9] startup.go 861: Calico node 'master' is already using the IPv4 address 10.0.3.15.
2018-06-18 04:45:36.798 [INFO][9] startup.go 205: Clearing out-of-date IPv4 address from this node IP="10.0.3.15/24"
2018-06-18 04:45:36.826 [WARNING][9] startup.go 1058: Terminating
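
The log shows what goes wrong: autodetection picks 10.0.3.15 on enp0s8, but the node 'master' has already registered that same address, so calico-node clears the IP and terminates. The duplicate can be confirmed by comparing the IPv4 addresses on both nodes (plain Linux tooling, nothing Calico-specific):

# run on both master and worker1; the same 10.0.3.15 shows up on each
ip -4 addr show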

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 23 (5 by maintainers)

Most upvoted comments

I had the exact same issue. @tmjd, thanks for the hint. You need to set autodetection to use a method suitable for your network, e.g. by adding the following to the calico YAML:

            - name: IP_AUTODETECTION_METHOD
              value: "interface=eth.*"

worked for me.
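
If editing and re-applying the manifest is inconvenient, the same variable can be set directly on the running DaemonSet; a sketch assuming the default calico-node DaemonSet in kube-system (the controller then rolls the pods and autodetection runs again):

kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD='interface=eth.*'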

For people like me, that have just started to explore the world of k8s with a bunch of virtual boxes, and actually just want to see something like 2 nginx pods running on 2 different nodes, the network setup has turned out to be a real nightmare. Coming from docker swarm, everything was easy. Now, I see myself digging into iptables and yaml files, that are interconnected and need to be tweaked in a very special way. Don’t get me wrong - I am willing to learn whatever is necessary to manage my 3 nodes, but I am also frustrated to be sidetracked by the network, which just needs to know 2 things: where is the cluster, and which addresses can I use for my components.

@madmesi you can give multiple interface names, like:

            - name: IP_AUTODETECTION_METHOD
              value: "interface=enp8s0,ens192"

I had the same issue. Here is my environment: OS version: Ubuntu Server 20.04 LTS; Kubernetes version: 1.20.4; Calico manifest: https://docs.projectcalico.org/v3.11/manifests/calico.yaml

Solution:

cat /etc/sysctl.d/99-kubernetes-cri.conf 
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
# The following two settings solve the issue. When I commented them out, the issue reappeared.
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1

sudo sysctl --system

worked for me.
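
A plausible explanation for why this helps: Felix refuses to start when reverse-path filtering is in 'loose' mode (rp_filter=2), which some Ubuntu releases set by default, while 1 (strict) passes its check. After reloading, the values can be verified with plain sysctl:

sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter
# both should print "= 1"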

Hey everyone, thanks indeed for your help, because I was hopeless. I followed the same workaround, but in my case I have 3 different nodes, two of which have the interface name "eth0", while my worker node's interface name is "enp3s0f0". In the calico.yaml file, I added the line

            - name: IP_AUTODETECTION_METHOD
              value: "can-reach=8.8.8.8"

but still no result. I also tried the regex mentioned here, like the following:

            - name: IP_AUTODETECTION_METHOD
              value: "interface=e*"

Calico version: 3.9

kubectl version
Client Version: version.Info{
   Major:"1",
   Minor:"17",
   GitVersion:"v1.17.2",
   GitTreeState:"clean",
   GoVersion:"go1.13.5",
   Compiler:"gc",
   Platform:"linux/amd64"
}
Server Version: version.Info{
   Major:"1",
   Minor:"17",
   GitVersion:"v1.17.2",
   GitTreeState:"clean",
   GoVersion:"go1.13.5",
   Compiler:"gc",
   Platform:"linux/amd64"
}
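
One detail worth knowing about the pattern above: the interface= value is a regular expression (or a comma-separated list of them), not a shell glob, so "e*" means "zero or more occurrences of e" and can match almost any interface, including unintended ones. For the mixed-interface cluster described in this comment, an explicit list should be safer (interface names taken from the comment itself):

            - name: IP_AUTODETECTION_METHOD
              value: "interface=eth0,enp3s0f0"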

On my two machines, enp0s8 has 192.168.56.110 on one and 192.168.56.117 on the other; these are the addresses I would like the calico-nodes to use. I have read the link again and again but still don't know how.

  1. I first tried to modify calico.yaml by adding IP_AUTODETECTION_METHOD under IP, which I hope is correct:

             # Auto-detect the BGP IP address.
             - name: IP
               value: "autodetect"
             - name: IP_AUTODETECTION_METHOD
               value: "interface=enp0s8"

  2. I then ran the shell command 'export IP_AUTODETECTION_METHOD=interface=enp0s8'.

With either of the above, I then ran:

kubectl delete -f calico.yaml
kubectl apply -f calico.yaml

I am still seeing the calico-node-xxxx pod pick up an IP address that is not the enp0s8 one:

[root@centos7b2 ~]# kubectl describe pods -n kube-system calico-node-mkv8t
Name:               calico-node-mkv8t
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               centos7g2/10.0.2.15
Start Time:         Mon, 24 Sep 2018 13:43:15 -0700
Labels:             controller-revision-hash=1427857993
                    k8s-app=calico-node
                    pod-template-generation=1
Annotations:        scheduler.alpha.kubernetes.io/critical-pod=
Status:             Running
IP:                 10.0.2.15
Controlled By:      DaemonSet/calico-node
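
Note that the second approach (export in the shell) cannot work: environment variables exported on the host never reach the containers, so IP_AUTODETECTION_METHOD must live in the DaemonSet pod spec, as in the first approach. Once the manifest is re-applied, the node registrations can be checked; a sketch assuming calicoctl is installed and pointed at the cluster datastore:

calicoctl get nodes -o wide
# each node should now show the expected 192.168.56.x address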