calico: Calico readiness and liveness probes fail
It seems Calico is trying to start the worker node process on the same IPv4 address as the one on the master node, so it fails and errors out. How do I force the worker node process to use a different IPv4 address?
Kubernetes version: 1.10.4
Describe pod
kubectl describe pods calico-node-9kftd -n kube-system
Namespace: kube-system
Node: worker1.k8s/192.168.99.7
Start Time: Sun, 17 Jun 2018 18:49:10 +0530
Labels: controller-revision-hash=1808776410
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 192.168.99.7
Controlled By: DaemonSet/calico-node
Containers:
calico-node:
Container ID: docker://2d88c0d7f10601aef1229e8c79023ce06743fbe5507b39d8b964e7d909ec78c9
Image: quay.io/calico/node:v3.1.3
Image ID: docker-pullable://quay.io/calico/node@sha256:a35541153f7695b38afada46843c64a2c546548cd8c171f402621736c6cf3f0b
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 18 Jun 2018 10:00:18 +0530
Finished: Mon, 18 Jun 2018 10:00:18 +0530
Ready: False
Restart Count: 23
Requests:
cpu: 250m
Liveness: http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
FELIX_LOGSEVERITYSCREEN: info
CLUSTER_TYPE: k8s,bgp
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_IPINIPMTU: 1440
WAIT_FOR_DATASTORE: true
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_IPV4POOL_IPIP: Always
FELIX_IPINIPENABLED: true
FELIX_TYPHAK8SSERVICENAME: <set to the key 'typha_service_name' of config map 'calico-config'> Optional: false
NODENAME: (v1:spec.nodeName)
IP: autodetect
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-zggt6 (ro)
install-cni:
Container ID: docker://76a0c72b569b99bcb4ad0c82a7b899c4034f258c907befee4dee5154fd6713f8
Image: quay.io/calico/cni:v3.1.3
Image ID: docker-pullable://quay.io/calico/cni@sha256:ed172c28bc193bb09bce6be6ed7dc6bfc85118d55e61d263cee8bbb0fd464a9d
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Running
Started: Mon, 18 Jun 2018 09:48:52 +0530
Ready: True
Restart Count: 2
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-zggt6 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
calico-node-token-zggt6:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-zggt6
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
:NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 15h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "cni-net-dir"
Normal SuccessfulMountVolume 15h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "var-run-calico"
Normal SuccessfulMountVolume 15h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "var-lib-calico"
Normal SuccessfulMountVolume 15h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "lib-modules"
Normal SuccessfulMountVolume 15h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "cni-bin-dir"
Normal SuccessfulMountVolume 15h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "calico-node-token-zggt6"
Warning Failed 15h kubelet, worker1.k8s Failed to pull image "quay.io/calico/cni:v3.1.3": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/calico/cni/manifests/v3.1.3: Get https://quay.io/v2/auth?scope=repository%3Acalico%2Fcni%3Apull&service=quay.io: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 15h kubelet, worker1.k8s Error: ErrImagePull
Warning Failed 15h kubelet, worker1.k8s Failed to pull image "quay.io/calico/node:v3.1.3": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/calico/node/manifests/v3.1.3: dial tcp 50.17.235.205:443: i/o timeout
Normal Pulling 15h (x2 over 15h) kubelet, worker1.k8s pulling image "quay.io/calico/cni:v3.1.3"
Normal Pulled 15h kubelet, worker1.k8s Successfully pulled image "quay.io/calico/cni:v3.1.3"
Normal Created 15h kubelet, worker1.k8s Created container
Normal Started 15h kubelet, worker1.k8s Started container
Normal Pulling 15h (x3 over 15h) kubelet, worker1.k8s pulling image "quay.io/calico/node:v3.1.3"
Warning Failed 15h (x3 over 15h) kubelet, worker1.k8s Error: ErrImagePull
Warning Failed 15h (x2 over 15h) kubelet, worker1.k8s Failed to pull image "quay.io/calico/node:v3.1.3": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 15h (x2 over 15h) kubelet, worker1.k8s Error: ImagePullBackOff
Normal BackOff 15h (x16 over 15h) kubelet, worker1.k8s Back-off pulling image "quay.io/calico/node:v3.1.3"
Normal Pulled 14h (x12 over 14h) kubelet, worker1.k8s Container image "quay.io/calico/node:v3.1.3" already present on machine
Warning BackOff 14h (x121 over 14h) kubelet, worker1.k8s Back-off restarting failed container
Normal SuccessfulMountVolume 3h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "var-lib-calico"
Normal SuccessfulMountVolume 3h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "var-run-calico"
Normal SuccessfulMountVolume 3h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "cni-bin-dir"
Normal SuccessfulMountVolume 3h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "cni-net-dir"
Normal SuccessfulMountVolume 3h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "lib-modules"
Normal SuccessfulMountVolume 3h kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "calico-node-token-zggt6"
Normal SandboxChanged 3h kubelet, worker1.k8s Pod sandbox changed, it will be killed and re-created.
Normal Pulled 3h kubelet, worker1.k8s Container image "quay.io/calico/cni:v3.1.3" already present on machine
Normal Created 3h kubelet, worker1.k8s Created container
Normal Started 3h kubelet, worker1.k8s Started container
Warning Unhealthy 3h (x2 over 3h) kubelet, worker1.k8s Readiness probe failed: Get http://192.168.99.7:9099/readiness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
Warning Unhealthy 3h (x2 over 3h) kubelet, worker1.k8s Liveness probe failed: Get http://192.168.99.7:9099/liveness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
Normal Started 3h (x2 over 3h) kubelet, worker1.k8s Started container
Normal Pulled 3h (x2 over 3h) kubelet, worker1.k8s Container image "quay.io/calico/node:v3.1.3" already present on machine
Normal Created 3h (x2 over 3h) kubelet, worker1.k8s Created container
Warning BackOff 3h (x47 over 3h) kubelet, worker1.k8s Back-off restarting failed container
Normal SuccessfulMountVolume 12m kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "cni-net-dir"
Normal SuccessfulMountVolume 12m kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "var-lib-calico"
Normal SuccessfulMountVolume 12m kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "cni-bin-dir"
Normal SuccessfulMountVolume 12m kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "var-run-calico"
Normal SuccessfulMountVolume 12m kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "lib-modules"
Normal SuccessfulMountVolume 12m kubelet, worker1.k8s MountVolume.SetUp succeeded for volume "calico-node-token-zggt6"
Normal SandboxChanged 12m kubelet, worker1.k8s Pod sandbox changed, it will be killed and re-created.
Normal Pulled 12m kubelet, worker1.k8s Container image "quay.io/calico/cni:v3.1.3" already present on machine
Normal Created 12m kubelet, worker1.k8s Created container
Normal Started 12m kubelet, worker1.k8s Started container
Warning Unhealthy 12m (x2 over 12m) kubelet, worker1.k8s Liveness probe failed: Get http://192.168.99.7:9099/liveness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
Warning Unhealthy 11m (x3 over 12m) kubelet, worker1.k8s Readiness probe failed: Get http://192.168.99.7:9099/readiness: dial tcp 192.168.99.7:9099: getsockopt: connection refused
Normal Started 11m (x2 over 12m) kubelet, worker1.k8s Started container
Normal Created 11m (x2 over 12m) kubelet, worker1.k8s Created container
Normal Pulled 11m (x2 over 12m) kubelet, worker1.k8s Container image "quay.io/calico/node:v3.1.3" already present on machine
Warning BackOff 2m (x47 over 11m) kubelet, worker1.k8s Back-off restarting failed container
Container Log:
kubectl logs calico-node-9kftd -n kube-system -c calico-node
2018-06-18 04:45:36.720 [INFO][9] startup.go 267: Using NODENAME environment for node name
2018-06-18 04:45:36.720 [INFO][9] startup.go 279: Determined node name: worker1.k8s
2018-06-18 04:45:36.724 [INFO][9] startup.go 302: Checking datastore connection
2018-06-18 04:45:36.754 [INFO][9] startup.go 326: Datastore connection verified
2018-06-18 04:45:36.755 [INFO][9] startup.go 99: Datastore is ready
2018-06-18 04:45:36.783 [INFO][9] startup.go 564: Using autodetected IPv4 address on interface enp0s8: 10.0.3.15/24
2018-06-18 04:45:36.783 [INFO][9] startup.go 432: Node IPv4 changed, will check for conflicts
2018-06-18 04:45:36.798 [WARNING][9] startup.go 861: Calico node 'master' is already using the IPv4 address 10.0.3.15.
2018-06-18 04:45:36.798 [INFO][9] startup.go 205: Clearing out-of-date IPv4 address from this node IP="10.0.3.15/24"
2018-06-18 04:45:36.826 [WARNING][9] startup.go 1058: Terminating
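For reference, the duplicate address can be confirmed directly on the hosts with a quick check (assuming the ip tool from iproute2 is available on both machines):
# Run on both the master and the worker; in this setup both report 10.0.3.15/24 on enp0s8
ip -4 addr show enp0s8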
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 23 (5 by maintainers)
I had the exact same issue. @tmjd thanks for the hint. You need to set the IP autodetection to use another method suitable for your network, e.g. by adding the following to the calico yaml:
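(The exact snippet from this comment was not preserved; below is a minimal sketch of the kind of setting being described, i.e. the IP_AUTODETECTION_METHOD environment variable on the calico-node container, with illustrative values:)
# In the calico-node DaemonSet's container env section of calico.yaml.
# Use an interface regex (or a can-reach=<address> method) that matches
# the network whose address Calico should advertise.
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth.*"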
worked for me.
For people like me, that have just started to explore the world of k8s with a bunch of virtual boxes, and actually just want to see something like 2 nginx pods running on 2 different nodes, the network setup has turned out to be a real nightmare. Coming from docker swarm, everything was easy. Now, I see myself digging into iptables and yaml files, that are interconnected and need to be tweaked in a very special way. Don’t get me wrong - I am willing to learn whatever is necessary to manage my 3 nodes, but I am also frustrated to be sidetracked by the network, which just needs to know 2 things: where is the cluster, and which addresses can I use for my components.
@madmesi you can give multiple interface names, like:
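(A sketch of what that could look like; the interface method takes a regular expression, so several names can be matched with an alternation. Interface names here are illustrative:)
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth0|enp3s0f0"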
I had the same issue. Here is my environment: OS version: Ubuntu Server 20.04 LTS, Kubernetes version: 1.20.4, Calico version: https://docs.projectcalico.org/v3.11/manifests/calico.yaml
Solution:
worked for me.
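(Whatever autodetection method is applied, one way to confirm it took effect is to check the calico-node startup log for the autodetected address, e.g.:)
kubectl -n kube-system logs <calico-node-pod> -c calico-node | grep "Using autodetected IPv4 address"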
Hey everyone, thanks indeed for your help, because I was hopeless. I followed the same workaround, but in my case I have 3 different nodes, two of which have the interface name "eth0", while my worker node's interface name is "enp3s0f0". In the calico.yaml file, I added the line
but still got no result. I also tried the regex mentioned here, like the following:
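(The lines referred to above are missing from this archive; a regex-based attempt would look roughly like the sketch below, and if interface matching keeps failing, the can-reach method is an alternative worth trying. All values are illustrative:)
- name: IP_AUTODETECTION_METHOD
  value: "interface=(eth0|enp3s0f0)"
# Alternative: choose the interface that can reach a given address,
# e.g. another node on the desired network:
# - name: IP_AUTODETECTION_METHOD
#   value: "can-reach=192.168.56.110"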
Calico version: 3.9
On my two machines, enp0s8 has 192.168.56.110 on one and 192.168.56.117 on the other, and those are the addresses I would like the calico-nodes to use. I have read the link again and again but just don't know how to do it.
With either of the options above, I then ran:
kubectl delete -f calico.yaml
kubectl apply -f calico.yaml
I am still seeing that calico-node-xxxx picked up the IP address from enp0s8:
[root@centos7b2 ~]# kubectl describe pods -n kube-system calico-node-mkv8t
Name: calico-node-mkv8t
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: centos7g2/10.0.2.15
Start Time: Mon, 24 Sep 2018 13:43:15 -0700
Labels: controller-revision-hash=1427857993
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.0.2.15
Controlled By: DaemonSet/calico-node
I know I'm in hostNetwork mode, so I should set host: 127.0.0.1
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-readiness-probes
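A sketch of what that probe change looks like on the calico-node container (the paths and port match the probes shown in the describe output above; the host field is the addition being discussed):
livenessProbe:
  httpGet:
    host: 127.0.0.1   # query Felix's health endpoint over loopback rather than the node IP
    path: /liveness
    port: 9099
readinessProbe:
  httpGet:
    host: 127.0.0.1
    path: /readiness
    port: 9099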