calico: calico/node is not ready: BIRD is not ready: BGP not established (Calico 3.6 / k8s 1.14.1)
I’m configuring a k8s cluster with 2 nodes (two Ubuntu VMs on VirtualBox). The nodes are connected via a NAT Network (10.0.0.0/24). Master node: 10.0.0.10, plus a host-only adapter (192.168.56.101). Worker node: 10.0.0.11.
Current Behavior
Possible Solution
Steps to Reproduce (for bugs)
- Install kubeadm on both the master and the worker node.
- Execute kubeadm init on the master node: kubeadm init --pod-network-cidr=192.168.0.0/16
- Following the Project Calico 3.6 tutorial, apply the pod network:
https://docs.projectcalico.org/v3.6/getting-started/kubernetes/
kubectl apply -f https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
Here, the weird thing is that the calico-node pod shows READY 1/1, while according to the manual it should be 2/2.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-5cbcccc885-xtxck 1/1 Running 0 96s
kube-system calico-node-n4tcf 1/1 Running 0 96s
kube-system coredns-fb8b8dccf-x8thj 1/1 Running 0 4m9s
kube-system coredns-fb8b8dccf-zvmsp 1/1 Running 0 4m9s
kube-system etcd-k8s-master 1/1 Running 0 3m33s
kube-system kube-apiserver-k8s-master 1/1 Running 0 3m18s
kube-system kube-controller-manager-k8s-master 1/1 Running 0 3m29s
kube-system kube-proxy-5ck6g 1/1 Running 0 4m9s
kube-system kube-scheduler-k8s-master 1/1 Running 0 3m18s
Anyway, the status of the single pod seems normal.
$ kubectl describe pods -n kube-system calico-node-n4tcf
...
(skipped)
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m21s default-scheduler Successfully assigned kube-system/calico-node-n4tcf to k8s-master
Normal Pulled 2m20s kubelet, k8s-master Container image "calico/cni:v3.6.1" already present on machine
Normal Created 2m20s kubelet, k8s-master Created container upgrade-ipam
Normal Started 2m19s kubelet, k8s-master Started container upgrade-ipam
Normal Pulled 2m19s kubelet, k8s-master Container image "calico/cni:v3.6.1" already present on machine
Normal Created 2m19s kubelet, k8s-master Created container install-cni
Normal Started 2m18s kubelet, k8s-master Started container install-cni
Normal Pulling 2m18s kubelet, k8s-master Pulling image "calico/node:v3.6.1"
Normal Pulled 2m5s kubelet, k8s-master Successfully pulled image "calico/node:v3.6.1"
Normal Created 2m5s kubelet, k8s-master Created container calico-node
Normal Started 2m4s kubelet, k8s-master Started container calico-node
- After joining the worker node with kubeadm join, another calico-node pod was created. But after a while, both calico-node pods (for the master and the worker) stopped being ready.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-5cbcccc885-xtxck 1/1 Running 0 40m
kube-system calico-node-n4tcf 0/1 Running 0 40m
kube-system calico-node-sjqsr 0/1 Running 0 36m
kube-system coredns-fb8b8dccf-x8thj 1/1 Running 0 43m
kube-system coredns-fb8b8dccf-zvmsp 1/1 Running 0 43m
kube-system etcd-k8s-master 1/1 Running 0 42m
kube-system kube-apiserver-k8s-master 1/1 Running 0 42m
kube-system kube-controller-manager-k8s-master 1/1 Running 0 42m
kube-system kube-proxy-5ck6g 1/1 Running 0 43m
kube-system kube-proxy-ds549 1/1 Running 0 36m
kube-system kube-scheduler-k8s-master 1/1 Running 0 42m
The calico-node-n4tcf pod (for the master) changed from ready (1/1) to not ready (0/1), and the newly created calico-node-sjqsr pod (for the worker) never became ready.
$ kubectl describe pod -n kube-system calico-node-n4tcf
...
(skipped)
...
Warning Unhealthy 10s (x2 over 20s) kubelet, k8s-master (combined from similar events): Readiness probe failed: Threshold time for bird readiness check: 30s
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.0.11
2019-04-18 16:59:27.462 [INFO][607] readiness.go 88: Number of node(s) with BGP peering established = 0
$ kubectl describe pod -n kube-system calico-node-sjqsr
...
(skipped)
...
Warning Unhealthy 6s (x4 over 36s) kubelet, k8s-worker1 (combined from similar events): Readiness probe failed: Threshold time for bird readiness check: 30s
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.56.101
2019-04-18 16:59:49.812 [INFO][300] readiness.go 88: Number of node(s) with BGP peering established = 0
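For reference, BIRD's view of the BGP sessions can be checked directly on a node (a sketch, assuming calicoctl is installed there):

```
# Prints each BGP peer and its session state (Established, Connect, Active, ...),
# which shows exactly which peering the readiness probe is complaining about.
sudo calicoctl node status
```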
Context
I don’t know why it doesn’t work. When I applied the flannel add-on instead, there was no problem. I repeatedly reset and reinstalled, but the result was no different.
Could you tell me what I should do to solve this problem?
Your Environment
- Calico version : 3.6
- Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.14.1, docker 18.9.5
- Operating System and version: Ubuntu 18.04.2
I got the solution:
When setting up a Kubernetes cluster with Calico, the following can be the problem.
Calico autodetects the node IP from the first interface that ifconfig lists (in my case br-3faae2641db0), and it tries to reach the worker nodes from that interface, which is not the right IP.
Solution:
Change the calico.yaml file to override that IP with the eth0 IP, using the steps below.
The Calico networking (BGP) port also needs to be open: TCP 179.
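A minimal sketch of that override (assuming the NIC is named eth0 and calico-node runs as the stock DaemonSet in kube-system; the same env var can instead be added to the calico-node container in calico.yaml):

```
# Force Calico to autodetect the node IP from eth0 instead of the first
# interface ifconfig lists (e.g. a Docker bridge such as br-3faae2641db0).
kubectl set env daemonset/calico-node -n kube-system \
  IP_AUTODETECTION_METHOD=interface=eth0

# The calico-node pods restart; confirm they become READY again.
kubectl get pods -n kube-system -l k8s-app=calico-node
```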
Because “eth.*” does not exist on some platforms, you may find an alternative setting more portable and useful. This is what I did:
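Something along these lines (the target IP is illustrative; can-reach makes Calico pick whichever local interface the kernel would route through to reach the given target):

```
# can-reach=<target>: autodetect the node IP from the interface that routes
# to <target>; a public IP like 8.8.8.8 selects the public-routing interface.
kubectl set env daemonset/calico-node -n kube-system \
  IP_AUTODETECTION_METHOD=can-reach=8.8.8.8
```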
This at least forced Calico to use a public-routing interface. If you want a private interface instead, put the IP of one of the nodes on your internal network in there. Assuming your routing tables aren’t completely busted (and if they are, you have other problems), it’ll do the right thing more reliably.
FYI I had very similar problem (see also #2193).
My unrelated IP was 10.44.0.0. The problem was that I had previously tried Weave Net (and flannel too), which didn’t delete the weave network interface, and that misled Calico’s autodetect feature. 😊 Resolving the issue was as easy as deleting that weave network interface. I got the tip from kubernetes/kubernetes#70202, from the “uninstall flannel” step.

Actually, I have met this problem on 3.8.* in several different scenarios:
- The wrong interface was autodetected, and I needed interface=en.+ or a similar setting.
- Calico picked up previous flannel interfaces that had not been cleared properly. Clean up with iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X, as suggested by kubeadm. Some systems will reset your ssh connection (e.g. when you are ssh’d into a worker) while iptables is being flushed, so the four parts of the reset command may not all finish; please ensure the command runs to completion on each machine.

Quick response for the new Calico version: if you are using the Tigera operator to manage and configure the calico-node DaemonSet, you should change your Tigera config file like this:
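A sketch of the equivalent change applied with kubectl patch (assumed details: the operator's Installation resource has its default name, default, and eth0 is the interface to pin; the operator propagates spec.calicoNetwork.nodeAddressAutodetection to the calico-node pods):

```
# Pin node address autodetection on the Tigera operator's Installation resource.
kubectl patch installation default --type merge -p \
  '{"spec":{"calicoNetwork":{"nodeAddressAutodetection":{"interface":"eth0"}}}}'
```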
You can see more options here: https://docs.projectcalico.org/reference/node/configuration
IP_AUTODETECTION_METHOD was also the problem for me. The calico-node-* pods in the kube-system namespace were stuck in a not-ready (0/1) state.
Run: kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=www.google.com
Then the pods became ready within a few seconds.
Reference: https://docs.projectcalico.org/networking/ip-autodetection#change-the-autodetection-method
Check the Calico logs; you will definitely get some idea from them.
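For example (a generic sketch; the label selector and container name match the stock Calico manifest):

```
# Tail the calico-node container logs across all nodes; BIRD/BGP errors and
# the interface chosen by IP autodetection both show up here.
kubectl logs -n kube-system -l k8s-app=calico-node -c calico-node --tail=50
```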
In my case, this problem still exists.
After uninstalling Calico 3.7, I installed Flannel, and then uninstalled Flannel to install Calico 3.10.1. I know it’s weird to do this, and now something strange is happening.
I have removed Flannel’s interface; ifconfig no longer displays any flannel interfaces.
The servers’ real internal IPs are: master 172.17.106.122, node1 172.17.106.121, node2 172.17.106.120.
My real network card is eth0, and I deployed using the official Calico 3.10 yaml file.
My pod CIDR is 10.100.0.1/20, which I have verified is correct.
Then every one of my nodes reports this error:
Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 172.17.106.121,172.17.106.120
2019-12-11 06:54:11.577 [INFO][127] health.go 156: Number of node(s) with BGP peering established = 0
The IPs that appear in the error are the real internal IPs of the servers listed above, not some nonexistent IP.
This is incredible.
While operating k8s on virtual machines, I experienced the same phenomenon, solved it, and am leaving this post. The host machine runs CentOS 7 and operates 3 k8s nodes through KVM. Activate the network interface used by calico-node through firewall-cmd. Looking at the calico-node log, the gateway could not be reached; this happened because the host’s virtual machine interface was not in the internal zone and was therefore not activated.
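A sketch of the firewall-cmd side of that fix (virbr0 is an assumed name for the host's KVM bridge; use whichever interface your VMs actually sit on):

```
# Put the VM bridge into a zone that permits the traffic, and open the BGP port.
firewall-cmd --permanent --zone=internal --add-interface=virbr0
firewall-cmd --permanent --zone=internal --add-port=179/tcp
firewall-cmd --reload
```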
Updating this section in @MadAboutProgramming’s solution resolved my problem.
It worked for me. Thanks a lot
In my case the following setting did it!
In my case it was all about adding TCP 179 to the nodes’ Security Group (for an AWS installation).

For CentOS I found a solution; it is about IP_AUTODETECTION_METHOD.
You have to find your connection device first with this command:
nmcli connection show
In my case it was nm-bond. Then you can update the Calico env with this command:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=nm-bond
Validate the change:
kubectl get daemonset/calico-node -n kube-system --output json | jq '.spec.template.spec.containers[].env[] | select(.name | startswith("IP"))'

Experiencing the same issue with a CentOS 7 VM machine… can you share the solution?
I resolved my problem by using @MadAboutProgramming’s solution. Thanks a lot!