calico: kube-controllers failing to query API: Context deadline exceeded
Expected Behavior
On a freshly installed master node, I expect the pod network to start when I apply the Calico YAML, as it did on another setup I installed.
Current Behavior
coredns does not start. calico-node starts without a problem, but calico-kube-controllers logs the following error:
2019-08-27 09:52:10.607 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2019-08-27 09:52:20.608 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-08-27 09:52:20.608 [FATAL][1] main.go 118: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
I verified 10.96.0.1:443 is running and reachable from containers in the kube-system namespace.
All interfaces get created on the host, and all containers that ask for an IP get one.
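For reference, "reachable" here means a check along these lines (a rough sketch; curlimages/curl is just an example image that ships curl, and the unauthenticated /version endpoint is used to keep it simple):
kubectl -n kube-system run api-check --rm -it --restart=Never --image=curlimages/curl --command -- curl -k -m 5 https://10.96.0.1:443/version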
Possible Solution
None as of yet.
Steps to Reproduce (for bugs)
kubeadm init --kubernetes-version=v1.15.2 --pod-network-cidr=10.246.0.0/16 --apiserver-advertise-address=10.5.0.5
curl -s https://docs.projectcalico.org/v3.8/manifests/calico.yaml > calico-3.8.2.yaml
POD_CIDR="10.246.0.0/16"
sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico-3.8.2.yaml
kubectl apply -f calico-3.8.2.yaml
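As a quick sanity check (not part of the original steps), the sed substitution can be confirmed before applying by grepping the edited manifest for the new CIDR:
grep -n "10.246.0.0/16" calico-3.8.2.yaml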
Context
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-65b8787765-rdkrm 0/1 CrashLoopBackOff 7 13m
kube-system calico-node-x5qcv 1/1 Running 0 13m
kube-system coredns-5c98db65d4-fhb4c 0/1 CrashLoopBackOff 6 17m
kube-system coredns-5c98db65d4-rzd5c 0/1 CrashLoopBackOff 6 17m
kube-system etcd-dcc-5-host-3 1/1 Running 0 16m
kube-system kube-apiserver-dcc-5-host-3 1/1 Running 0 16m
kube-system kube-controller-manager-dcc-5-host-3 1/1 Running 0 16m
kube-system kube-proxy-k72mq 1/1 Running 0 17m
kube-system kube-scheduler-dcc-5-host-3 1/1 Running 0 16m
kubectl describe pod -n kube-system calico-kube-controllers-65b8787765-rdkrm
Name: calico-kube-controllers-65b8787765-rdkrm
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: debian/37.97.227.88
Start Time: Tue, 27 Aug 2019 11:51:54 +0200
Labels: k8s-app=calico-kube-controllers
pod-template-hash=65b8787765
Annotations: cni.projectcalico.org/podIP: 10.246.201.193/32
scheduler.alpha.kubernetes.io/critical-pod:
Status: Running
IP: 10.246.201.193
Controlled By: ReplicaSet/calico-kube-controllers-65b8787765
Containers:
calico-kube-controllers:
Container ID: docker://6c54ccc03728289128fa7bce0bca06fc704c9241896698616c72389f64735ee9
Image: calico/kube-controllers:v3.8.2
Image ID: docker-pullable://calico/kube-controllers@sha256:afc0e28b569059abc6f5e199048c2b4f1d520dece9b16e4ddc3e4edb477c72ed
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 27 Aug 2019 12:25:05 +0200
Finished: Tue, 27 Aug 2019 12:25:15 +0200
Ready: False
Restart Count: 11
Readiness: exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ENABLED_CONTROLLERS: node
DATASTORE_TYPE: kubernetes
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-jvbr7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
calico-kube-controllers-token-jvbr7:
Type: Secret (a volume populated by a Secret)
SecretName: calico-kube-controllers-token-jvbr7
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 34m (x3 over 34m) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Normal Scheduled 33m default-scheduler Successfully assigned kube-system/calico-kube-controllers-65b8787765-rdkrm to debian
Warning Unhealthy 32m (x4 over 33m) kubelet, debian Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Normal Pulled 31m (x5 over 33m) kubelet, debian Container image "calico/kube-controllers:v3.8.2" already present on machine
Normal Created 31m (x5 over 33m) kubelet, debian Created container calico-kube-controllers
Normal Started 31m (x5 over 33m) kubelet, debian Started container calico-kube-controllers
Warning BackOff 3m45s (x129 over 33m) kubelet, debian Back-off restarting failed container
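For reference, the controller error quoted at the top of this report is what the last failed instance of this container logs; it can be pulled with the standard --previous flag, e.g.:
kubectl -n kube-system logs calico-kube-controllers-65b8787765-rdkrm --previous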
Your Environment
- Calico version
image: calico/cni:v3.8.2
image: calico/cni:v3.8.2
image: calico/pod2daemon-flexvol:v3.8.2
image: calico/node:v3.8.2
image: calico/kube-controllers:v3.8.2
Using Kubernetes as the datastore: DATASTORE_TYPE=kubernetes
- Orchestrator version (e.g. kubernetes, mesos, rkt):
#: kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
- Operating System and version:
#: uname -a
Linux debian 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
#: hostname
debian
#: apt search docker-ce
docker-ce/stretch,now 5:18.09.8~3-0~debian-stretch amd64 [installed]
Host interfaces (after calico initialization):
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether <mac>
inet <public ip> brd <brd> scope global dynamic ens3
valid_lft 85455sec preferred_lft 85455sec
3: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether <mac>
inet 10.5.0.5/24 brd 10.5.0.255 scope global ens7
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether <mac>
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.246.201.192/32 brd 10.246.201.192 scope global tunl0
valid_lft forever preferred_lft forever
33: caliafb5ba50c83@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
34: cali4e91c4a8f9e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
35: cali794af303831@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
About this issue
- State: closed
- Created 5 years ago
- Comments: 36 (8 by maintainers)
I am facing the same issue with Kubernetes and Calico:
I am using Calico 3.11. After a restart it was giving me the error below.
Facing the same issue.
Can you try restarting Docker? systemctl restart docker
Hi,
I see the same problem during a kops upgrade:
Killing the calico-kube-controllers pod solves the problem (see the example command after this comment).
The setup is a single-master, multi-node cluster. kops is version 1.18.0 and the Kubernetes upgrade was from 1.17.9 to 1.18.8; kops changed the Calico version from v3.13.4 to v3.15.1, also replacing a lot of CRDs.
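For anyone else hitting this during an upgrade, deleting the controller pod and letting the Deployment recreate it looks roughly like this (using the k8s-app label shown in the manifest; adjust if your labels differ):
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers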
If you have a firewall enabled and use Calico as your CNI plugin, you must ensure that the following rules have been applied on your firewall:
Both on Master and Worker Nodes:
On the Master Node:
sudo ufw allow from 10.10.10.0/24
sudo ufw allow from 172.17.0.0/16
On the Worker node(s):
Don’t do this. 192.169.0.0/16 is not an unallocated CIDR. Someone owns that network, and it’s publicly routable.
then you should log more specific details, like connection refused or timeout, instead of context deadline exceeded.
Just wanted to add that I was facing this issue for a while and was stuck on Calico v3.15.x, but I have not seen it since upgrading to v3.18.1. I am on Kubernetes 1.19.7.
I had the same issue on v1.20 on Fedora 33, and I found this post: https://upcloud.com/community/tutorials/install-kubernetes-cluster-centos-8/
The command that made my controller work was: firewall-cmd --zone=public --permanent --add-rich-rule 'rule family=ipv4 source address=worker-IP-address/mask accept'
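Note that rules added with --permanent do not affect the running firewall until it is reloaded, e.g.:
sudo firewall-cmd --reload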