calico: kube-controllers failing to query API: Context deadline exceeded

Expected Behavior

When installing a fresh master node, I expect the pod network to start once I apply the Calico YAML, as it did on another setup I installed.

Current Behavior

coredns does not start; calico-node starts without a problem, but calico-kube-controllers outputs an error:

2019-08-27 09:52:10.607 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2019-08-27 09:52:20.608 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2019-08-27 09:52:20.608 [FATAL][1] main.go 118: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

I verified that 10.96.0.1:443 is listening and reachable from containers in the kube-system namespace.
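For example, such a check can be done from a throwaway pod (the netcheck pod name and curlimages/curl image here are only an example; any image with curl will do):

kubectl run netcheck -n kube-system --rm -it --restart=Never --image=curlimages/curl -- \
  curl -k -m 5 https://10.96.0.1:443/version
# A 401/403 response still proves the service VIP is reachable; only a timeout would
# reproduce the controller's "context deadline exceeded" error.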

All interfaces get created on the host, and all containers that ask for an IP get one.

Possible Solution

None as of yet.

Steps to Reproduce (for bugs)

  1. kubeadm init --kubernetes-version=v1.15.2 --pod-network-cidr=10.246.0.0/16 --apiserver-advertise-address=10.5.0.5
  2. curl -s https://docs.projectcalico.org/v3.8/manifests/calico.yaml > calico-3.8.2.yaml
  3. POD_CIDR="10.246.0.0/16" sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico-3.8.2.yaml
  4. kubectl apply -f calico-3.8.2.yaml
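To confirm the CIDR substitution from step 3 actually took effect before applying the manifest, the pool value can be checked (assuming the v3.8 manifest sets it via the CALICO_IPV4POOL_CIDR environment variable):

grep -A1 CALICO_IPV4POOL_CIDR calico-3.8.2.yaml
# should print the new value (10.246.0.0/16) instead of the default 192.168.0.0/16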

Context

NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
kube-system   calico-kube-controllers-65b8787765-rdkrm   0/1     CrashLoopBackOff   7          13m
kube-system   calico-node-x5qcv                          1/1     Running            0          13m
kube-system   coredns-5c98db65d4-fhb4c                   0/1     CrashLoopBackOff   6          17m
kube-system   coredns-5c98db65d4-rzd5c                   0/1     CrashLoopBackOff   6          17m
kube-system   etcd-dcc-5-host-3                          1/1     Running            0          16m
kube-system   kube-apiserver-dcc-5-host-3                1/1     Running            0          16m
kube-system   kube-controller-manager-dcc-5-host-3       1/1     Running            0          16m
kube-system   kube-proxy-k72mq                           1/1     Running            0          17m
kube-system   kube-scheduler-dcc-5-host-3                1/1     Running            0          16m

kubectl describe pod -n kube-system calico-kube-controllers-65b8787765-rdkrm

Name:                 calico-kube-controllers-65b8787765-rdkrm
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 debian/37.97.227.88
Start Time:           Tue, 27 Aug 2019 11:51:54 +0200
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=65b8787765
Annotations:          cni.projectcalico.org/podIP: 10.246.201.193/32
                      scheduler.alpha.kubernetes.io/critical-pod: 
Status:               Running
IP:                   10.246.201.193
Controlled By:        ReplicaSet/calico-kube-controllers-65b8787765
Containers:
  calico-kube-controllers:
    Container ID:   docker://6c54ccc03728289128fa7bce0bca06fc704c9241896698616c72389f64735ee9
    Image:          calico/kube-controllers:v3.8.2
    Image ID:       docker-pullable://calico/kube-controllers@sha256:afc0e28b569059abc6f5e199048c2b4f1d520dece9b16e4ddc3e4edb477c72ed
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 27 Aug 2019 12:25:05 +0200
      Finished:     Tue, 27 Aug 2019 12:25:15 +0200
    Ready:          False
    Restart Count:  11
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from calico-kube-controllers-token-jvbr7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  calico-kube-controllers-token-jvbr7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-kube-controllers-token-jvbr7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From                   Message
  ----     ------            ----                   ----                   -------
  Warning  FailedScheduling  34m (x3 over 34m)      default-scheduler      0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         33m                    default-scheduler      Successfully assigned kube-system/calico-kube-controllers-65b8787765-rdkrm to debian
  Warning  Unhealthy         32m (x4 over 33m)      kubelet, debian        Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
  Normal   Pulled            31m (x5 over 33m)      kubelet, debian        Container image "calico/kube-controllers:v3.8.2" already present on machine
  Normal   Created           31m (x5 over 33m)      kubelet, debian        Created container calico-kube-controllers
  Normal   Started           31m (x5 over 33m)      kubelet, debian        Started container calico-kube-controllers
  Warning  BackOff           3m45s (x129 over 33m)  kubelet, debian        Back-off restarting failed container

Your Environment

  • Calico version
image: calico/cni:v3.8.2
image: calico/cni:v3.8.2
image: calico/pod2daemon-flexvol:v3.8.2
image: calico/node:v3.8.2
image: calico/kube-controllers:v3.8.2

Using kubernetes as datastore: CALICO_DATASTORE_TYPE=kubernetes

  • Orchestrator version (e.g. kubernetes, mesos, rkt):
#: kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Operating System and version:
#: uname -a
Linux debian 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
#: hostname
debian
#: apt search docker-ce
docker-ce/stretch,now 5:18.09.8~3-0~debian-stretch amd64 [installed]

Host interfaces (after calico initialization):

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether <mac>
    inet <public ip> brd <brd> scope global dynamic ens3
       valid_lft 85455sec preferred_lft 85455sec
3: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether <mac>
    inet 10.5.0.5/24 brd 10.5.0.255 scope global ens7
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether <mac>
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.246.201.192/32 brd 10.246.201.192 scope global tunl0
       valid_lft forever preferred_lft forever
33: caliafb5ba50c83@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
34: cali4e91c4a8f9e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
35: cali794af303831@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 36 (8 by maintainers)

Most upvoted comments

I am facing the same issue with Calico on Kubernetes.

I am using Calico 3.11. After a restart it was giving me the error below:

kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
2020-02-08 16:41:52.914 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-02-08 16:41:52.914 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Facing the same issue:

2020-08-28 01:44:23.778 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0828 01:44:23.779818       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-08-28 01:44:23.780 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-08-28 01:44:33.781 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-28 01:44:33.781 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Can you try restarting docker? systemctl restart docker

Hi,

I see the same problem during a kops upgrade:

2020-08-17 06:58:42.376 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:14.376 [ERROR][1] main.go 238: Failed to reach apiserver error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:34.376 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 06:59:34.376 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:06.376 [ERROR][1] main.go 238: Failed to reach apiserver error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:26.377 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-08-17 07:00:26.377 [ERROR][1] main.go 207: Failed to verify datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Killing the calico-kube-controllers pod solves the problem.
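For reference, a sketch of what killing the pod looks like; the ReplicaSet then recreates it (the k8s-app=calico-kube-controllers label matches the pod description above):

kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers
# the ReplicaSet schedules a replacement pod, which retries the datastore connection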

The setup is a single-master, multi-node cluster. kops is version 1.18.0, and the k8s upgrade was from 1.17.9 to 1.18.8. kops changed the Calico version from v3.13.4 to v3.15.1, also replacing a lot of CRDs.

If you have a firewall enabled and use Calico as your CNI plugin, you must ensure that the following rules have been applied to your firewall:

Both on Master and Worker Nodes:

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh

On the Master Node:

sudo ufw allow 6443/tcp
sudo ufw allow 2379/tcp
sudo ufw allow 2380/tcp
sudo ufw allow 10250/tcp
sudo ufw allow 10251/tcp
sudo ufw allow 10252/tcp

10.10.10.0/24 is the CIDR of your host subnet (for example, your master host's IP address could be 10.10.10.218):

sudo ufw allow from 10.10.10.0/24

172.17.0.0/16 is the Docker CIDR (it could be different if you have specified a non-default CIDR in the Docker configuration file):

sudo ufw allow from 172.17.0.0/16

On the Worker node(s):

sudo ufw allow 10250/tcp
sudo ufw allow 30000:32767/tcp
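If ufw is not already active, the rules only take effect once it is enabled; the resulting rule set can then be checked:

sudo ufw enable
sudo ufw status verbose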

I fixed this exception after changing the pod network from 192.168.0.0/16 to 192.169.0.0/16.

Don’t do this. 192.169.0.0/16 is not an unallocated CIDR. Someone owns that network, and it’s publicly routable.

Then you should log more specific details, like connection refused or timeout, instead of context deadline exceeded.

Just wanted to add that I was facing this issue for a while while stuck on Calico v3.15.x, but have not seen it since upgrading to v3.18.1. I am on Kubernetes 1.19.7.

I had the same issue on v1.20 on Fedora 33, and I found this post: https://upcloud.com/community/tutorials/install-kubernetes-cluster-centos-8/

The command that made my controller work was: firewall-cmd --zone=public --permanent --add-rich-rule 'rule family=ipv4 source address=worker-IP-address/mask accept'
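Note that rules added with --permanent are only written to the permanent configuration; they usually take effect at runtime after a reload:

sudo firewall-cmd --reload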