microk8s: calico-kube-controllers can't reach API server: context deadline exceeded

I ran into this problem after installing MicroK8s on my Ubuntu 21.10 server:

sudo snap install microk8s --channel=1.23 --classic

I checked the pods and saw that one had crashed:

$ microk8s.kubectl get pods -n kube-system
calico-node-c7h46                          1/1     Running            1 (7m38s ago)   10m
calico-kube-controllers-5ddf994775-gp8cv   0/1     CrashLoopBackOff   7 (34s ago)     10m

In the logs, I see that something is wrong with the API server:

$ microk8s.kubectl logs calico-kube-controllers-5ddf994775-gp8cv  -n kube-system
2022-04-22 13:26:15.311 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0422 13:26:15.312587       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2022-04-22 13:26:15.313 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2022-04-22 13:26:25.313 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-04-22 13:26:25.313 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

Result of microk8s.inspect

inspection-report-20220422_132011.tar.gz

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 46 (12 by maintainers)

Most upvoted comments

Hi, could you please try this:

  1. Edit /etc/modules and add a new line containing br_netfilter. This will load br_netfilter at boot time.
  2. sudo microk8s stop to stop the MicroK8s services.
  3. Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode argument completely.
  4. sudo modprobe br_netfilter to load br_netfilter if it is not already loaded.
  5. sudo microk8s start to start the MicroK8s services again.
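Taken together, the steps above amount to roughly the following shell session (a sketch, not an official script; the first line assumes br_netfilter is not already listed in /etc/modules, and the sed call assumes the flag appears on its own line in the kube-proxy args file):

```shell
# 1. Load br_netfilter at boot time (append only if not already present)
grep -qx 'br_netfilter' /etc/modules || echo 'br_netfilter' | sudo tee -a /etc/modules

# 2. Stop MicroK8s services
sudo microk8s stop

# 3. Drop the --proxy-mode flag from kube-proxy's arguments
sudo sed -i '/--proxy-mode/d' /var/snap/microk8s/current/args/kube-proxy

# 4. Load br_netfilter now, without waiting for a reboot
sudo modprobe br_netfilter

# 5. Start MicroK8s services again
sudo microk8s start
```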

Hey folks, we’ll take a look at this right away. Thank you all for raising this!

Would it be possible to explain what happened, and why br_netfilter and removing the proxy was the fix? I only ask to satisfy my own curiosity.

The CNI used by default in MicroK8s is Calico. Calico works best with the br_netfilter kernel module loaded. When MicroK8s starts, it tries to load the br_netfilter module; if that fails, it sets the proxy-mode to userspace. Userspace routing means that routing is handled in userspace instead of via iptables rules. This proxy mode is the oldest one and is kept for compatibility reasons. The issue you are seeing is that MicroK8s fails to load the kernel module and Calico then fails to play well with userspace routing. Reproducing this issue is not straightforward. I see it happening under certain conditions on Ubuntu 21.10 but not on 18.04, 20.04, or 22.04. Maybe some combination of libraries that I only happen to find on 21.10 is at fault here.
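Given that explanation, you can check which of the two failure conditions applies on your node with a couple of read-only commands (a diagnostic sketch; the path is the MicroK8s default mentioned above):

```shell
# Is the br_netfilter module currently loaded?
lsmod | grep br_netfilter || echo "br_netfilter is NOT loaded"

# Did MicroK8s fall back to userspace proxying?
grep -- '--proxy-mode' /var/snap/microk8s/current/args/kube-proxy \
  || echo "no explicit --proxy-mode (kube-proxy default applies)"
```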

In any case, we will be shipping a patch for this issue in the coming days. We would appreciate it if you could verify that the edge channel of the track you are using works for you. You can test this by doing a fresh install or by refreshing to the respective channel; e.g., assuming you are on the 1.23 track, you can do sudo snap refresh microk8s --channel=1.23/edge. Thank you, and apologies for any trouble we may have caused.

Here is a possible source of errors that I would appreciate you ruling out.

When the Calico CNI sets up the network, it needs to select a network interface through which it will route traffic. In /var/snap/microk8s/current/args/cni-network/cni.yaml, search for IP_AUTODETECTION_METHOD and you will see that Calico uses the “first-found” interface by default. It is possible this auto-detection method is selecting an inappropriate interface (e.g. an interface belonging to lxd). Let’s try to provide a hint about which interface should be used. Edit /var/snap/microk8s/current/args/cni-network/cni.yaml and replace first-found with can-reach=<IP_IN_NETWORK_TO_BE USED>, where <IP_IN_NETWORK_TO_BE USED> is the IP of a machine in the network we want to use for routing traffic; I think that could be the public-facing IP of the host. Then reapply the manifest with microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml. In multi-node clusters we are able to identify where traffic should be routed because we know where the join node is reached from, so this problem should not be present there.
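As a concrete sketch of that edit: 192.0.2.10 below is a placeholder documentation address, not a real value from this thread; substitute an IP that is actually reachable through the interface you want Calico to use (e.g. your default gateway or the host's public IP).

```shell
# Replace the default interface auto-detection with a can-reach hint.
CNI_YAML=/var/snap/microk8s/current/args/cni-network/cni.yaml
sudo sed -i 's/first-found/can-reach=192.0.2.10/' "$CNI_YAML"

# Reapply the manifest so the new autodetection method takes effect.
microk8s kubectl apply -f "$CNI_YAML"
```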

Yes, it appears the error was reintroduced. I also added br_netfilter to the /etc/modules file and restarted the entire system. No resolution. k3s works smoothly: all pods up and running without restarts.

ENVIRONMENT

  • MicroK8s v1.26.0 revision 4390 on a NUC Intel Celeron N5095, 16GB RAM, 1TB SSD
  • MicroK8s v1.26.0 revision 4390 on a NUC Intel Celeron N3350, 4GB RAM, 512GB SSD

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

Reproduce:

NAMESPACE     NAME                                           READY   STATUS             RESTARTS      AGE
kube-system   pod/calico-node-gpj5s                          1/1     Running            3 (32s ago)   57m
kube-system   pod/calico-kube-controllers-7874bcdbb4-5ftc2   0/1     CrashLoopBackOff   14 (9s ago)   57m

NAMESPACE   NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   57m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   1         1         1       1            1           kubernetes.io/os=linux   57m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           57m

NAMESPACE     NAME                                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-79568db7f8   0         0         0       57m
kube-system   replicaset.apps/calico-kube-controllers-7874bcdbb4   1         1         1       57m

kc logs pod/calico-kube-controllers-7874bcdbb4-5ftc2 -n kube-system -f

2022-12-27 22:40:30.430 [WARNING][1] runconfig.go 162: unable to get KubeControllersConfiguration(default) error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:30.430453       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:30.430461       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://10.152.183.1:443/api/v1/nodes?resourceVersion=3520": dial tcp 10.152.183.1:443: connect: no route to host
2022-12-27 22:40:34.242 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-12-27 22:40:34.242 [ERROR][1] main.go 242: Failed to verify datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-12-27 22:40:34.498 [ERROR][1] main.go 277: Received bad status code from apiserver error=Get "https://10.152.183.1:443/healthz?timeout=20s": dial tcp 10.152.183.1:443: connect: no route to host status=0
2022-12-27 22:40:34.498 [WARNING][1] runconfig.go 162: unable to get KubeControllersConfiguration(default) error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.152.183.1:443: connect: no route to host
W1227 22:40:34.498077       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:34.498257       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host

This is happening to me on 1.24, Ubuntu 22.04, on a Raspberry Pi.

I ran kubectl rollout restart deployment -n kube-system calico-kube-controllers. The pod is still in a crash loop.

Here is the output of ethtool --show-offload vxlan.calico

Features for vxlan.calico:
rx-checksumming: off
tx-checksumming: off
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: off
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [requested on]
        tx-tcp-ecn-segmentation: off [requested on]
        tx-tcp-mangleid-segmentation: off [requested on]
        tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
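Those offload settings can be toggled with ethtool -K. As an aside not confirmed anywhere in this thread: a commonly cited workaround for VXLAN checksum problems with Calico is to disable generic TX checksum offload on the vxlan.calico device, which the output above suggests has already been done here (tx-checksum-ip-generic is off). A hypothetical sketch:

```shell
# Workaround sketch (assumption, not a confirmed fix from this thread):
# disable TX checksum offload on the Calico VXLAN device.
# Not persistent across reboots or recreation of the device.
sudo ethtool -K vxlan.calico tx-checksum-ip-generic off

# Verify the change took effect
ethtool --show-offload vxlan.calico | grep tx-checksum-ip-generic
```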