calico: Installing Calico 3.6 in Kubernetes results in error in calico-kube-controllers

I am trying to upgrade Calico in my k8s cluster from 3.3 to 3.6. To upgrade, I delete the previously created resources and create new ones. The calico-kube-controllers pod then gets stuck in ContainerCreating, and none of the calico-node pods start.

Expected Behavior

The Calico pods described in the applied manifest are created and start running.

Current Behavior

calico-kube-controllers does not exit the ContainerCreating state. kubectl describe pod shows this error:

Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "4a3c5993de2bb25bb59c33f55a1ea65f2584980c83c4704ebde6af5bec3e09b5" network for pod "calico-kube-controllers-5cbcccc885-nddnj": NetworkPlugin cni failed to set up pod "calico-kube-controllers-5cbcccc885-nddnj_kube-system" network: error getting ClusterInformation: connection is unauthorized: Unauthorized,
failed to clean up sandbox container "4a3c5993de2bb25bb59c33f55a1ea65f2584980c83c4704ebde6af5bec3e09b5" network for pod "calico-kube-controllers-5cbcccc885-nddnj": NetworkPlugin cni failed to teardown pod "calico-kube-controllers-5cbcccc885-nddnj_kube-system" network: error getting ClusterInformation: connection is unauthorized: Unauthorized]
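As a diagnostic starting point (an assumption on my part, not confirmed in this thread): a known cause of "connection is unauthorized" from the CNI plugin is a stale service-account token in the kubeconfig that Calico writes onto each node. Something like the sketch below can confirm whether that file is present; the default path is the stock Calico location and may differ on your cluster.

```shell
# Hedged diagnostic sketch: a token minted for the old calico-node
# ServiceAccount becomes invalid once that ServiceAccount is deleted
# and recreated, which yields exactly this Unauthorized error.
# The default path below is an assumption (the stock Calico location).
check_cni_kubeconfig() {
    local cfg="${1:-/etc/cni/net.d/calico-kubeconfig}"
    if [ ! -f "$cfg" ]; then
        echo "missing: $cfg"
        return 1
    fi
    echo "found: $cfg (verify its token matches the current ServiceAccount)"
}
```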

Possible Solution

I am able to upgrade all the way to 3.5, which is the last version without calico-kube-controllers, so I assume there is something going on with this new addition?

Steps to Reproduce (for bugs)

I do not have a fresh cluster to test this on, but how I got where I am is this:

  1. Create a cluster with kubeadm (I have been through about 3 major Kubernetes version upgrades with the cluster)
  2. Install Calico 3.3 following https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less
  3. Delete the resources created in step 2 with kubectl delete -f
  4. Install Calico 3.6 following https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less
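For reference, steps 2–4 boil down to applying and then deleting the docs manifests. A sketch under assumptions: the /manifests/calico.yaml URL shape is modeled on the v3.8 link quoted later in this thread, and older releases published the manifest under a different docs path.

```shell
# Sketch only: install/remove helpers for the docs-hosted manifests.
# The URL shape is an assumption; check the linked docs page for the
# exact manifest path of each release.
install_calico() {
    kubectl apply -f "https://docs.projectcalico.org/$1/manifests/calico.yaml"
}
remove_calico() {
    kubectl delete -f "https://docs.projectcalico.org/$1/manifests/calico.yaml"
}
# The reproduction path above is then:
#   install_calico v3.3
#   remove_calico v3.3
#   install_calico v3.6
```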

Context

I’m trying to upgrade the Calico version in my cluster.

Your Environment

  • Calico version 3.3, trying to get to 3.6
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes, created via kubeadm
  • Operating System and version: CentOS 7

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 10
  • Comments: 25 (6 by maintainers)

Most upvoted comments

Fixed it by removing leftover files in /var/lib/cni. I believe that’s a bug for users upgrading from earlier Calico versions.
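The cleanup described in that comment can be sketched as follows. The /var/lib/cni default is the path named above; the function name and the kubelet-stopped caveat are my assumptions.

```shell
# Sketch of the fix described above: wipe leftover CNI state from the
# old Calico install. Run on every node, ideally with kubelet stopped,
# before reapplying the new manifest.
clean_cni_state() {
    local dir="${1:-/var/lib/cni}"
    if [ -d "$dir" ]; then
        rm -rf "$dir"/*
    fi
    return 0
}
# e.g. clean_cni_state        # defaults to /var/lib/cni
```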

@caseydavenport I tried applying the new installation instead of deleting the old one and creating it; it took the pods a while, but it seems like they started up okay and I didn’t see any errors in their logs. Thanks! 😃

It looks like the same problem I currently have with upgrading from 3.1 to 3.7.

Was this actually fixed? I’m experiencing what I think is the same issue on a clean Kubernetes 1.15.0 cluster (kubeadm reset & kubeadm init) on Ubuntu 18.04 with Docker-CE 18.09.7 and Calico 3.8.

(The same system worked fine with Kubernetes 1.14.1 & Calico 3.3.6.)

Following exactly the steps in https://docs.projectcalico.org/v3.8/getting-started/kubernetes/ leads me to a failure in step 5; the pods don’t come up.

# kubectl --kubeconfig /etc/kubernetes/admin.conf get pods --all-namespaces
NAMESPACE     NAME                                                      READY   STATUS     RESTARTS   AGE
kube-system   calico-kube-controllers-59f54d6bbc-jkvws                  0/1     Pending    0          9m4s
kube-system   calico-node-t5zdj                                         0/1     Init:0/3   0          9m4s

(As you can see I’ve given it 9 minutes…)

# kubectl --kubeconfig /etc/kubernetes/admin.conf describe -n kube-system pod/calico-node-t5zdj
[...]
Events:
  Type    Reason     Age   From                                      Message
  ----    ------     ----  ----                                      -------
  Normal  Scheduled  12m   default-scheduler                         Successfully assigned kube-system/calico-node-t5zdj to master.cluster.mydomain.tld
  Normal  Pulling    12m   kubelet, master.cluster.mydomain.tld      Pulling image "calico/cni:v3.8.0"
  Normal  Pulled     12m   kubelet, master.cluster.mydomain.tld      Successfully pulled image "calico/cni:v3.8.0"
  Normal  Created    12m   kubelet, master.cluster.mydomain.tld      Created container upgrade-ipam
  Normal  Started    12m   kubelet, master.cluster.mydomain.tld      Started container upgrade-ipam

Logs:

# kubectl --kubeconfig /etc/kubernetes/admin.conf logs -n kube-system pod/calico-node-t5zdj
Error from server (BadRequest): container "calico-node" in pod "calico-node-t5zdj" is waiting to start: PodInitializing

# kubectl --kubeconfig /etc/kubernetes/admin.conf logs -n kube-system -c upgrade-ipam pod/calico-node-t5zdj
2019-07-17 13:15:37.646 [INFO][1] ipam_plugin.go 68: migrating from host-local to calico-ipam...
2019-07-17 13:15:37.648 [INFO][1] k8s.go 228: Using Calico IPAM
2019-07-17 13:15:37.648 [INFO][1] migrate.go 65: checking host-local IPAM data dir dir existence...
2019-07-17 13:15:37.648 [INFO][1] migrate.go 72: retrieving node for IPIP tunnel address
2019-07-17 13:15:37.689 [INFO][1] migrate.go 80: IPIP tunnel address not found, assigning...
2019-07-17 13:15:37.699 [INFO][1] ipam.go 583: Assigning IP 192.168.0.1 to host: my.node.fqdn.tld
2019-07-17 13:15:37.709 [ERROR][1] ipam_plugin.go 95: failed to migrate ipam, retrying... error=failed to get add IPIP tunnel addr 192.168.0.1: The provided IP address is not in a configured pool
 node="my.node.fqdn.tld"
[... loops indefinitely ...]

# cat /proc/sys/net/ipv4/conf/all/rp_filter
1
# kubectl --kubeconfig /etc/kubernetes/admin.conf get ippools
No resources found.

System logging is basically full of this, might provide a clue? Doesn’t make sense to me at all.

kubelet[19383]: E0717 15:34:49.353260   19383 plugins.go:746] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
kubelet[19383]: W0717 15:34:50.445788   19383 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
kubelet[19383]: E0717 15:34:51.669310   19383 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
kubelet[19383]: E0717 15:34:53.364334   19383 driver-call.go:267] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
kubelet[19383]: W0717 15:34:53.365060   19383 driver-call.go:150] FlexVolume: driver call failed: executable: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds, args: [init], error: fork/exec /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds: no such file or directory, output: ""

Same problem here: fresh install of Ubuntu 20.04, kubeadm init, and kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml

However, if I delete the 3.5 installation and install 3.6, then kubectl get ippools returns No resources found.

Ah, you might try simply applying the new manifests rather than deleting and then creating. Deleting the old manifests will remove the CRD, thus deleting the IP pool.
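The in-place upgrade suggested here might look like the sketch below; the v3.6 manifest URL is an assumption modeled on the v3.8 link quoted earlier in the thread.

```shell
# Hedged sketch of "apply over the old install instead of
# delete + create": the CRDs, and the IP pool stored in them,
# survive this path because nothing is ever deleted.
upgrade_calico_in_place() {
    kubectl apply -f "https://docs.projectcalico.org/v3.6/manifests/calico.yaml"
    # wait for the calico-node daemonset to roll the new version out
    kubectl -n kube-system rollout status daemonset/calico-node
}
```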

I’d expect the v3.7 manifest to create an IP pool as well, but it will only do that after the init containers finish, so that might be what’s going on here.
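If the pool really is gone (as the empty kubectl get ippools output earlier suggests), one recovery option is recreating it by hand. A sketch, not taken from this thread: the pool name and settings below are assumptions based on the stock manifest's defaults, with the CIDR chosen so that it covers the tunnel address 192.168.0.1 seen in the logs above.

```yaml
# Hypothetical IPPool mirroring the stock manifest's defaults; adjust
# cidr to your cluster's actual pod network before applying.
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16    # must contain the tunnel address (192.168.0.1)
  ipipMode: Always
  natOutgoing: true
```

Applied with kubectl apply -f, after which the upgrade-ipam init container should be able to assign the IPIP tunnel address instead of looping on "not in a configured pool".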