calico: stat /var/lib/calico/nodename: no such file or directory problem, please help.

Hi, there is a problem in my Kubernetes cluster. On the node wx3, I want to create a static pod named jenkins, but kubelet logs errors over and over:

E0322 15:59:06.016063 1239 kuberuntime_gc.go:152] Failed to stop sandbox "420698bd9963f65496a5fd0c127f2b23497d678ddcf58362aa35615d8739d372" before removing: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "jenkins-wx3_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
W0322 15:59:14.922384 1239 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
I0322 15:59:17.649057 1239 kuberuntime_manager.go:389] No ready sandbox for pod "jenkins-wx3_default(1d947eff714cafbfcc78ef0291db3291)" can be found. Need to start a new one
W0322 15:59:17.651466 1239 cni.go:265] CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "aaf3954dc74a610b5da9cfbbcf67d413b64ee49f00d5df0835fb7f340449181b"
E0322 15:59:17.756783 1239 cni.go:319] Error deleting network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
E0322 15:59:17.757482 1239 remote_runtime.go:115] StopPodSandbox "aaf3954dc74a610b5da9cfbbcf67d413b64ee49f00d5df0835fb7f340449181b" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "jenkins-wx3_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
E0322 15:59:17.757520 1239 kuberuntime_manager.go:781] Failed to stop sandbox {"docker" "aaf3954dc74a610b5da9cfbbcf67d413b64ee49f00d5df0835fb7f340449181b"}
E0322 15:59:17.757568 1239 kuberuntime_manager.go:581] killPodWithSyncResult failed: failed to "KillPodSandbox" for "1d947eff714cafbfcc78ef0291db3291" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"jenkins-wx3_default\" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"
E0322 15:59:17.757597 1239 pod_workers.go:182] Error syncing pod 1d947eff714cafbfcc78ef0291db3291 ("jenkins-wx3_default(1d947eff714cafbfcc78ef0291db3291)"), skipping: failed to "KillPodSandbox" for "1d947eff714cafbfcc78ef0291db3291" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"jenkins-wx3_default\" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"

When I put the jenkins.yml on wx1, everything is OK. How can I fix it?
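A first thing to check (a sketch, assuming the standard calico-node DaemonSet in kube-system; the pod name below is a placeholder) is whether the file exists on wx3 and whether the calico/node pod scheduled there is healthy:

# On wx3: the CNI plugin reads the node name from this file.
ls -l /var/lib/calico/nodename && cat /var/lib/calico/nodename

# From a master: is calico-node Running on wx3, and what do its logs say?
kubectl -n kube-system get pods -o wide | grep calico-node
kubectl -n kube-system logs <calico-node-pod-on-wx3>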

Your Environment

~ # calicoctl version
Client Version:   v2.0.1
Build date:       2018-02-23T23:37:37+0000
Git commit:       5fa93655
Cluster Version:  v3.0.1-218-gb3b47737
Cluster Type:     k8s,bgp

~ # calicoctl get node -o wide
NAME   ASN         IPV4                IPV6
wx     (unknown)   192.168.21.55/24
wx1    (unknown)   192.168.21.56/24
wx3    (unknown)   192.168.21.11/24

~ # calicoctl get workloadEndpoint -o wide
NAME WORKLOAD NODE NETWORKS INTERFACE PROFILES NATS
wx-k8s-dnsmasq--dep--844fb9f48d--wr4qp-eth0 dnsmasq-dep-844fb9f48d-wr4qp wx 172.50.56.6/32 cali3aeaee8bcfc kns.default
wx-k8s-nfsd--555cf7c46b--9q9q9-eth0 nfsd-555cf7c46b-9q9q9 wx 172.50.56.61/32 calie9a5b3f1744 kns.default
wx-k8s-nginx--deployment--77c45bd648--xb2r5-eth0 nginx-deployment-77c45bd648-xb2r5 wx 172.50.56.60/32 cali44402d20873 kns.default
wx-k8s-spark--master-eth0 spark-master wx 172.50.56.63/32 cali54d44e2d0ac kns.default
wx-k8s-spark--slave1-eth0 spark-slave1 wx 172.50.56.2/32 cali9a2eec147dd kns.default
wx-k8s-spark--slave2-eth0 spark-slave2 wx 172.50.56.1/32 cali80f72bad764 kns.default
wx-k8s-spark--slave3-eth0 spark-slave3 wx 172.50.56.5/32 caliac3052224a9 kns.default
wx-k8s-tomcat7--dep--74bf5b7d88--smq2n-eth0 tomcat7-dep-74bf5b7d88-smq2n wx 172.50.56.62/32 cali6c038e3b06b kns.default
wx-k8s-zk3--wx-eth0 zk3-wx wx 172.50.56.7/32 cali8f4bab72ef5 kns.default
wx1-k8s-busybox-eth0 busybox wx1 172.50.255.150/32 cali12d4a061371 kns.default
wx1-k8s-dnsmasq--dep--77bb7f589f--vzbb5-eth0 dnsmasq-dep-77bb7f589f-vzbb5 wx1 172.50.255.169/32 cali1c838e89bdd kns.default
wx1-k8s-hadoop--client-eth0 hadoop-client wx1 172.50.255.152/32 calid54dec8afc4 kns.default
wx1-k8s-hadoop--httpfs--8f757b8cc--qh8zm-eth0 hadoop-httpfs-8f757b8cc-qh8zm wx1 172.50.255.167/32 cali6994c0f1574 kns.default
wx1-k8s-hadoop--httpfs--8f757b8cc--rdt6c-eth0 hadoop-httpfs-8f757b8cc-rdt6c wx1 172.50.255.146/32 cali95554e22362 kns.default
wx1-k8s-nginx--deployment--77c45bd648--n598x-eth0 nginx-deployment-77c45bd648-n598x wx1 172.50.255.153/32 cali16e6132bd14 kns.default
wx1-k8s-nginx--deployment--77c45bd648--zv786-eth0 nginx-deployment-77c45bd648-zv786 wx1 172.50.255.159/32 calid24d442f2ea kns.default
wx1-k8s-tomcat7--dep--74bf5b7d88--4hpfr-eth0 tomcat7-dep-74bf5b7d88-4hpfr wx1 172.50.255.163/32 calib89ca8a389d kns.default
wx1-k8s-tomcat7--dep--74bf5b7d88--8sbjb-eth0 tomcat7-dep-74bf5b7d88-8sbjb wx1 172.50.255.149/32 cali98af15efd2b kns.default
wx1-k8s-tomcat7--dep--74bf5b7d88--9htnx-eth0 tomcat7-dep-74bf5b7d88-9htnx wx1 172.50.255.151/32 cali893197594b5 kns.default
wx1-k8s-tomcat7--dep--74bf5b7d88--qcn9f-eth0 tomcat7-dep-74bf5b7d88-qcn9f wx1 172.50.255.162/32 cali93dfdd66d35 kns.default
wx1-k8s-zk2--wx1-eth0 zk2-wx1 wx1 172.50.255.157/32 cali36493d30616 kns.default

ubuntu@ubuntu1:~$ sudo kubectl describe po jenkins-wx3
Name:         jenkins-wx3
Namespace:    default
Node:         wx3/192.168.21.11
Start Time:   Thu, 22 Mar 2018 15:42:03 +0800
Labels:       app=jenkins
Annotations:  kubernetes.io/config.hash=1d947eff714cafbfcc78ef0291db3291
              kubernetes.io/config.mirror=1d947eff714cafbfcc78ef0291db3291
              kubernetes.io/config.seen=2018-03-22T15:42:03.107778114+08:00
              kubernetes.io/config.source=file
Status:       Pending
IP:
Containers:
  jenkins:
    Container ID:
    Image:          jenkins:alpine
    Image ID:
    Ports:          8080/TCP, 50000/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:         <none>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     :NoExecute
Events:          <none>

Most upvoted comments

@r7vme

Facing the same issue.

My calico.yml file is https://docs.projectcalico.org/v3.5/getting-started/kubernetes/installation/hosted/calico.yaml

Error

Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b577ddbdd5fbd6cbe79e5b1bf20648e981590ecd0df545a0158ce909d9179096" network for pod "frontend-784f75ddb7-nbz7t": NetworkPlugin cni failed to set up pod "frontend-784f75ddb7-nbz7t_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/

kubectl get pods --all-namespaces

NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
default       frontend-784f75ddb7-nbz7t                  0/1     ContainerCreating   0          91m
default       redis-master-97979696c-hcgdm               0/1     ContainerCreating   0          91m
default       redis-slave-6fd879d46c-klp4r               0/1     ContainerCreating   0          91m
default       ripple-app-dashboard-58d49bb867-wj44k      0/1     ContainerCreating   0          110m
kube-system   calico-etcd-b7wqf                          1/1     Running             0          143m
kube-system   calico-kube-controllers-74887d7bdf-wxhkd   1/1     Running             0          144m
kube-system   calico-node-58fqj                          1/1     Running             0          144m
kube-system   calico-node-mchcc                          0/1     CrashLoopBackOff    25         100m
kube-system   coredns-86c58d9df4-7ncdk                   1/1     Running             0          158m
kube-system   coredns-86c58d9df4-g4jcp                   1/1     Running             0          158m
kube-system   etcd-kmaster                               1/1     Running             0          157m
kube-system   kube-apiserver-kmaster                     1/1     Running             0          157m
kube-system   kube-controller-manager-kmaster            1/1     Running             0          157m
kube-system   kube-proxy-njx5c                           1/1     Running             0          137m
kube-system   kube-proxy-pkxx5                           1/1     Running             0          158m
kube-system   kube-scheduler-kmaster                     1/1     Running             0          157m
kube-system   kubernetes-dashboard-57df4db6b-zcvcc       1/1     Running             0          141m


Kubernetes version: v1.13

Could you explain why that is needed in your case?

We run kubelet in a docker container, so I need to provide access to the /var/lib/calico host path. It isn't easy, not from a config-change perspective, but from the perspective of releasing two dependent changes: I need to make sure all our customers have updated to the release with the mount before I can apply the new calico. All doable, but nodename_file_optional makes it possible to release the new calico in a single step. We already discussed the changes and it's a completely safe procedure, because the nodename will be fetched by calling hostname only while the master is already upgraded (new calico manifest applied) but a worker is not. When a worker is rolled out with the kubelet change (mounting /var/lib/calico), the CNI plugin immediately starts using the /var/lib/calico/nodename file. In total it's about 1 hour, in our experience. Bam! 😃
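Sketched as a rollout order (two commands standing in for the two dependent changes; calico.yaml here is the updated manifest):

# Step 1: apply the calico manifest that sets "nodename_file_optional": true.
kubectl apply -f calico.yaml

# Step 2: later, roll out the kubelet change that bind-mounts /var/lib/calico
# on each worker; from then on the CNI plugin reads /var/lib/calico/nodename.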

@ggaurav10 do you see the /var/lib/calico/nodename file on the host filesystem?

Also, are you running a containerized kubelet by chance? If so, you’ll also need to mount that directory into the kubelet container so that the CNI plugin can see it.
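For a containerized kubelet, the missing piece is a bind mount like the one below (a minimal sketch assuming kubelet is launched with docker run; the image and flags are placeholders for whatever your deployment actually uses):

# The /var/lib/calico mount is the line that fixes this error.
docker run -d --name kubelet \
  --privileged --net=host --pid=host \
  -v /var/lib/calico:/var/lib/calico:ro \
  -v /etc/cni/net.d:/etc/cni/net.d \
  -v /opt/cni/bin:/opt/cni/bin \
  <your-kubelet-image> kubelet <your-usual-flags>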

What's the difference between master and latest?

Just to clarify this - master is the latest build of the code from the master branch, and isn’t guaranteed to be stable.

latest points to the latest stable release.

I’d still recommend pinning to a specific release to avoid pulling in unexpected changes.
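If you take that advice, pinning is just editing the image tag before applying (v3.5.1 below is an illustrative version, not a recommendation; pick the release you actually validated):

# Pin in the manifest, then apply:
sed -i 's|calico/node:latest|calico/node:v3.5.1|' calico.yaml
kubectl apply -f calico.yaml

# Or patch a running DaemonSet in place:
kubectl -n kube-system set image daemonset/calico-node calico-node=calico/node:v3.5.1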

For whoever ends up struggling with the same error: it's not always quick to upgrade the kubelet config (adding the /var/lib/calico mount) on all clusters. There is a compatibility mode for the case where the calico nodename == hostname.

Add this to the ConfigMap:

"nodename_file_optional": true,

So final cni_network_config looks like that:

            cni_network_config: |-
              {
                "name": "k8s-pod-network",
                "cniVersion": "0.3.0",
                "plugins": [
                  {
                    "type": "calico",
                    "log_level": "info",
                    "etcd_endpoints": "__ETCD_ENDPOINTS__",
                    "etcd_key_file": "__ETCD_KEY_FILE__",
                    "etcd_cert_file": "__ETCD_CERT_FILE__",
                    "etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
                    "mtu": __CNI_MTU__,
                    "nodename_file_optional": true,
                    "ipam": {
                        "type": "calico-ipam"
                    },
                    "policy": {
                        "type": "k8s"
                    },
                    "kubernetes": {
                        "kubeconfig": "__KUBECONFIG_FILEPATH__"
                    }
                  },
                  {
                    "type": "portmap",
                    "snat": true,
                    "capabilities": {"portMappings": true}
                  }
                ]
              }

In this case, for nodes without /var/lib/calico mounted into the kubelet, the CNI plugin will use the hostname; for nodes with the mount, it will use the /var/lib/calico/nodename file.
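To see which of the two paths a given node will take, this one-liner mirrors the fallback (run on the node itself; purely illustrative):

cat /var/lib/calico/nodename 2>/dev/null || hostname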

Thanks for the response. Yes, I can see the file on the host, and yes, the kubelet is running in a container. Mounting the directory into the kubelet container solved the issue. 😃

Thanks again.