kubernetes: Flannel (NetworkPlugin cni) error: /run/flannel/subnet.env: no such file or directory
/kind bug
@kubernetes/sig-contributor-experience-bugs
What happened: Installed a single-node Kubernetes cluster on CentOS 7 (a VM running on VirtualBox); my application pod (created via a k8s Deployment) won't go into the Ready state
Pod Event: Warning FailedCreatePodSandBox . . . Kubelet . . . Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox . . . network for pod "companyemployees-deployment-766c7c7767-t7mc5": NetworkPlugin cni failed to set up pod "companyemployees-deployment-766c7c7767-t7mc5_default" network: open /run/flannel/subnet.env: no such file or directory
In addition, it looks like the kubernetes coredns docker container keeps exiting – e.g. docker ps -a | grep -i coredns: 6341ce0be652 k8s.gcr.io/pause:3.1 "/pause" . . . Exited (0) 1 second ago k8s_POD_coredns-576cbf47c7-9bxxg_kube-system_e84afb7a-d7b7-11e8-bafa-08002745c4bc_581
What you expected to happen: Flannel not to report the error, and the pod to go into the Ready state
How to reproduce it (as minimally and precisely as possible):
Create a simple deployment after creating docker image and pushing the image to a private docker registry
kubectl create -f companyemployees-deployment.yaml
deployment yaml:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: companyemployees-deployment
  labels:
    app: companyemployees
spec:
  replicas: 1
  selector:
    matchLabels:
      app: companyemployees
  template:
    metadata:
      labels:
        app: companyemployees
    spec:
      containers:
      - name: companyemployees
        image: localhost:5000/companyemployees:1.0
        ports:
        - containerPort: 9092
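Once the deployment is created, the failure from the report can be observed with kubectl (a suggested check, not part of the original report; <pod-name> is the generated name shown by kubectl get pods):
kubectl get pods                    # the pod stays in ContainerCreating and never becomes Ready
kubectl describe pod <pod-name>     # the Events section shows the FailedCreatePodSandBox warning quoted above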
Anything else we need to know?:
ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
link/ether 08:00:27:45:c4:bc brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
link/ether 08:00:27:21:0f:92 brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
link/ether 02:42:1b:04:1f:7c brd ff:ff:ff:ff:ff:ff
6: veth3f5bcb4@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT
link/ether b2:1f:d4:fb:84:2e brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: flannel.1: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT
link/ether e6:44:ed:15:dd:97 brd ff:ff:ff:ff:ff:ff
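Since flannel.1 shows as DOWN here, it may be worth checking whether the flannel daemonset pod itself is healthy (a suggested check, assuming the app=flannel label from the standard kube-flannel.yml):
kubectl -n kube-system get pods -o wide | grep -E 'flannel|coredns'   # flannel daemonset and coredns status
kubectl -n kube-system logs <kube-flannel-pod-name>                   # errors here usually explain the missing subnet.env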
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:36:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Single-node Kubernetes cluster on a CentOS 7 VM running in VirtualBox (VirtualBox itself runs on Windows 7 Pro)
- OS (e.g. from /etc/os-release):
  cat /etc/os-release: NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"
rpm -q centos-release
centos-release-7-4.1708.el7.centos.x86_64
- Kernel (e.g. uname -a): Linux ibm-ms 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: My team's CentOS image already had Docker, Kubernetes, Flannel, and a private Docker registry on it. It was working until I recently ran into issues that led me to uninstall Kubernetes, Docker, and Flannel and reinstall them.
Install steps:
Switch to root: su - root
Install Docker:
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install docker-ce
systemctl daemon-reload
systemctl enable docker
systemctl start docker
docker run hello-world
Install private Docker registry:
docker pull registry
docker run -d -p 5000:5000 --restart=always --name registry registry
Note: firewalld is not running
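To confirm the private registry is reachable and holds the pushed image (an optional check, not in the original steps), the Registry V2 API can be queried:
curl http://localhost:5000/v2/_catalog    # should list the pushed repositories, e.g. companyemployees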
Install k8s:
setenforce 0
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
swapoff -a
Edit /etc/fstab and comment out the /dev/mapper/centos-swap swap line
Add the Kubernetes repo for yum: edit /etc/yum.repos.d/kubernetes.repo and add
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet
systemctl start kubelet
kubeadm init --pod-network-cidr=10.244.0.0/16
k8s config for the user (running as root):
export KUBECONFIG=/etc/kubernetes/admin.conf
Install flannel:
sysctl net.bridge.bridge-nf-call-iptables=1
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
Remove the master node taint (to allow scheduling pods on the master):
kubectl taint nodes --all node-role.kubernetes.io/master-
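After these install steps, a quick sanity check that flannel came up and wrote its config could look like this (a suggested check, not part of the original steps):
kubectl get nodes                                # the node should become Ready once the CNI is working
kubectl -n kube-system get pods -l app=flannel   # the kube-flannel pod should be Running
cat /run/flannel/subnet.env                      # written by the flannel pod on successful startup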
- Others: Prior to installing, I uninstalled using the following steps:
Switch to root: su - root
Uninstall k8s (although this is the master node, I did this a few times and included draining the node the last time):
kubectl drain mynodename --delete-local-data --force --ignore-daemonsets
kubectl delete node mynodename
kubeadm reset
systemctl stop kubelet
yum remove kubeadm kubectl kubelet kubernetes-cni kube*
yum autoremove
rm -rf ~/.kube
rm -rf /var/lib/kubelet/*
Uninstall Docker:
docker rm `docker ps -a -q`
docker stop (as needed)
docker rmi -f `docker images -q`
Check that all containers and images were deleted: docker ps -a; docker images
systemctl stop docker
yum remove yum-utils device-mapper-persistent-data lvm2
yum remove docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-selinux docker-engine-selinux docker-engine
yum remove docker-ce
rm -rf /var/lib/docker
rm -rf /etc/docker
Uninstall flannel:
rm -rf /var/lib/cni/
rm -rf /run/flannel
rm -rf /etc/cni/
Remove interfaces related to docker and flannel:
ip link
For each interface for docker or flannel, do the following:
ifconfig <name of interface from ip link> down
ip link delete <name of interface from ip link>
About this issue
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 24 (3 by maintainers)
Just got the same problem - fixed it by manually adding the file:
/run/flannel/subnet.env
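For anyone wondering what to put in it: a minimal sketch of the values flannel normally writes for the default 10.244.0.0/16 pod network (the exact CIDR and MTU are assumptions; adjust to your cluster):
mkdir -p /run/flannel
cat > /run/flannel/subnet.env <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF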
I know this is old but I wanted to comment here as I too had this issue, but in my case it was a symptom of a different issue. In my case there was no subnet.env file, and it was not getting created because my flannel daemonset was failing. The error from the pod (kubectl --namespace=kube-system logs <POD_NAME>) showed "Error registering network: failed to acquire lease: node "<NODE_NAME>" pod cidr not assigned". The node was missing a spec for podCIDR, so I ran kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"10.244.0.0/16"}}' for each node and the issue went away.
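To check whether you are hitting the same failure mode, the podCIDR assignment can be inspected before patching (a sketch using standard kubectl; the 10.244.0.0/16 value is flannel's default and is an assumption):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'   # an empty podCIDR column means none was assigned
kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'                            # assign one per node, as described above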
In my case, using CentOS in DO, the file /run/flannel/subnet.env exists, but I got the same issue:
/run/flannel/subnet.env: no such file or directory
At first I had tried a different subnet while running
kubeadm init --pod-network-cidr=192.168.255.0/24
I tried @discostur's solution of changing the file manually, but subnet.env was restored to its original state when I restarted the master. This was only solved by
kubeadm reset
and using flannel's default network CIDR:
kubeadm init --pod-network-cidr=10.244.0.0/16
Creating /run/flannel/subnet.env fixes the coredns-not-starting issue, but it's only temporary. My solution for the master/control-plane:
kubeadm init --control-plane-endpoint=whatever --node-name whatever --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
The subnet.env file is written out by the flannel daemonset pods and probably shouldn't be modified by hand. If that file isn't getting written, it suggests another problem is preventing the flannel pod from starting up. Are there other logs in the flannel pod? You can check with something like
kubectl logs -n kube-system <flannel-pod-name>
Happy to continue discussing, but I'm going to close this since it appears to be a flannel issue rather than a Kubernetes one. Might also be worth raising as a support issue against the flannel repo too: https://github.com/coreos/flannel
/remove-triage unresolved /remove-kind bug /close
Thanks, I just needed a quick solution for a test system running some old k8s. I scripted the workaround which recreates the missing /run/flannel/subnet.env:
I also encountered exactly same problem while creating rook-ceph-operator pod, enforcing SELinux to 0 on worker nodes resolved the issue.
This will get it started, but it won’t survive a reboot…still struggling with this myself
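As a stop-gap only (the maintainer's point above still stands: the real fix is getting the flannel pod healthy), one way to make the manually-created file reappear after a reboot is to recreate it at boot, since /run is a tmpfs and anything written there by hand is lost. A sketch using a root @reboot cron entry, assuming the default 10.244.0.0/16 CIDR used elsewhere in this thread:
# add via: crontab -e (as root)
@reboot mkdir -p /run/flannel && printf 'FLANNEL_NETWORK=10.244.0.0/16\nFLANNEL_SUBNET=10.244.0.1/24\nFLANNEL_MTU=1450\nFLANNEL_IPMASQ=true\n' > /run/flannel/subnet.env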
Thanks this worked for us