kubernetes: Flannel (NetworkPlugin cni) error: /run/flannel/subnet.env: no such file or directory

/kind bug

@kubernetes/sig-contributor-experience-bugs

What happened: Installed a single-node Kubernetes cluster on CentOS 7 (a VM running on VirtualBox); my application pod (created via a Kubernetes Deployment) won't go into the Ready state

Pod Event: Warning FailedCreatePodSandBox . . . Kubelet . . . Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox . . . network for pod "companyemployees-deployment-766c7c7767-t7mc5": NetworkPlugin cni failed to set up pod "companyemployees-deployment-766c7c7767-t7mc5_default" network: open /run/flannel/subnet.env: no such file or directory

In addition, it looks like the Kubernetes coredns Docker container keeps exiting, e.g. docker ps -a | grep -i coredns: 6341ce0be652 k8s.gcr.io/pause:3.1 "/pause" . . . Exited (0) 1 second ago k8s_POD_coredns-576cbf47c7-9bxxg_kube-system_e84afb7a-d7b7-11e8-bafa-08002745c4bc_581

What you expected to happen: Flannel not to report the error and the pod to go into the Ready state

How to reproduce it (as minimally and precisely as possible): Build the Docker image, push it to the private Docker registry, and create a simple deployment with kubectl create -f companyemployees-deployment.yaml (a sketch of the build-and-push commands follows the manifest below). Deployment YAML:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: companyemployees-deployment
  labels:
    app: companyemployees
spec:
  replicas: 1
  selector:
    matchLabels:
      app: companyemployees
  template:
    metadata:
      labels:
        app: companyemployees
    spec:
      containers:
      - name: companyemployees
        image: localhost:5000/companyemployees:1.0
        ports:
        - containerPort: 9092
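The image referenced above was built and pushed to the local private registry along these lines (a sketch; it assumes a Dockerfile for the application in the current directory):

# Build the image and tag it for the private registry running on localhost:5000
docker build -t companyemployees:1.0 .
docker tag companyemployees:1.0 localhost:5000/companyemployees:1.0
docker push localhost:5000/companyemployees:1.0

# Create the deployment from the manifest above
kubectl create -f companyemployees-deployment.yaml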

Anything else we need to know?: Output of ip link:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:45:c4:bc brd ff:ff:ff:ff:ff:ff
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 08:00:27:21:0f:92 brd ff:ff:ff:ff:ff:ff
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether 02:42:1b:04:1f:7c brd ff:ff:ff:ff:ff:ff
6: veth3f5bcb4@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT
    link/ether b2:1f:d4:fb:84:2e brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: flannel.1: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT
    link/ether e6:44:ed:15:dd:97 brd ff:ff:ff:ff:ff:ff
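Note that flannel.1 is in state DOWN and /run/flannel/subnet.env is missing, which suggests the flannel daemonset pod is not coming up. A few commands that help confirm this (a sketch; substitute the actual pod and node names):

# Check whether the flannel pod on this node is running
kubectl -n kube-system get pods -o wide | grep -i flannel

# Inspect its logs
kubectl -n kube-system logs <flannel-pod-name>

# Check whether the node has a pod CIDR assigned (relevant to one of the comments below)
kubectl describe node <node-name> | grep -i podcidr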

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:36:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: Single-node Kubernetes cluster on a CentOS 7 VM running on VirtualBox (VirtualBox is running on Windows 7 Pro)

  • OS (e.g. from /etc/os-release): cat /etc/os-release:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

rpm -q centos-release: centos-release-7-4.1708.el7.centos.x86_64

  • Kernel (e.g. uname -a): Linux ibm-ms 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

My team's CentOS image already had Docker, Kubernetes, Flannel, and a private Docker registry on it; it was working, but I recently had issues with it that led me to uninstall Kubernetes, Docker, and Flannel and reinstall them.

Install steps:

Switch to root: su - root

install docker

  1. yum install -y yum-utils device-mapper-persistent-data lvm2
  2. yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  3. yum install docker-ce
  4. systemctl daemon-reload
  5. systemctl enable docker
  6. systemctl start docker
  7. docker run hello-world

install private docker registry

  1. docker pull registry
  2. docker run -d -p 5000:5000 --restart=always --name registry registry
  3. Note: firewalld is not running (a quick sanity check of the registry is shown below)
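To sanity-check that the registry is reachable, something like the following can be used (the catalog will be empty until an image is pushed):

# Confirm the registry container is up and answering on port 5000
docker ps --filter name=registry
curl http://localhost:5000/v2/_catalog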

install k8s:

  1. setenforce 0
  2. sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
  3. swapoff -a
  4. Edit /etc/fstab and comment-out /dev/mapper/centos-swap swap
  5. Add kubernetes repo for yum - edit /etc/yum.repos.d/kubernetes.repo and add
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
       https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
  6. yum install -y kubelet kubeadm kubectl
  7. systemctl enable kubelet
  8. systemctl start kubelet
  9. kubeadm init --pod-network-cidr=10.244.0.0/16 (this CIDR must match the flannel manifest; see the note after the flannel install steps below)
  10. Kubernetes config for the root user: export KUBECONFIG=/etc/kubernetes/admin.conf

install flannel:

  1. sysctl net.bridge.bridge-nf-call-iptables=1
  2. kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
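The --pod-network-cidr passed to kubeadm init has to match the Network value in kube-flannel.yml; the manifest's net-conf.json looks roughly like this (excerpt from its ConfigMap, for reference):

  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }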

Remove master node taint (to allow scheduling pods on master): kubectl taint nodes --all node-role.kubernetes.io/master-
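After these steps, node readiness and the kube-system pods (including coredns and the flannel daemonset) can be checked with:

kubectl get nodes
kubectl -n kube-system get pods -o wide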

  • Others: Prior to installing, I uninstalled everything using the following steps:

Switch to root: su - root

Uninstall k8s (although this is the master node, I did this a few times and included draining the node the last time):

  1. kubectl drain mynodename --delete-local-data --force --ignore-daemonsets
  2. kubectl delete node mynodename
  3. kubeadm reset
  4. systemctl stop kubelet
  5. yum remove kubeadm kubectl kubelet kubernetes-cni kube*
  6. yum autoremove
  7. rm -rf ~/.kube
  8. rm -rf /var/lib/kubelet/*

Uninstall docker:

  1. docker rm $(docker ps -a -q)
  2. docker stop (as needed)
  3. docker rmi -f $(docker images -q)
  4. Check that all containers and images were deleted: docker ps -a; docker images
  5. systemctl stop docker
  6. yum remove yum-utils device-mapper-persistent-data lvm2
  7. yum remove docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-selinux docker-engine-selinux docker-engine
  8. yum remove docker-ce
  9. rm -rf /var/lib/docker
  10. rm -rf /etc/docker

Uninstall flannel

  1. rm -rf /var/lib/cni/
  2. rm -rf /run/flannel
  3. rm -rf /etc/cni/
  4. Remove interfaces related to docker and flannel: list them with ip link, then for each docker or flannel interface run ifconfig <name of interface from ip link> down followed by ip link delete <name of interface from ip link> (a loop version is sketched below).
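That last step can be expressed as a small loop (a sketch assuming the usual interface names; adjust to whatever ip link actually shows):

# Bring down and delete the leftover container-networking interfaces
# (ip link set ... down is the iproute2 equivalent of ifconfig ... down)
for iface in docker0 flannel.1 cni0; do
    ip link set "$iface" down
    ip link delete "$iface"
done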

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 24 (3 by maintainers)

Most upvoted comments

Just got the same problem - fixed it by manually adding the file:

/run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

I know this is old, but I wanted to comment here as I too had this issue; in my case it was a symptom of a different problem. There was no subnet.env file, but it was not getting created because my flannel daemonset was failing. The error from the pod (kubectl --namespace=kube-system logs <POD_NAME>) showed "Error registering network: failed to acquire lease: node "<NODE_NAME>" pod cidr not assigned". The node was missing a spec for podCIDR, so I ran kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"10.244.0.0/16"}}' for each node and the issue went away.
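For reference, whether a node already has a podCIDR assigned can be checked with something like:

kubectl get node <NODE_NAME> -o jsonpath='{.spec.podCIDR}'

(empty output means no pod CIDR is assigned, which matches the lease error above)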

In my case, using CentOS on DO, the file /run/flannel/subnet.env existed, but I had the same issue: /run/flannel/subnet.env: no such file or directory.

At first I had tried a different subnet when running kubeadm init --pod-network-cidr=192.168.255.0/24.

I tried @discostur's solution of changing the file manually, but subnet.env was restored to its original state when I restarted the master.

This was only solved by running kubeadm reset and using flannel's default network CIDR: kubeadm init --pod-network-cidr=10.244.0.0/16
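A sketch of that reset-and-reinitialise sequence on a single-node cluster like the one in this issue:

# Tear down the existing control plane, then re-init with flannel's default CIDR
kubeadm reset
kubeadm init --pod-network-cidr=10.244.0.0/16
export KUBECONFIG=/etc/kubernetes/admin.conf

# Re-apply the flannel manifest (same URL as in the install steps above)
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml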

Creating /run/flannel/subnet.env fixes coredns not starting, but it's only temporary. My solution for the master/control-plane:

  1. kubeadm init --control-plane-endpoint=whatever --node-name whatever --pod-network-cidr=10.244.0.0/16
  2. kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
  3. restart all
systemctl stop kubelet
systemctl stop docker
iptables --flush
iptables -t nat --flush
systemctl start kubelet
systemctl start docker

The subnet.env file is written out by the flannel daemonset pods and probably shouldn’t be modified by hand.

If that file isn’t getting written, it suggests another problem preventing the flannel pod from starting up. Are there other logs in the flannel pod? You can check with something like kubectl logs -n kube-system <flannel-pod-name>

Happy to continue discussing, but I'm going to close this since it appears to be a flannel issue rather than a Kubernetes one. It might also be worth raising as a support issue against the flannel repo: https://github.com/coreos/flannel

/remove-triage unresolved /remove-kind bug /close

Thanks, I just needed a quick solution for a test system running some old k8s. I scripted the workaround which recreates the missing /run/flannel/subnet.env:

#! /bin/bash

set -x


# See https://github.com/kubernetes/kubernetes/issues/70202
# Run as root (e.g. with sudo)

mkdir -p /run/flannel
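
# The values written below assume the default flannel pod network (10.244.0.0/16) used in this issue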

cat << EOF > /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF

I also encountered exactly the same problem while creating the rook-ceph-operator pod; setting SELinux to permissive (setenforce 0) on the worker nodes resolved the issue.

Just got the same problem - fixed it by manually adding the file: /run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

This solution worked for me, but I have one doubt: what do these values mean, and how does flannel use them?

This will get it started, but it won’t survive a reboot…still struggling with this myself
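Since /run is a tmpfs, a hand-written subnet.env disappears at reboot. One stopgap (an assumption, not a proper fix; the real fix is getting the flannel pod healthy so it writes the file itself) is to re-run the workaround script above at boot, e.g. from root's crontab:

# The script path here is hypothetical; point it at wherever the workaround script is saved
( crontab -l 2>/dev/null; echo "@reboot /usr/local/bin/flannel-subnet-env.sh" ) | crontab -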

Just got the same problem - fixed it by manually adding the file:

/run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Thanks, this worked for us.