kind: [firewalld] kind doesn't work on Fedora 32

What happened:

After upgrading to Fedora 32, I can no longer create a kind cluster.

What you expected to happen:

My kind cluster to get created

How to reproduce it (as minimally and precisely as possible):

kind create cluster --config=config.yaml

Where config.yaml is…

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: True
  podSubnet: "10.254.0.0/16"
  serviceSubnet: "172.30.0.0/16"
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    listenAddress: 0.0.0.0
  - containerPort: 443
    hostPort: 443
    listenAddress: 0.0.0.0
- role: worker
- role: worker
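
For what it's worth, once a cluster does come up, one way to confirm the extraPortMappings took effect is to list the published ports on the worker container (kind names the first worker's container kind-worker by default):

docker ps --filter name=kind-worker --format '{{.Names}} {{.Ports}}'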

Anything else we need to know?:

Output/trace of running with -v 10: https://gist.github.com/christianh814/abbf1964b9224c8940864d02b9236128

I figured maybe something was stale, so I ran docker network rm kind and re-ran the command. This time I looked at the logs on my laptop and saw…

May 01 16:51:17 laptop audit[98494]: SERVICE_STOP pid=98494 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:spc_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
May 01 16:51:17 laptop audit[98423]: SERVICE_STOP pid=98423 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:spc_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
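
For anyone reproducing this: those entries come from the host journal. Something like the following should surface them (assuming systemd-journald; the _TRANSPORT=audit match narrows the output to audit records):

journalctl _TRANSPORT=audit --since "10 min ago" | grep kubelet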

Okay… so I docker exec'd into one of the workers and saw…

May 01 23:44:19 kind-worker systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
May 01 23:44:19 kind-worker systemd[1]: kubelet.service: Failed with result 'exit-code'.
May 01 23:44:20 kind-worker systemd[1]: kubelet.service: Service RestartSec=1s expired, scheduling restart.
May 01 23:44:20 kind-worker systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 217.
May 01 23:44:20 kind-worker systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
May 01 23:44:20 kind-worker systemd[1]: Started kubelet: The Kubernetes Node Agent.
May 01 23:44:20 kind-worker kubelet[3469]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
May 01 23:44:20 kind-worker kubelet[3469]: F0501 23:44:20.360424    3469 server.go:199] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory

And indeed it's not there (kubeadm normally writes that file when the node joins, so the join apparently never completed):

root@kind-worker:/var/lib# ls -1 /var/lib/kubelet/config.yaml
ls: cannot access '/var/lib/kubelet/config.yaml': No such file or directory
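
(The shell above came from a plain docker exec; the kind node images ship bash, so this is all it takes:)

docker exec -it kind-worker bash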

Strangely, a plain kind create cluster (without --config) DOES work fine.
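
Side note for anyone debugging the same thing: kind export logs dumps each node's journal and service logs into a local directory, which is easier than exec'ing into every node one by one:

kind export logs ./kind-logs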

Environment:

  • kind version: (use kind version):
$ kind version
kind v0.8.0 go1.14.2 linux/amd64
  • Kubernetes version: (use kubectl version):
$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.14rc1
 Git commit:        afacb8b
 Built:             Mon Mar 16 15:45:37 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.14rc1
  Git commit:       afacb8b
  Built:            Mon Mar 16 00:00:00 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3
  GitCommit:        
 runc:
  Version:          1.0.0-rc10+dev
  GitCommit:        fbdbaf85ecbc0e077f336c03062710435607dbf1
 docker-init:
  Version:          0.18.0
  GitCommit:        
  • OS (e.g. from /etc/os-release):
$ cat /etc/fedora-release 
Fedora release 32 (Thirty Two)
$ uname -a
Linux laptop 5.6.7-300.fc32.x86_64 #1 SMP Thu Apr 23 14:13:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 20 (13 by maintainers)

Most upvoted comments

Update. So on F32, I got it working with firewalld by changing the FirewallBackend in the /etc/firewalld/firewalld.conf file from nftables to iptables and restarting docker.

# grep 'FirewallBackend=iptables' /etc/firewalld/firewalld.conf 
FirewallBackend=iptables

After I did that, my kind deployments started working “as normal”.
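
For completeness, the whole sequence was roughly the following (the sed one-liner is just one way to flip the setting, and firewalld needs a restart to pick up the new backend before docker re-installs its rules; run as root):

sed -i 's/^FirewallBackend=.*/FirewallBackend=iptables/' /etc/firewalld/firewalld.conf
systemctl restart firewalld
systemctl restart docker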

I think short of fully disabling firewalld, you can do:

firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --get-zone-of-interface=<your eth interface>
firewall-cmd --zone=<zone from above> --add-masquerade --permanent
firewall-cmd --reload
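
A quick sanity check that those changes stuck; --query-masquerade should answer yes for the zone you added it to:

firewall-cmd --zone=trusted --list-interfaces
firewall-cmd --zone=<zone from above> --query-masquerade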

So it failed again with the following config…

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: True
  podSubnet: "10.254.0.0/16"
  serviceSubnet: "172.30.0.0/16"
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker

So I tried a simpler config…

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker

So it’s network related. I’ll try 0.8.1 to see if it helps.

I’ll also upload the logs

Thanks for this note @christianh814; it’s still relevant for Fedora 35/k3s! For others: switch FirewallBackend to iptables, then restart firewalld and k3d/k3s. The DNS issue is gone!

The workaround and known issue are now documented: https://github.com/kubernetes-sigs/kind/pull/1672

v0.8.1 gave me the same result. I believe it’s network related.