kind: CrashLoopBackOff error in kube-proxy with kernel versions 5.12.2-arch1-1 and 5.10.35-1-lts

What happened: After creating the cluster with kind create cluster, the kube-proxy pod has a CrashLoopBackOff error. This happens with kernel versions 5.12.2-arch1-1 and 5.10.35-1-lts. With kernel versions 5.12.1-arch1-1 and 5.10.34-1-lts I didn’t have the issue.

What you expected to happen: All pods in the cluster should start without problems.

How to reproduce it (as minimally and precisely as possible): On an Arch Linux installation with kernel version 5.12.2-arch1-1 or 5.10.35-1-lts and Docker installed, download the latest version of kind and run kind create cluster.
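
A minimal command sequence for the reproduction (a sketch; the kubectl check is illustrative and not part of the original report):

kind create cluster
kubectl get pods -n kube-system   # on the affected kernels, kube-proxy shows STATUS CrashLoopBackOff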

Anything else we need to know?:

  • Log of kube-proxy pod:
I0511 11:47:28.906526       1 node.go:172] Successfully retrieved node IP: 172.18.0.2                                                                                
I0511 11:47:28.906613       1 server_others.go:142] kube-proxy node IP is an IPv4 address (172.18.0.2), assume IPv4 operation
I0511 11:47:28.953210       1 server_others.go:185] Using iptables Proxier.                                                                                          
I0511 11:47:28.953346       1 server_others.go:192] creating dualStackProxier for iptables.
W0511 11:47:28.960804       1 server_others.go:492] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for I
I0511 11:47:28.962804       1 server.go:650] Version: v1.20.2                                                                                                        
I0511 11:47:28.965997       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072                                                                
F0511 11:47:28.966114       1 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
  • Events from pod:
Events:                                                                                                                                                              
Type     Reason     Age               From               Message                                                                                                   
----     ------     ----              ----               -------                                                                                                   
Normal   Scheduled  48s               default-scheduler  Successfully assigned kube-system/kube-proxy-s7w5w to kind-control-plane
Normal   Pulled     2s (x4 over 48s)  kubelet            Container image "k8s.gcr.io/kube-proxy:v1.20.2" already present on machine
Normal   Created    2s (x4 over 45s)  kubelet            Created container kube-proxy                                                                              
Normal   Started    2s (x4 over 45s)  kubelet            Started container kube-proxy                                                                              
Warning  BackOff    1s (x5 over 42s)  kubelet            Back-off restarting failed container
  • Tried it with both iptables and nftables; the result is the same.

Environment:

  • kind version: (use kind version): Tested both:

    • v0.11.0-alpha+1d4788dd7461b3 go1.16.4
    • v0.10.0 go1.16.4
  • Kubernetes version: (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"archive", BuildDate:"2021-04-09T16:47:30Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-03-11T06:23:38Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-tp-docker)

Server:
 Containers: 12
  Running: 1
  Paused: 0
  Stopped: 11
 Images: 8
 Server Version: 20.10.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8c906ff108ac28da23f69cc7b74f8e7a470d1df0.m
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.35-1-lts
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.666GiB
 Name: avocado
 ID: ZNGF:FTZV:6BK6:VPE3:ZGAR:A5A2:VYEI:LUQE:AEU6:6MHN:ZGTZ:WR2V
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
  • OS (e.g. from /etc/os-release):
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling

Kernel: 5.10.35-1-lts
CPU: Intel i5-7200U (4) @ 3.100GHz
  • iptables version: v1.8.7 (legacy)
  • nftables version: v0.9.8 (E.D.S.)


Most upvoted comments

I’m getting the same results with 5.12.2-arch1-1.

Quick workaround if a cluster is needed fast: manually set the parameter with sudo sysctl net/netfilter/nf_conntrack_max=131072 before creating the kind cluster, as sketched below.
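
The full workaround sequence as a sketch (the sysctl path syntax and the value 131072 are taken from this comment and the kube-proxy log above):

sudo sysctl net/netfilter/nf_conntrack_max=131072   # set in the host's init netns
kind create cluster                                 # kube-proxy then finds the value already high enough and skips the write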

Thanks for your response, Ben.

On further investigation, the old kind executable was taking precedence in the PATH on that particular environment. Removing it resolved the problem; the cluster is up and running as expected. kind 0.11.1 with the 1.20.7 node image works without the additional settings.

@yharish991 run brew upgrade kind, which will upgrade your kind version to 0.11.1 and fix the issue.

For anyone testing older releases of Kubernetes with kind: you can work around the issue by using the latest version of kind, which contains the fix for the nf_conntrack_max kernel change. As of 9 July 2021, that’s kind v0.11.1. Then use the Kubernetes node images built for that kind version, listed on its GitHub release page. For example:

kind create cluster --image=kindest/node:v1.21.1
kind create cluster --image=kindest/node:v1.20.7
kind create cluster --image=kindest/node:v1.19.11
kind create cluster --image=kindest/node:v1.18.19

Thanks all, #2241 should be in shortly, and since we’re quite overdue for a release it should be released soon.

How do I fix this issue on macOS?

Man! Thanks! That worked; I should have thought of that myself.

@hyutota @BenTheElder I don’t think this is an Arch Linux-only issue.

According to the changelog of Linux 5.12.2, this commit (torvalds/linux@671c54ea8c7ff47bd88444f3fffb65bf9799ce43) changed the behaviour of netfilter conntrack: the global conntrack sysctls, including nf_conntrack_max, became read-only in network namespaces other than the init one, which matches the “permission denied” that kube-proxy hits in the log above. I believe this is the commit that caused this issue after upgrading to Linux 5.12.2.
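
A minimal sketch to confirm the behaviour change (illustrative only, assuming util-linux unshare and procps sysctl are available; the exact error wording may differ):

sudo sysctl -w net.netfilter.nf_conntrack_max=131072            # init netns: succeeds
sudo unshare --net -- sysctl -w net.netfilter.nf_conntrack_max=131072
# On 5.12.2 / 5.10.35-lts the second command fails with something like:
# sysctl: permission denied on key 'net.netfilter.nf_conntrack_max'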