rancher: Rancher 2.5 (Single Install) not starting after nf_conntrack_max value adjustment

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps possible):

In this scenario, the Docker host where the Rancher container is started is itself an LXC container.

root@ubuntu4:~# sysctl -a|grep conntrack_max
net.netfilter.nf_conntrack_max = 524288

root@ubuntu4:~# docker run -d --name rancher --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher:v2.5.8

Installation and setup work fine.

Now nf_conntrack_max is set to a different value on the LXC host:

root@host ~ # sysctl -w net.netfilter.nf_conntrack_max=525000
net.netfilter.nf_conntrack_max = 525000

Inside the LXC container, the nested Rancher container is restarted:

root@ubuntu4:~# docker restart rancher

root@ubuntu4:~# docker logs --follow rancher
[...]
I0624 07:38:23.051838      54 node.go:136] Successfully retrieved node IP: 172.17.0.2
I0624 07:38:23.051867      54 server_others.go:143] kube-proxy node IP is an IPv4 address (172.17.0.2), assume IPv4 operation
I0624 07:38:23.052887      54 server_others.go:186] Using iptables Proxier.
I0624 07:38:23.053288      54 server.go:650] Version: v1.19.8+k3s1
I0624 07:38:23.053960      54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999      54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
2021/06/24 07:38:23 [FATAL] k3s exited with: exit status 1

Result: Start of the Rancher container fails, at least if the current value is the same as or higher than the remembered value.

Expected Result: Start of the Rancher container succeeds.

Other details that may be helpful:

The problem seems to be in the setIntSysCtl function of realConntracker in the kube-proxy conntrack.go vendored into k3s: https://github.com/k3s-io/k3s/blob/master/vendor/k8s.io/kubernetes/cmd/kube-proxy/app/conntrack.go#L98

This function reads the current sysctl value and compares it to the value kube-proxy wants to apply (apparently computed internally from its conntrack configuration):

func (realConntracker) setIntSysCtl(name string, value int) error {
	entry := "net/netfilter/" + name

	sys := sysctl.New()
	if val, _ := sys.GetSysctl(entry); val != value {
		klog.Infof("Set sysctl '%v' to %v", entry, value)
		if err := sys.SetSysctl(entry, value); err != nil {
			return err
		}
	}
	return nil
}

If the current system value doesn’t match the internal value, sys.SetSysctl is called to write the previously known value. This fails in an LXC container, however, as the value is defined at host level:

root@ubuntu4:~# sysctl -w net/netfilter/nf_conntrack_max=524288
root@ubuntu4:~# sysctl -a|grep conntrack_max
net.netfilter.nf_conntrack_max = 525000
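
The permission error from the kube-proxy log can also be reproduced outside of k3s with a few lines of Go (a minimal standalone sketch to run inside the LXC container; the file name and structure are just for illustration):

// conntrack_probe.go: shows that the sysctl can be read but not written
// from inside the container, which is exactly what trips up kube-proxy's
// setIntSysCtl whenever the host value differs from the desired one.
package main

import (
	"fmt"
	"os"
)

const entry = "/proc/sys/net/netfilter/nf_conntrack_max"

func main() {
	// Reading the host-enforced value works fine.
	val, err := os.ReadFile(entry)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	fmt.Printf("current value: %s", val)

	// Writing it back fails, because the value is owned by the LXC host:
	// "open /proc/sys/net/netfilter/nf_conntrack_max: permission denied"
	if err := os.WriteFile(entry, val, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
}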

Probably related: https://github.com/kubernetes-sigs/kind/pull/2241
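
For reference, the fixes referenced here and in the comments below (notably the k3s PR) appear to take the approach of only writing the sysctl when the current value is lower than the target, instead of on any mismatch. A minimal sketch of such a check, mirroring the function quoted above (not the verbatim patch):

func (realConntracker) setIntSysCtl(name string, value int) error {
	entry := "net/netfilter/" + name

	sys := sysctl.New()
	// Only raise the value: a host-enforced equal-or-higher value no
	// longer triggers a (failing) write inside the container.
	if val, _ := sys.GetSysctl(entry); val < value {
		klog.Infof("Set sysctl '%v' to %v", entry, value)
		if err := sys.SetSysctl(entry, value); err != nil {
			return err
		}
	}
	return nil
}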

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.5.8

  • Installation option (single install/HA): Single Install

  • Docker version (use docker version):

root@ubuntu4:~# docker version
Client:
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.2-0ubuntu1~20.04.2
 Built:             Tue Mar 30 21:24:57 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.2-0ubuntu1~20.04.2
  Built:            Mon Mar 29 19:10:09 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4-0ubuntu1~20.04.2
  GitCommit:        
 runc:
  Version:          1.0.0~rc95-0ubuntu1~20.04.1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 22 (7 by maintainers)

Most upvoted comments

Note for anyone else, for awareness: based on some quick checks I did, this issue affects ALL versions of Rancher 2.5 prior to 2.5.10 (i.e. 2.5.0 through 2.5.9), as each uses an affected version of K3s.

v2.5.10 and v2.6.0 are not affected (they are fixed, as per the PRs for each). Testing with the latest 2.4 (2.4.17 as of the time I’m writing this) also worked for me.

Running echo "131072" > /proc/sys/net/netfilter/nf_conntrack_max on the node may work around this issue (thank you @slickwarren for originally proposing this workaround).

@Napsty, https://github.com/rancher/rancher/issues/33762 is the backport of this issue, targeting milestone 2.5.10.

@lindhe, at this time the rancher:latest tag is the same as the v2.5.9 release and it doesn’t have the fix for this issue. This will be fixed in the upcoming v2.6.0 release. It is also planned to be backported to the next v2.5.x release (v2.5.10) per the separate backport issue https://github.com/rancher/rancher/issues/33762. Please also note that our current process is to close issues once they are validated by QA in internal/RC builds, as per this comment: https://github.com/rancher/rancher/issues/33360#issuecomment-892777075.

Hope this helps.

Hi @nickgerace. No, I would only keep the issue open until the Rancher 2.5.x release with the fix (for Single Install) is out. As the issue title mentions, what the issue was initially filed for concerns the 2.5.x Single Install (on Docker).

The k8s upstream fix is a wider issue, as mentioned by @briandowns in https://github.com/kubernetes/kubernetes/pull/103174#issuecomment-890100905. I’m not sure it’s helpful to keep this Rancher issue open to track the k8s upstream work; it doesn’t really seem to be moving there, unfortunately.

I’m open to other suggestions, of course.

@snasovich Thanks, yes this helps. So will keep my hopes up for 2.5.10 😃

@Napsty It has been fixed in k3s version v1.21.3+k3s1 (related PR: https://github.com/k3s-io/k3s/pull/3341), and this k3s version is embedded in the Rancher version that was tested.