rancher: Rancher 2.5 (Single Install) not starting after nf_conntrack_max value adjustment
What kind of request is this (question/bug/enhancement/feature request): bug
Steps to reproduce (least amount of steps as possible):
In this scenario, the Docker host where the Rancher container is started is an LXC container.
root@ubuntu4:~# sysctl -a|grep conntrack_max
net.netfilter.nf_conntrack_max = 524288
root@ubuntu4:~# docker run -d --name rancher --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher:v2.5.8
Installation and setup work fine.
Now nf_conntrack_max is set to a different value on the LXC host:
root@host ~ # sysctl -w net.netfilter.nf_conntrack_max=525000
net.netfilter.nf_conntrack_max = 525000
Inside the LXC container, the nested Rancher container is restarted:
root@ubuntu4:~# docker restart rancher
root@ubuntu4:~# docker logs --follow rancher
[...]
I0624 07:38:23.051838 54 node.go:136] Successfully retrieved node IP: 172.17.0.2
I0624 07:38:23.051867 54 server_others.go:143] kube-proxy node IP is an IPv4 address (172.17.0.2), assume IPv4 operation
I0624 07:38:23.052887 54 server_others.go:186] Using iptables Proxier.
I0624 07:38:23.053288 54 server.go:650] Version: v1.19.8+k3s1
I0624 07:38:23.053960 54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999 54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
2021/06/24 07:38:23 [FATAL] k3s exited with: exit status 1
Result: The Rancher container fails to start, at least when the current value is the same as or higher than the remembered value.
Expected Result: The Rancher container starts successfully.
Other details that may be helpful:
The problem seems to be in the setIntSysCtl method of realConntracker in the kube-proxy conntrack.go vendored into k3s:
https://github.com/k3s-io/k3s/blob/master/vendor/k8s.io/kubernetes/cmd/kube-proxy/app/conntrack.go#L98
This function reads the current sysctl value and compares it to an internally retained expected value:
func (realConntracker) setIntSysCtl(name string, value int) error {
	entry := "net/netfilter/" + name
	sys := sysctl.New()
	if val, _ := sys.GetSysctl(entry); val != value {
		klog.Infof("Set sysctl '%v' to %v", entry, value)
		if err := sys.SetSysctl(entry, value); err != nil {
			return err
		}
	}
	return nil
}
If the current system value does not match the expected value, sys.SetSysctl is called to restore the previously known value. This fails in an LXC container because the value is defined at the host level and cannot be changed from inside the container:
root@ubuntu4:~# sysctl -w net/netfilter/nf_conntrack_max=524288
root@ubuntu4:~# sysctl -a|grep conntrack_max
net.netfilter.nf_conntrack_max = 525000
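For context, the k8s sysctl helper ultimately just writes to the matching file under /proc/sys, and inside the nested container that file is effectively read-only because the setting is owned by the host. A simplified, self-contained sketch (not the actual k8s.io implementation) of why the write fails:

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// setSysctl mimics, in simplified form, what sysctl.SetSysctl effectively
// does: write the value into the corresponding file under /proc/sys.
func setSysctl(name string, value int) error {
	path := "/proc/sys/" + strings.ReplaceAll(name, ".", "/")
	// Inside the LXC-nested container, opening this file for writing is
	// what returns "permission denied", producing the fatal log line above.
	return os.WriteFile(path, []byte(strconv.Itoa(value)), 0o644)
}

func main() {
	if err := setSysctl("net.netfilter.nf_conntrack_max", 524288); err != nil {
		fmt.Println(err) // open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
	}
}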
Probably related: https://github.com/kubernetes-sigs/kind/pull/2241
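A possible fix would be to accept current system values that are equal to or higher than the expected value, and only attempt the write when the current value is too low. This is also the approach described by the commits listed further below; the exact merged diff may differ. A minimal sketch, reusing the same sysctl and klog helpers as the snippet quoted above:

func (realConntracker) setIntSysCtl(name string, value int) error {
	entry := "net/netfilter/" + name
	sys := sysctl.New()
	// Only write when the current value is lower than the desired value.
	// A value that is already equal or higher is accepted as-is, so
	// kube-proxy no longer tries (and fails) to overwrite a host-level
	// setting from inside the container.
	if val, _ := sys.GetSysctl(entry); val < value {
		klog.Infof("Set sysctl '%v' to %v", entry, value)
		if err := sys.SetSysctl(entry, value); err != nil {
			return err
		}
	}
	return nil
}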
Environment information
- Rancher version (rancher/rancher/rancher/server image tag or shown bottom left in the UI): 2.5.8
- Installation option (single install/HA): Single Install
- Docker version (use docker version):

root@ubuntu4:~# docker version
Client:
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.2-0ubuntu1~20.04.2
 Built:             Tue Mar 30 21:24:57 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.2-0ubuntu1~20.04.2
  Built:            Mon Mar 29 19:10:09 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4-0ubuntu1~20.04.2
  GitCommit:
 runc:
  Version:          1.0.0~rc95-0ubuntu1~20.04.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 22 (7 by maintainers)
Commits related to this issue
- With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values. This should solve Rancher issue https://github.com/ranc... — committed to Napsty/k3s by Napsty 3 years ago
- Allow higher system values With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values. This should solve the Ranch... — committed to Napsty/k3s by Napsty 3 years ago
- Do not attempt to overwrite higher system (sysctl) values With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected value... — committed to Napsty/kubernetes by Napsty 3 years ago
- Do not attempt to overwrite higher system (sysctl) values With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected value... — committed to ibabou/kubernetes by Napsty 3 years ago
Note for anyone else, for awareness: based on some quick checks I did, this issue affects ALL versions of Rancher 2.5 prior to 2.5.10 (2.5.0 through 2.5.9), as each uses an affected version of K3s; even the older releases in that range that used an older version of K3s are affected.
v2.5.10 and v2.6.0 are not affected (they are fixed as per the PRs for each). Testing with the latest 2.4 release (2.4.17 as of the time of writing) also worked for me.
Running
echo "131072" > /proc/sys/net/netfilter/nf_conntrack_max
on the node may work around this issue, presumably by making the current value match the value kube-proxy expects so that it skips the write (thank you @slickwarren for originally proposing this workaround).
@Napsty, https://github.com/rancher/rancher/issues/33762 is the backport of this issue that targets milestone 2.5.10.
@lindhe, at this time the rancher:latest tag is the same as the v2.5.9 release and does not have the fix for this issue. This will be fixed in the upcoming v2.6.0 release. It is also planned to be backported to the next v2.5.x release (v2.5.10) per the separate backport issue https://github.com/rancher/rancher/issues/33762. Please also note that our current process is to close issues once they are validated by QA in internal/RC builds, as per this comment: https://github.com/rancher/rancher/issues/33360#issuecomment-892777075. Hope this helps.
Hi @nickgerace. No, let's only keep the issue open until the Rancher 2.5.x release with the fix (for Single Install) is out. As the issue title mentions, this issue was initially filed for the 2.5.x Single Install (on Docker).
The k8s upstream fix is a wider issue, as mentioned by @briandowns in https://github.com/kubernetes/kubernetes/pull/103174#issuecomment-890100905. I'm not sure it's helpful to keep this Rancher issue open to track the k8s upstream; things don't really seem to be moving there (unfortunately).
I'm open to other suggestions, of course.
@snasovich Thanks, yes this helps. So I will keep my hopes up for 2.5.10 😃
@Napsty It has been fixed in K3s version v1.21.3+k3s1 (related PR: https://github.com/k3s-io/k3s/pull/3341), and this K3s version is embedded in the Rancher version that was tested.