kubernetes: Kubelet sysctl settings cannot be overridden
What happened: The kubelet silently overrides user-defined sysctl parameters.
What you expected to happen:
Values defined by the administrator in `/etc/sysctl.d/*.conf` files should take priority, or at least not be silently overridden.
How to reproduce it (as minimally and precisely as possible):
- Create a file in `/etc/sysctl.d` that conflicts with one that the kubelet sets on startup. `echo "vm.panic_on_oom = 1" > /etc/sysctl.d/30-panic-on-oom.conf` is a good example.
- Reboot.
- After boot, the value specified through sysctl is overridden by the kubelet.
Anything else we need to know?:
This behavior is particularly tricky to debug because the kubelet almost always starts after the systemd-sysctl service. The end result is that the system boots and claims to have successfully applied all user-defined kernel configuration, but inspecting manually with the `sysctl` utility does not show the expected values.
I can understand why the kubelet would want to discover the state of the OOM panic toggle on startup, but the changes made for #15091 appear to have completely hard-coded this behavior, and they actively fight against configuration managed through traditional Linux mechanisms (these parameters would all be valid on the GRUB command line or applied by systemd at startup).
I can see a few options that would be reasonable ways to handle this:
- Offer a configuration option on the kubelet to override the behavior.
- Stop managing sysctl parameters that are not fatal to the kubelet running, and instead ship recommended parameters as sysctl.d config files: either through post-install tooling for the kubelet package in a package manager (deb/rpm), or by letting provisioning tools create those files. Parameters that are considered best practice but are not required for the kubelet to start could emit a warning instead.
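For context, the configuration knob that already exists in this area is `protectKernelDefaults` in the kubelet configuration. As I understand it, it only chooses between "silently modify the sysctls" and "refuse to start"; there is no warn-only or leave-alone mode, which is what the options above ask for. A minimal KubeletConfiguration fragment:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# true: the kubelet errors out if kernel tunables differ from its expectations;
# false (the default): the kubelet silently modifies them, which is the
# behavior this issue complains about.
protectKernelDefaults: true
```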
Environment:
- Kubernetes version (use `kubectl version`): 1.12.4
- Cloud provider or hardware configuration: Azure
- OS (e.g. from /etc/os-release): Ubuntu 16.04
- Kernel (e.g. `uname -a`): 4.15.0-1037-azure
- Install tools: AKS Engine
About this issue
- State: open
- Created 5 years ago
- Reactions: 2
- Comments: 27 (9 by maintainers)
We are attempting to move to v1.19.7 and ran into the issue where the kubelet overwrites the `vm.panic_on_oom = 1` sysctl setting, which is a setting we need enabled. While there is some crude logic behind how the kernel kills PIDs during an OOM situation, in our opinion it still leaves the system in an unpredictable state that may not have resolved the real reason the node ran out of memory in the first place. We believe it would be better to kernel panic, take a core dump for later analysis, and get the node back into a known good state.
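For reference, the panic-and-recover strategy described above maps to a small sysctl fragment like the one below. The file name and the `kernel.panic` timeout are illustrative choices, not values taken from this thread; capturing the actual dump additionally requires a crash-dump mechanism such as kdump to be configured:

```
# /etc/sysctl.d/30-panic-on-oom.conf (illustrative)
vm.panic_on_oom = 1    # panic instead of letting the OOM killer pick victims
kernel.panic = 30      # automatically reboot 30 seconds after a panic
```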
Based on https://github.com/kubernetes/kubernetes/issues/74151#issuecomment-481750191 by @bazzargh, it seems the functionality of changing the `vm.panic_on_oom` setting may no longer be required. Moving the behavior to fail by default and warn, as mentioned in https://github.com/kubernetes/kubernetes/issues/74151#issuecomment-553529178, makes a lot of sense to me. We could keep the feature where the kubelet checks `vm.panic_on_oom` and by default fails to start. When `protectKernelDefaults: true`, a warning would be logged.

Examining the code, it seems we could make the kubelet just drop a warning in the log by changing the case when `protectKernelDefaults: false` to set `b := KernelTunableWarn` at the line below:

https://github.com/kubernetes/kubernetes/blob/b11d0fbdd58394a62622787b38e98a620df82750/pkg/kubelet/cm/container_manager_linux.go#L473

@cchildress thoughts?
Yes, I think that’s their point.
My larger point, and a big part of why I wanted to file this issue, is that I think it is very bad behavior for the kubelet to silently override node configuration that is being explicitly declared by an administrator on the node. I'm perfectly happy to see the kubelet check node conditions and refuse to start and/or print a nice warning if you have something configured on the node that conflicts with the kubelet running correctly. Having it revert configuration changes like this makes debugging more challenging and prevents administrators from tailoring their systems to their needs.
Dup issues: #66693 #90829 #50110
Why is `KernelTunableWarn` not configurable? There is some context in https://github.com/kubernetes/kubernetes/pull/27874#discussion_r68314751

It's time to consider whether "expose `KernelTunableWarn` in kubelet config" (#50110) is needed. 😄

The workaround is to change the sysctls after the kubelet starts, every time.

/kind feature
/remove-lifecycle stale