kubeadm: apiserver fails to start because livenessprobe is too aggressive

[Lubomir] NOTE: possible fix was submitted here: https://github.com/kubernetes/kubernetes/pull/66264

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+2c2fe6e8278a5", GitCommit:"2c2fe6e8278a5db2d15a013987b53968c743f2a1", GitTreeState:"not a git tree", BuildDate:"1970-01-01T00:00:00Z", GoVersion:"go1.8", Compiler:"gc", Platform:"linux/arm"}

Environment:

  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:30:51Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/arm"}

  • Cloud provider or hardware configuration: arm32 (bananapi - basically a raspberrypi2)

  • OS (e.g. from /etc/os-release): (my own OS image) ID="containos" NAME="containos" VERSION="v2017.07" VERSION_ID="v2017.07" PRETTY_NAME="containos v2017.07"

  • Kernel (e.g. uname -a): Linux master2 4.9.20 #2 SMP Wed Aug 16 15:36:20 AEST 2017 armv7l GNU/Linux

  • Others:

What happened?

kubeadm init sits ~forever at the "waiting for control plane" stage. Investigating with docker ps/docker logs shows the apiserver is being killed (SIGTERM) and restarted continuously.
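A quick way to see the churn (a sketch; k8s_kube-apiserver is the container name prefix the kubelet's docker integration used at the time, and CONTAINER_ID is a placeholder for an ID from the first command):

docker ps -a --filter name=k8s_kube-apiserver   # a fresh container plus the recently killed ones
docker logs --tail 20 CONTAINER_ID              # the logs stop mid-startup each time the probe kills it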

What you expected to happen?

Everything to work 😃 In particular, apiserver to come up and the rest of the process to proceed.

How to reproduce it (as minimally and precisely as possible)?

Run kubeadm init on a slow machine.

Anything else we need to know?

For me, during the churn of all those containers starting at once, it takes apiserver about 90s(!) from its first log line to responding to HTTP queries. I haven’t looked in detail at what it’s doing at that point, but the logs mention what looks like etcd bootstrapping things.

My suggested fix is to set the apiserver initialDelaySeconds to 180s, and probably to do something similar elsewhere in general - I think there's very little reason to have aggressive initial delays.
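For reference, the probe lives in the static pod manifest kubeadm writes to /etc/kubernetes/manifests/kube-apiserver.yaml. At the time it looked roughly like this (exact values vary by release, so treat this as a sketch):

livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 15   # the value this issue proposes raising
  timeoutSeconds: 15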

(Unless you're a unit test that expects to encounter failures frequently, my experience with production software suggests the correct solution to a timeout is almost always to have waited longer.)

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 75 (40 by maintainers)

Most upvoted comments

@pipejakob I can confirm that (on my bananapi) running this in another terminal at the right point in the kubeadm run makes everything come up successfully:

sed -i 's/initialDelaySeconds: [0-9]\+/initialDelaySeconds: 180/' /etc/kubernetes/manifests/kube-apiserver.yaml

(I usually also manually docker kill the old restart-looping apiserver container; I'm not sure if that gets cleaned up automatically with static pods.)
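A hedged one-liner for that manual step (assumes the dockershim-era k8s_ container naming convention):

docker ps --filter name=k8s_kube-apiserver --format '{{.ID}}' | xargs -r docker kill

The kubelet re-creates static pod containers on its own, so after the kill the apiserver should come back using the patched manifest.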

Adding a kubeadm configuration option for this is unlikely at this point.

I'm trying to explain that this is already doable with 3 commands in 1.13:

sudo kubeadm reset -f
sudo kubeadm init phase control-plane all --config=testkubeadm.yaml
sudo sed -i 's/initialDelaySeconds: 15/initialDelaySeconds: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo kubeadm init --skip-phases=control-plane --ignore-preflight-errors=all --config=testkubeadm.yaml
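(Note the sed pattern here assumes the generated manifest contains the default initialDelaySeconds: 15; adjust the match if your kubeadm version writes a different value.)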

I was in the same situation, getting the same error with the approach suggested by @neolit123. I wasn't able to run the script by @stephenmoloney; I'm not really familiar with bash scripting, so it's probably my fault.

So I ported the script to python (which is installed by default on Raspbian, so no need for extra dependencies), in case anyone is interested:

import os
import time
import threading

filepath = '/etc/kubernetes/manifests/kube-apiserver.yaml'

def replace_defaults():
    print('Thread started, looking for the file')
    # Wait for kubeadm to write the static pod manifest.
    while not os.path.isfile(filepath):
        time.sleep(1)  # wait one second
    print('\033[94m -----------> FILE FOUND: replacing defaults \033[0m')
    # Relax the livenessProbe so a slow machine has time to come up.
    os.system("""sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g' /etc/kubernetes/manifests/kube-apiserver.yaml""")
    os.system("""sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml""")
    os.system("""sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml""")

# Patch the manifest in the background while kubeadm init runs in the foreground.
t = threading.Thread(target=replace_defaults)
t.start()
os.system("kubeadm init")

To run it: sudo python however_you_name_the_file.py. Thank you for pointing me to the solution, @stephenmoloney and @neolit123!

Hey! I'm also encountering this issue. Interestingly though, I manage to build a cluster master from scratch on my Raspberry Pi 3, but consistently fail to on my 3+. Anyway, the version I'm currently using (as per the step-by-step documentation at https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/) is kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:14:41Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/arm"}

As with the others, the apiserver container does come up eventually, but not before kubeadm bails out, leaving me in limbo as I'm too inexperienced to pick up manually from there.

Quick update: running watch -n 1.0 "sed -i 's/initialDelaySeconds: [0-9]\+/initialDelaySeconds: 180/' /etc/kubernetes/manifests/kube-apiserver.yaml" in a separate terminal allowed my cluster to come up.

When kubeadm is used and the apiserver starts up, we could measure the first startup and, in a later stage, adapt the timeout configuration based on those measurements. It is also hard to tell from the logs that the apiserver is being killed by the healthz check; better logging would at least make the problem visible. It took me quite some time to find out that the livenessProbe was the problem. I have to mention I'm a beginner, and a hint about this somewhere in kubeadm's failure output would at least be helpful.

It's hard to say; I'd guess approx. 1 minute, but I don't know how to properly measure that.
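One rough way to measure it (a sketch, assuming the apiserver serves /healthz on the default 127.0.0.1:6443; start it as soon as the container launches):

start=$(date +%s)
until curl -ksf https://127.0.0.1:6443/healthz >/dev/null; do
  sleep 1   # keep polling until /healthz answers
done
echo "apiserver became healthy after $(( $(date +%s) - start ))s"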

Additionally, now that my master is operational, I fail adding a node to it with what seems to be another timeout issue:

[preflight] running pre-flight checks
	[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs] or no builtin kernel ipvs support: map[ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
 2. Provide the missing builtin kernel ipvs support

I0708 19:02:20.256325    8667 kernel_validator.go:81] Validating kernel version
I0708 19:02:20.256846    8667 kernel_validator.go:96] Validating kernel config
	[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.03.1-ce. Max validated version: 17.03
[discovery] Trying to connect to API Server "192.168.2.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.2.2:6443"
[discovery] Requesting info from "https://192.168.2.2:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.2.2:6443"
[discovery] Successfully established connection with API Server "192.168.2.2:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
(the two kubelet-check lines above repeat several times)

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'
timed out waiting for the condition

During this time, not a single docker container shows up on my node.

Hi! This issue was of much help.

I found a fancy way to resolve this using kustomize:

mkdir /tmp/kustom

cat > /tmp/kustom/kustomization.yaml <<EOF
patchesJson6902:
- target:
    version: v1
    kind: Pod
    name: kube-apiserver
    namespace: kube-system
  path: patch.yaml
EOF

cat > /tmp/kustom/patch.yaml <<EOF
- op: replace
  path: /spec/containers/0/livenessProbe/initialDelaySeconds
  value: 30
- op: replace
  path: /spec/containers/0/livenessProbe/timeoutSeconds
  value: 30
EOF

sudo kubeadm init --config config.yaml -k /tmp/kustom
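(Worth noting: the -k flag relies on kubeadm's kustomize support for patching the generated static pod manifests, which existed only in some releases; if your kubeadm lacks it, the sed-based approaches above achieve the same thing.)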

I’m seeing the exact same error as ☝️ above with the following:

modify_kube_apiserver_config(){
  sed -i 's/failureThreshold: [0-9]/failureThreshold: 15/g' /etc/kubernetes/manifests/kube-apiserver.yaml && \
  sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml && \
  sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 120/g' /etc/kubernetes/manifests/kube-apiserver.yaml
}
kubeadm init phase control-plane all --config=$${KUBEADM_CONFIG_FILE} && \
modify_kube_apiserver_config && \
kubeadm init \
--skip-phases=control-plane \
--ignore-preflight-errors=all \
--config=$${KUBEADM_CONFIG_FILE} \
--v 4

The following script solves the issue for me with kubeadm versions 1.12 and 1.13 (most of the time):

modify_kube_apiserver_config(){
  while [[ ! -e /etc/kubernetes/manifests/kube-apiserver.yaml ]]; do
    sleep 0.5s;
  done && \
  sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g' /etc/kubernetes/manifests/kube-apiserver.yaml && \
  sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml && \
  sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
}

# ref https://github.com/kubernetes/kubeadm/issues/413 (initialDelaySeconds is too eager)
if [[ ${var.arch} == "arm" ]]; then modify_kube_apiserver_config & fi

kubeadm init \
  --config=$${KUBEADM_CONFIG_FILE} \
  --v ${var.kubeadm_verbosity}

“…a configuration flag under ClusterConfig->ApiServer that can control the api server timeout.”

Searching through the codebase for TimeoutForControlPlane, I think this defaults to 4 minutes and is only used for kubeadm's own wait for the apiserver to become healthy. In particular, it does not alter the apiserver livenessProbe used by the kubelet itself. Is that correct?
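For context, that knob can be set in the kubeadm config; a sketch, assuming the v1beta1 ClusterConfiguration schema (field names differ in other config API versions):

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
apiServer:
  # Assumption: the default is 4m0s; this only extends kubeadm's own wait,
  # not the livenessProbe the kubelet uses.
  timeoutForControlPlane: 8m0s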

I don’t think I’ve seen a counter-argument raised anywhere in the discussion around this issue. Is there a reason we don’t just increase livenessProbe initialDelaySeconds and move on to some other problem?

Aside: as far as I can see from a quick read, TimeoutForControlPlane also doesn't take into account other non-failure causes of increased apiserver startup delay, like congestion while pulling multiple images, or additional timeout+retry loop iterations while everything converges at initial install time (timeout+retry is the k8s design pattern, and on a loaded system it happens sometimes; that is expected and just fine). I personally feel 4 minutes is both too long for impatient interactive users expecting a fast failure, and too short for an install process on a loaded/slow/automated system that is prepared to wait longer for expected success. How was this value arrived at? Can we default to 5 minutes? Can we keep retrying until SIGINT? Why are we imposing an artificial wall-clock deadline internally rather than inheriting it from the calling environment?

Afaics, TimeoutForControlPlane just exposes an arbitrary fatal internal deadline as a parameter, where the only intended UX is to increase the parameter until the deadline is never reached. Alternatively, we could just not impose that arbitrary fatal internal deadline in the first place…

@joejulian Nice - I managed to patch that in, and now my cluster is also firing up. FINALLY, after weeks of agony! Thank you 😃

This solved the problem on my Raspberry Pi 3 cluster: https://github.com/kubernetes/kubernetes/pull/66264

I actually believe it would make sense to default to “no timeout” with the option of setting a timeout for the whole process (as was suggested somewhere earlier in this issue).

The reason is that most of the use cases I can think of actually don't care whether a specific step executes within X seconds, since all that matters in a distributed system is eventual consistency (spooling up another node just in case is cheaper than fiddling with the settings).

As an interim solution it would, however, suffice to read the timeout settings for kubeadm join from a configuration file, just like kubeadm init does, so that our hack with the in-flight timeout replacement works. It's a hack, don't think any different - but a terrible hack is still better than no workaround at all.

I'm personally against trying to “guess” sensible timeouts: guesses can always be wrong, it would not really serve a purpose in this case (since the coping strategy for an elapsed timeout is simply bailing out), and it would make reproducing errors a pain in the ass, since two identical systems could start behaving differently for a myriad of reasons.

Prepulling images won’t help. The livenessProbe timer doesn’t start until after the image is pulled (as I pointed out above).

The fix is just to extend the initialDelaySeconds timeout(s). The current timeout values in kubeadm are being misused to “enforce” a fast UX, not for error detection.

Edit: and to be clear, it's only the startup that takes ages - my control cluster operates on 3xRPi2 just fine once I increase the apiserver initialDelaySeconds timeout (and other install-only timeouts used within kubeadm itself). I don't understand why we're still talking about this 😕

@anguslees We had the “wait forever” behavior earlier, but that was very sub-optimal from a UX PoV, so now we do have timeouts. We might want to increase some of those timeouts if you want.

How about making them configurable? Does it make sense to have a single option that owns all of them?

@anguslees We had the “wait forever” behavior earlier, but that was very sub-optimal from a UX PoV, so now we do have timeouts. We might want to increase some of those timeouts if you want.

The problem is that usage of kubeadm is two-fold: we have both users typing kubeadm interactively, who want to know whether something is happening, and higher-level consumers.