cluster-api: Race condition in cloud init

What steps did you take and what happened: Creation of the master node fails intermittently (it does not reproduce very often) with the following error:

Cloud-init v. 18.3-52-gc5f78957-1~bddeb~18.04.1 running 'modules:config' at Fri, 01 Nov 2019 18:58:08 +0000. Up 119.99 seconds.
[init] Using Kubernetes version: v1.14.6
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR CRI]: container runtime is not running: output: NAME:
   crictl info - Display information of the container runtime
USAGE:
   crictl info [command options] [arguments...]
OPTIONS:
   --output value, -o value  Output format, One of: json|yaml (default: "json")
time="2019-11-01T18:59:13Z" level=fatal msg="failed to connect: failed to connect: context deadline exceeded"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Cloud-init v. 18.3-52-gc5f78957-1~bddeb~18.04.1 running 'modules:final' at Fri, 01 Nov 2019 18:58:34 +0000. Up 145.84 seconds.
2019-11-01 18:59:14,128 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
2019-11-01 18:59:14,185 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2019-11-01 18:59:14,186 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed

What did you expect to happen: Master node to come up

Anything else you would like to add: This is the containerd log from journalctl -u containerd -l:

"Failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

Environment:

  • Cluster-api version: v1alpha2
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 22 (21 by maintainers)

Most upvoted comments

Hi @figo,

Based on the conversation in #112, I wanted to revise the drop-in example from above. We should have two drop-ins:

/etc/systemd/system/containerd.service.d/cloud-init.conf

[Unit]
# Start containerd as late as possible with respect to cloud-init's
# boot stages. This ensures that cloud-init modules such as
# write-files may be used to configure containerd prior to it
# starting.
#
# Please see the following link for more information about the
# cloud-init boot stage managed by the cloud-config.service:
# https://cloudinit.readthedocs.io/en/latest/topics/boot.html#config
After=cloud-config.service
Wants=cloud-config.service

# Ensure containerd is started before the cloud-init boot stage
# that executes the runcmd module. This module is responsible
# for running "kubeadm init", and thus containerd must be
# started before this command is processed.
#
# Please see the following link for more information about the
# cloud-init boot stage managed by the cloud-final.service:
# https://cloudinit.readthedocs.io/en/latest/topics/boot.html#final
Before=cloud-final.service
WantedBy=cloud-final.service

/etc/systemd/system/containerd.service.d/memory-pressure.conf

[Service]
# Decreases the likelihood that containerd is killed due to memory
# pressure.
#
# Please see the following link for more information about the
# OOMScoreAdjust configuration property:
# https://www.freedesktop.org/software/systemd/man/systemd.exec.html#OOMScoreAdjust=
OOMScoreAdjust=-999

cc @detiber @dims

Hi @figo / @codenrhoden,

I think one of you should take this in the image builder. I believe a drop-in for a cloud-init phase target should be able to set up the required service order to prevent the race condition.

You don’t need to do that @figo. Just use Ansible to write a drop-in file for containerd. A drop-in can add or replace settings for an existing systemd unit file. Read more at https://www.freedesktop.org/software/systemd/man/systemd.unit.html and https://wiki.archlinux.org/index.php/systemd#Drop-in_files.

For example:

/etc/systemd/system/containerd.service.d/cloudinit.conf

[Unit]
After=cloud-config.service
Wants=cloud-config.service

Before=cloud-final.service
WantedBy=cloud-final.service

@figo I can foresee cases where someone might want to inject config during bootstrapping, which is what I was referring to: being able to specify a custom config file for containerd in the bootstrapping config, for example. I would expect that adding a systemd drop-in with the criteria I mentioned above should help support that use case (we enforce the ordering in the image, while still allowing user-data-defined config for the service).

It might be helpful to define a similar drop-in for the kubelet as well, one that adds After/Wants for containerd.
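
A minimal sketch of what such a kubelet drop-in might look like, assuming the kubelet runs as kubelet.service and containerd as containerd.service. The file name containerd.conf is hypothetical, and nothing in this thread confirms that this exact file was adopted:

/etc/systemd/system/kubelet.service.d/containerd.conf

```ini
[Unit]
# Order the kubelet after containerd so that the container runtime
# has been started before the kubelet is. The unit name
# containerd.service is an assumption; adjust it if the runtime is
# installed under a different unit name.
After=containerd.service

# Pull containerd into the same start transaction as the kubelet.
# Wants= is a weak dependency: the kubelet is still started even if
# containerd fails to start.
Wants=containerd.service
```

As with any drop-in, systemd only picks the file up after a systemctl daemon-reload (or a reboot), so in an image build it would need to be written before the image is sealed.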