kubernetes: Kubelet 1.17.8 breaks pod/container startup

What happened:

The cluster was running kubelet 1.17.6 without any problems. After upgrading kubelet to 1.17.8, pods and containers no longer start.

What you expected to happen:

Pods created and started successfully.

How to reproduce it (as minimally and precisely as possible):

Upgrade kubelet from 1.17.6 to 1.17.8. After the upgrade, no pod with a cluster IP can be started: they all go into a crash loop. The containers seem to start, but kubelet kills them right away.

Anything else we need to know?:

Kubelet keeps retrying and failing with the following error log:

Jun 30 18:31:49 lvdkbw503 kubelet-wrapper[2236]: W0701 01:31:49.911786    2236 docker_sandbox.go:394] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "rook-ceph-osd-2-66d588bc98-khxnk_rook-ceph": unexpected command output nsenter: failed to execute ip: No such file or directory
Jun 30 18:31:49 lvdkbw503 kubelet-wrapper[2236]:  with error: exit status 127

Rolling back kubelet to 1.17.6 immediately fixes the issue.
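
The log above says that nsenter could not execute ip ("No such file or directory"), and exit status 127 conventionally means the command was not found. A minimal sketch of a check for that condition follows. It assumes it is run in the same environment as the kubelet process (the log shows kubelet is launched through kubelet-wrapper); running it on the host alone may give a different answer. The function name check_tools is hypothetical and is not part of any Kubernetes tooling.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool can be resolved on PATH.
# Returns 0 if every tool was found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The failing status hook runs ip through nsenter, so check both.
check_tools nsenter ip || echo "at least one tool is not on PATH in this environment"
```

If ip is reported as missing where kubelet runs but is present on the host, that would be consistent with the error message, although it does not by itself show which change between 1.17.6 and 1.17.8 caused it.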

Environment:

  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.8", GitCommit:"35dc4cdc26cfcb6614059c4c6e836e5f0dc61dee", GitTreeState:"clean", BuildDate:"2020-06-26T03:36:03Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:

Self-hosted kubernetes cluster in a VMWare cluster.

  • OS (e.g: cat /etc/os-release):
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=2512.2.1
VERSION_ID=2512.2.1
BUILD_ID=2020-06-16-1044
PRETTY_NAME="Flatcar Container Linux by Kinvolk 2512.2.1 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar-linux.org/"
BUG_REPORT_URL="https://issues.flatcar-linux.org"
FLATCAR_BOARD="amd64-usr"
  • Kernel (e.g. uname -a):
Linux lvdkbw503 4.19.128-flatcar #1 SMP Tue Jun 16 10:17:17 -00 2020 x86_64 Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz GenuineIntel GNU/Linux
  • Install tools:
ignition config
  • Network plugin and version (if this is a network-related bug):
Calico v3.13.3
  • Others:
Docker version 18.06.3-ce, build d7080c1

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

@rikatz haha, let’s praise @justaugustus then 😃

@knight42 and thank you for your help triaging this 😃

Hey @igcherkaev, just to complement @knight42: are you using hyperkube, or the kubelet binary directly?

I’ll try to check which commits changed between 1.17.6 and 1.17.8.

@knight42 thanks for tackling this issue! About Flatcar: it’s hard to say which package provides ‘ip’, because Flatcar follows the container OS model (it’s the successor of CoreOS and behaves much like Fedora CoreOS and similar systems). You usually don’t install packages on it; you bring them in through another container, etc. 😃