kubernetes: failed to load Kubelet config file /var/lib/kubelet/config.yaml after kubelet update

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: /var/lib/kubelet/config.yaml got removed after kubelet update.

What you expected to happen: the kubelet to update without /var/lib/kubelet/config.yaml being removed.

How to reproduce it (as minimally and precisely as possible): yum update kubeadm.x86_64 (0:1.11.0-0), kubectl.x86_64 (0:1.11.0-0), and kubelet.x86_64 (0:1.11.0-0), then restart the kubelet.
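The same reproduction as shell commands (package versions taken from the report; the journal line is an assumption based on the issue title):

```shell
# On CentOS 7, upgrade the three packages to 1.11.0 and restart the kubelet.
yum update -y kubeadm-1.11.0-0 kubectl-1.11.0-0 kubelet-1.11.0-0
systemctl restart kubelet

# The kubelet then exits because /var/lib/kubelet/config.yaml is missing;
# the failure shows up in its journal.
journalctl -u kubelet --no-pager | tail -n 20
```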

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.11
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.5.1804 (Core)
  • Kernel (e.g. uname -a): 3.10.0-862.6.3.el7.x86_64
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 16
  • Comments: 27 (5 by maintainers)

Most upvoted comments

I pasted my config.yaml from another similar cluster, then followed @WisWang's procedure, and my cluster came back to life.

Posting the config here in case it is useful for somebody else; obviously, it may not match your cluster's configuration.

address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: cgroupfs
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerLogMaxFiles: 5
containerLogMaxSize: 10Mi
contentType: application/vnd.kubernetes.protobuf
cpuCFSQuota: true
cpuManagerPolicy: none
cpuManagerReconcilePeriod: 10s
enableControllerAttachDetach: true
enableDebuggingHandlers: true
enforceNodeAllocatable:
- pods
eventBurst: 10
eventRecordQPS: 5
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
evictionPressureTransitionPeriod: 5m0s
failSwapOn: true
fileCheckFrequency: 20s
hairpinMode: promiscuous-bridge
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 20s
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
imageMinimumGCAge: 2m0s
iptablesDropBit: 15
iptablesMasqueradeBit: 14
kind: KubeletConfiguration
kubeAPIBurst: 10
kubeAPIQPS: 5
makeIPTablesUtilChains: true
maxOpenFiles: 1000000
maxPods: 110
nodeStatusUpdateFrequency: 10s
oomScoreAdj: -999
podPidsLimit: -1
port: 10250
registryBurst: 10
registryPullQPS: 5
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 2m0s
serializeImagePulls: true
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 4h0m0s
syncFrequency: 1m0s
volumeStatsAggPeriod: 1m0s
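Before dropping a pasted config into place, it can help to sanity-check it first; a minimal sketch, assuming the file is staged in /tmp before being copied to /var/lib/kubelet/config.yaml (only a shortened fragment of the config is shown):

```shell
# Stage the pasted config in a temp file first (shortened fragment for illustration).
CFG=/tmp/kubelet-config.yaml
cat > "$CFG" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: cgroupfs
EOF

# The kubelet rejects the file without these two fields, so check them first.
for key in apiVersion kind; do
  grep -q "^${key}:" "$CFG" || echo "missing ${key}"
done

# cgroupDriver must match what docker reports (docker info | grep 'Cgroup Driver').
grep '^cgroupDriver:' "$CFG"
```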

I upgraded my cluster from 1.10.3 to 1.11.0 with kubeadm and hit this issue as well. I regenerated the kubelet config file with kubeadm upgrade node config --kubelet-version v1.11.0, then ran systemctl daemon-reload && systemctl restart kubelet. I also changed the cgroupDriver in /var/lib/kubelet/config.yaml, so that docker and the kubelet both use systemd across the whole cluster. Now the node looks OK.
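Those recovery steps can be sketched as follows (run as root on the affected node; the sed line is just one possible way to make the cgroupDriver change described, and only applies if docker uses the systemd driver):

```shell
# Regenerate /var/lib/kubelet/config.yaml for the new kubelet version.
kubeadm upgrade node config --kubelet-version v1.11.0

# If docker uses the systemd cgroup driver, make the kubelet config match.
sed -i 's/^cgroupDriver: cgroupfs$/cgroupDriver: systemd/' /var/lib/kubelet/config.yaml

# Pick up the new config and restart the kubelet.
systemctl daemon-reload && systemctl restart kubelet
```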

[root@m-30-3 kubelet]# kubectl get node
NAME      STATUS    ROLES     AGE       VERSION
m-30-1    Ready     master    29d       v1.11.0
m-30-2    Ready     <none>    29d       v1.10.3
m-30-3    Ready     <none>    24d       v1.11.0

Hi, this is most likely because the join on the kube node was invoked without the sudo command. Run kubeadm join on the nodes again with sudo and it should work perfectly fine. Regards, Ranga

The kubelet and kubeadm packages should not be upgraded outside of performing the kubeadm upgrade steps for the cluster.

But what about new installs of 1.11.0 (NOT upgrades) getting the same error on a clean CentOS?

And what to do in the meantime until 1.12 is released?

Looks like the documentation issue is being addressed. Sweet!

https://github.com/kubernetes/website/pull/9509/files#diff-b4d7b9c51d0dac39e5ad31a3dac5c878R172

apt-mark hold kubelet kubeadm kubectl
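Since the reporter is on CentOS, a rough yum equivalent of the apt-mark hold above (assuming the yum-plugin-versionlock package is available in the configured repos):

```shell
# Pin the three packages so a plain "yum update" will not replace them.
yum install -y yum-plugin-versionlock
yum versionlock add kubelet kubeadm kubectl
yum versionlock list   # verify the locks are in place
```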

Additionally, here’s a way of recovering: https://github.com/kubernetes/kubernetes/issues/65562. (I had to tweak the rollback version to apt install kubelet=1.10.5-00.)

We haven’t called it out explicitly in the past, but we are planning on calling this out explicitly in the Install docs for the 1.12 release. The kubelet and kubeadm packages should not be upgraded outside of performing the kubeadm upgrade steps for the cluster.

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-11/ has the steps needed for performing a cluster upgrade.
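In outline, that documented path looks like this rather than a bare package update (a sketch only; see the linked page for the full per-node drain/uncordon steps):

```shell
# On the master: see what an upgrade would do, then apply it.
kubeadm upgrade plan
kubeadm upgrade apply v1.11.0

# On each node: upgrade the packages, regenerate the kubelet config, restart.
yum update -y kubelet-1.11.0-0 kubeadm-1.11.0-0
kubeadm upgrade node config --kubelet-version v1.11.0
systemctl daemon-reload && systemctl restart kubelet
```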

The cause of the issue is a configuration change made for the v1.11 release to enable kubelet component config (the /var/lib/kubelet/config.yaml file). A systemd environment file was also added, removing the previous recommendation to modify systemd unit files for kubelet configuration customization.

If there are any additional issues related to the upgrade process, please file them against https://github.com/kubernetes/kubeadm, which is used for tracking kubeadm-specific issues.