kubernetes: Kubelet fails to authenticate to apiserver due to expired certificate

/kind bug /sig auth

What happened: My team is having an issue with TLS bootstrap on Kubernetes 1.10.5. We set --experimental-cluster-signing-duration to 24h on the kube-controller-manager. Some nodes are deallocated overnight, and when they come back up, kubelet goes into a failed state. It appears to recognize that the certificate has expired and attempts to bootstrap using the token from bootstrap.kubeconfig (so far so good), but then it reuses the expired certificate and cannot authenticate to the apiserver. Here are the relevant logs from kubelet:

bootstrap.go:204] Part of the existing bootstrap client certificate is expired: 2018-07-06 12:32:00 +0000 UTC
bootstrap.go:58] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
certificate_store.go:117] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
server.go:549] Starting client certificate rotation.
certificate_manager.go:216] Certificate rotation is enabled.
certificate_manager.go:287] Rotating certificates
manager.go:154] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct/system.slice/kubelet.service"
certificate_manager.go:299] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:anonymous" cannot create certificatesigningrequests.certificates.k8s.io at the cluster scope

After removing the generated cert at /var/lib/kubelet/pki/kubelet-client-current.pem, kubelet was able to bootstrap properly, obtain a new cert and join the cluster.

rm /var/lib/kubelet/pki/kubelet-client-current.pem
systemctl restart kubelet

Removing only the kubeconfig, or any of the files in /var/lib/kubelet/pki/ other than kubelet-client-current.pem, and then restarting kubelet did not work. Removing the entire /var/lib/kubelet/pki/ directory and restarting kubelet also works.
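
The workaround above deletes the client certificate unconditionally. A more cautious variant, sketched here, first checks whether the certificate has actually expired. This is only a sketch: `cert_state` and `reset_expired_client_cert` are hypothetical helper names (not kubelet tooling), and it assumes `openssl` and systemd are available on the node.

```shell
# Hypothetical helper (not part of kubelet): classify a PEM certificate file.
# Prints "unreadable", "expired", or "valid".
cert_state() {
  pem="$1"
  # First make sure openssl can parse a certificate out of the file at all.
  if ! openssl x509 -noout -in "$pem" >/dev/null 2>&1; then
    echo "unreadable"
  # -checkend 0 exits non-zero once the notAfter date has passed.
  elif ! openssl x509 -checkend 0 -noout -in "$pem" >/dev/null 2>&1; then
    echo "expired"
  else
    echo "valid"
  fi
}

# Remove the client certificate and restart kubelet only when it has expired.
# The default path is the one reported in this issue.
reset_expired_client_cert() {
  pem="${1:-/var/lib/kubelet/pki/kubelet-client-current.pem}"
  if [ "$(cert_state "$pem")" = "expired" ]; then
    rm -f "$pem" && systemctl restart kubelet
  fi
}
```

Running `reset_expired_client_cert` as root with no argument applies the check to the default path and leaves a still-valid certificate in place.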

What you expected to happen: I expect that after kubelet recognizes that its certificate has expired, it should remove its certificate and successfully bootstrap with the token in bootstrap.kubeconfig. It should obtain a new, valid, signed certificate from the control plane and successfully authenticate to the apiserver.

How to reproduce it (as minimally and precisely as possible):

  1. Set the RotateKubeletClientCertificate flag on kubelet and the feature gate on the kube-controller-manager.
  2. Set the --experimental-cluster-signing-duration flag on the kube-controller-manager to a small duration.
  3. Start kubelet with bootstrap.kubeconfig file containing a token (that is present in the token authentication file passed to the apiserver) – kubelet bootstraps successfully and is Ready.
  4. Stop kubelet before it has attempted to renew its certificate.
  5. Wait until kubelet’s certificate has expired.
  6. Restart kubelet.
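
Steps 4 to 6 can be scripted so that the restart happens as soon as the certificate lapses. The following is a rough sketch, assuming kubelet runs as a systemd unit named `kubelet` and that `openssl` is installed; `wait_until_expired` and `repro_expired_restart` are names made up for this example.

```shell
# Hypothetical helper: block until the certificate in $1 is past its notAfter
# date, polling every $2 seconds (default 60).
# Note: it also returns immediately if openssl cannot read the file.
wait_until_expired() {
  pem="$1"
  interval="${2:-60}"
  # `openssl x509 -checkend 0` exits 0 while the certificate is still valid.
  while openssl x509 -checkend 0 -noout -in "$pem" >/dev/null 2>&1; do
    sleep "$interval"
  done
}

# Steps 4-6: stop kubelet, let the client certificate lapse, start it again.
repro_expired_restart() {
  pem="${1:-/var/lib/kubelet/pki/kubelet-client-current.pem}"
  systemctl stop kubelet &&
    wait_until_expired "$pem" &&
    systemctl start kubelet
}
```

With a short signing duration, calling `repro_expired_restart` as root right after the node becomes Ready should bring kubelet back up with an expired client certificate on disk, which is the state described in this report.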

Anything else we need to know?: Let me know if there are more logs and information that would be useful. Thanks a lot!

Environment:

  • Kubernetes version (use kubectl version): 1.10.5
  • Cloud provider or hardware configuration: Azure
  • OS (e.g. from /etc/os-release): CentOS 7.4
  • Kernel (e.g. uname -a): 3.10.0-693.11.6.el7.x86_64
  • Install tools: custom
  • Others:

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 24 (14 by maintainers)

Most upvoted comments

I am on bare-metal Kubernetes 1.15.3, and the steps below helped me solve the issue.

  1. SSH to one of the master nodes.
  2. Create a token using the command kubeadm token create --ttl 24h0m0s
  3. Capture the output (the token) from step 2.
  4. SSH to the worker (kubelet) node that is having trouble connecting to the API server.
  5. Replace the token in the file /etc/kubernetes/bootstrap-kubelet.conf with the token from step 2.
  6. Restart the kubelet process using sudo service kubelet restart. It will generate a new kubelet.conf and attach the kubelet to the cluster.
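
Step 5 can be done by hand in an editor, or scripted. Below is a minimal sketch, assuming the bootstrap kubeconfig stores the credential on a single `token:` line and that the new token has the usual kubeadm shape (six characters, a dot, sixteen characters). `replace_bootstrap_token` is a hypothetical helper name, not a kubeadm command.

```shell
# Hypothetical helper: swap the bootstrap token inside a kubeconfig file.
#   $1 = path to the bootstrap kubeconfig, $2 = new token
replace_bootstrap_token() {
  conf="$1"
  token="$2"
  # Refuse anything that does not look like <6 chars>.<16 chars>.
  if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "token does not look like a bootstrap token" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # Rewrite only the value of the "token:" key, keeping its indentation.
  # Writing back with cat preserves the owner and mode of the original file.
  sed -E "s/^([[:space:]]*token:[[:space:]]*).*/\1${token}/" "$conf" > "$tmp" &&
    cat "$tmp" > "$conf"
  status=$?
  rm -f "$tmp"
  return $status
}
```

On the worker node this would be called as `replace_bootstrap_token /etc/kubernetes/bootstrap-kubelet.conf "$NEW_TOKEN"` (as root), followed by the kubelet restart from step 6. `NEW_TOKEN` stands for the value captured in step 3.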

@awslovato did you manage to recover your apiserver? if so, can you share how?

I have the same situation, the apiserver won’t run due to the certificate being expired and the certificate cannot be renewed due to the apiserver being down 😦

How is this resolved? I’m still experiencing this exact error on Kubernetes 1.12.7 running on EKS.

server.go:262] failed to run Kubelet: failed to create kubelet: failed to initialize certificate manager: failed to initialize server certificate manager: could not decode the first block from "/var/lib/kubelet/pki/kubelet-server-current.pem" from expected PEM format

Upon checking, kubelet-server-current.pem points to an empty file, so kubelet fails to start.

Workaround:

  1. rm /var/lib/kubelet/pki/kubelet-server-current.pem
  2. systemctl restart kubelet
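
The check behind this workaround can be scripted so the file is removed only when it really is empty. This is a minimal sketch, assuming kubelet-server-current.pem is a symlink (or regular file) whose content is zero bytes; `remove_empty_cert` is a hypothetical helper name.

```shell
# Hypothetical helper: delete a certificate path only if it resolves to an
# empty or missing target. Prints "absent", "removed", or "kept".
remove_empty_cert() {
  pem="$1"
  if [ ! -L "$pem" ] && [ ! -e "$pem" ]; then
    echo "absent"
  # -s follows symlinks: true only when the target exists with size > 0.
  elif [ ! -s "$pem" ]; then
    # On a symlink, rm removes the link itself and leaves the target alone.
    rm -f "$pem"
    echo "removed"
  else
    echo "kept"
  fi
}
```

A possible invocation is `[ "$(remove_empty_cert /var/lib/kubelet/pki/kubelet-server-current.pem)" = "removed" ] && systemctl restart kubelet`, which restarts kubelet only when something was actually deleted.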

Surprisingly, this only happens on one specific node. All the nodes are deployed using the same configuration.

certificate_manager.go:299] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:anonymous" cannot create certificatesigningrequests.certificates.k8s.io at the cluster scope

Anonymous requesting CSRs looks strange?

I can see a very similar thing happening in 1.14.x. I joined a 1.14.x node to a cluster, which completes the TLS bootstrapping process successfully, and stores current, signed certs at /var/lib/kubelet/pki/kubelet-client-current.pem.

I then delete the created /etc/kubernetes/kubelet.conf and join the node to a different 1.14.x cluster moments later. The kubelet fails to complete the TLS bootstrapping process for the new cluster. But if I also delete /var/lib/kubelet/pki/kubelet-client-current.pem before joining the new cluster, the whole process completes as intended.

Perhaps someone with more knowledge of this process can shed some light on the reasoning here. Btw, I did not experience this with 1.13.x.

For the original issue, https://github.com/kubernetes/kubernetes/pull/62152/commits/368959346af6e06085c63a4cc7c37839f262f636 was added to resolve that in 1.11

@vistalba if both the normal kubelet client certificate and the bootstrap client credential are expired, going back to your setup method to recreate a credential for the kubelet is the best recommendation

/close

I’m sad, because I thought this was fixed.

/assign