kubernetes: Kubelet fails to authenticate to apiserver due to expired certificate
/kind bug /sig auth
What happened: My team is having an issue with TLS bootstrap while running Kubernetes 1.10.5. We set --experimental-cluster-signing-duration to 24h on the kube-controller-manager. Some nodes are deallocated overnight, and when they come back up, kubelet goes into a failed state. It appears to recognize that the certificate has expired and attempts to bootstrap using the token from bootstrap.kubeconfig (so far so good), but it then reuses the expired certificate and cannot authenticate to the apiserver. Here are the relevant logs from kubelet:
bootstrap.go:204] Part of the existing bootstrap client certificate is expired: 2018-07-06 12:32:00 +0000 UTC
bootstrap.go:58] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
certificate_store.go:117] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
server.go:549] Starting client certificate rotation.
certificate_manager.go:216] Certificate rotation is enabled.
certificate_manager.go:287] Rotating certificates
manager.go:154] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct/system.slice/kubelet.service"
certificate_manager.go:299] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:anonymous" cannot create certificatesigningrequests.certificates.k8s.io at the cluster scope
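For triaging other nodes, the two log lines that characterize this failure can be matched mechanically. The following Python sketch is not part of kubelet: `diagnose` and both regular expressions are hypothetical helpers written against the exact log wording quoted above, which may differ in other Kubernetes versions.

```python
import re
from datetime import datetime, timezone

# Patterns copied from the kubelet log lines quoted in this report.
# Assumption: other kubelet versions may word these messages differently.
EXPIRED_RE = re.compile(
    r"existing bootstrap client certificate is expired: "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \+0000 UTC"
)
ANONYMOUS_RE = re.compile(
    r'User "system:anonymous" cannot create certificatesigningrequests'
)


def diagnose(log_text):
    """Return (expiry, anonymous_csr) for a chunk of kubelet log text.

    expiry is the UTC expiry time reported by kubelet, or None if the
    "certificate is expired" line is absent.
    anonymous_csr is True if a CSR was rejected for system:anonymous.
    """
    expiry = None
    match = EXPIRED_RE.search(log_text)
    if match:
        expiry = datetime.strptime(
            match.group(1), "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=timezone.utc)
    return expiry, bool(ANONYMOUS_RE.search(log_text))
```

If both values come back set (an expiry time and `True`), the node is probably showing the same symptom as in this report: the expired certificate was detected, yet the CSR was still sent without valid credentials.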
After removing the generated cert at /var/lib/kubelet/pki/kubelet-client-current.pem, kubelet was able to bootstrap properly, obtain a new cert and join the cluster.
rm /var/lib/kubelet/pki/kubelet-client-current.pem
systemctl restart kubelet
Just removing the kubeconfig, or any of the files in /var/lib/kubelet/pki/ other than kubelet-client-current.pem, and restarting kubelet did not work. Removing the entire /var/lib/kubelet/pki/ directory and restarting kubelet works as well.
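The workaround can also be scripted so that the old certificate is kept for inspection instead of deleted. This is a minimal sketch, assuming that renaming the file has the same effect on kubelet as the rm above (not verified in this report). `set_aside_client_cert` is a hypothetical helper, and kubelet still has to be restarted afterwards with systemctl restart kubelet.

```python
import time
from pathlib import Path


def set_aside_client_cert(pki_dir="/var/lib/kubelet/pki"):
    """Rename kubelet-client-current.pem so kubelet no longer finds it.

    Returns the backup path, or None if there was nothing to move.
    Assumption: kubelet only loads the file by its exact name, so a
    renamed copy is ignored on the next start.
    """
    current = Path(pki_dir) / "kubelet-client-current.pem"
    # is_symlink() also catches a dangling link, which exists() would miss.
    if not (current.is_symlink() or current.exists()):
        return None
    backup = current.with_name(f"{current.name}.stale-{int(time.time())}")
    # rename() moves the link itself if the path is a symlink, not its target.
    current.rename(backup)
    return backup
```

Run it as root on the affected node, then restart kubelet. The renamed file can be deleted once the node has rejoined the cluster.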
What you expected to happen: I expect that after kubelet recognizes that its certificate has expired, it should remove its certificate and successfully bootstrap with the token in bootstrap.kubeconfig. It should obtain a new, valid, signed certificate from the control plane and successfully authenticate to the apiserver.
How to reproduce it (as minimally and precisely as possible):
- Set the RotateKubeletClientCertificate flag on kubelet and the feature gate on kube-controller-manager.
- Set the --experimental-cluster-signing-duration flag on the kube-controller-manager to a small duration.
- Start kubelet with a bootstrap.kubeconfig file containing a token (one that is present in the token authentication file passed to the apiserver). Kubelet bootstraps successfully and is Ready.
- Stop kubelet before it has attempted to renew its certificate.
- Wait until kubelet’s certificate has expired.
- Restart kubelet.
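The timing condition in the steps above can be written down as a small check. This sketch assumes the certificate expires exactly at issue time plus the signing duration. `expires_while_stopped` and the example timestamps are illustrative and not taken from the cluster in this report.

```python
from datetime import datetime, timedelta, timezone


def expires_while_stopped(issued_at, signing_duration, stopped_at, restarted_at):
    """True if the cert is still valid when kubelet stops and is
    already expired when kubelet is restarted."""
    expiry = issued_at + signing_duration
    return stopped_at < expiry <= restarted_at


# Illustrative timeline: a 24h certificate, node deallocated overnight.
issued = datetime(2018, 7, 5, 12, 32, tzinfo=timezone.utc)
hit = expires_while_stopped(
    issued_at=issued,
    signing_duration=timedelta(hours=24),
    stopped_at=issued + timedelta(hours=8),     # kubelet stopped, cert still valid
    restarted_at=issued + timedelta(hours=26),  # restarted after expiry
)
```

With these example values `hit` is true, which corresponds to the "stop, wait, restart" sequence in the reproduction steps.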
Anything else we need to know?: Let me know if there are more logs and information that would be useful. Thanks a lot!
Environment:
- Kubernetes version (use kubectl version): 1.10.5
- Cloud provider or hardware configuration: Azure
- OS (e.g. from /etc/os-release): CentOS 7.4
- Kernel (e.g. uname -a): 3.10.0-693.11.6.el7.x86_64
- Install tools: custom
- Others:
About this issue
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 24 (14 by maintainers)
I am on bare-metal Kubernetes version 1.15.3, and the steps below helped me solve the issue.
kubeadm token create --ttl 24h0m0s
/etc/kubernetes/bootstrap-kubelet.conf
sudo service kubelet restart
It will generate the new kubelet.conf and attach the kubelet to the cluster.
@awslovato did you manage to recover your apiserver? If so, can you share how?
I have the same situation, the apiserver won’t run due to the certificate being expired and the certificate cannot be renewed due to the apiserver being down 😦
How is this resolved? I’m still experiencing this exact error on Kubernetes 1.12.7 running on EKS.
Upon checking, the kubelet-server-current.pem points to an empty file. Then kubelet will fail to start.
Workaround :
Surprisingly this only happens on 1 specific Node. All the Nodes are deployed using the same configuration.
Anonymous requesting CSRs looks strange?
I can see a very similar thing happening in 1.14.x. I joined a 1.14.x node to a cluster, which completes the TLS bootstrapping process successfully, and stores current, signed certs at /var/lib/kubelet/pki/kubelet-client-current.pem.
I then delete the created /etc/kubernetes/kubelet.conf, and join the node to a different 1.14.x cluster moments later. The kubelet fails to complete the TLS bootstrapping process for the new cluster. But if I also delete /var/lib/kubelet/pki/kubelet-client-current.pem before joining the new cluster, the whole process completes as intended.
Perhaps someone with more knowledge of this process can shed some light on the reasoning here. Btw, I did not experience this with 1.13.x.
For the original issue, https://github.com/kubernetes/kubernetes/pull/62152/commits/368959346af6e06085c63a4cc7c37839f262f636 was added to resolve that in 1.11
@vistalba if both the normal kubelet client certificate and the bootstrap client credential are expired, going back to your setup method to recreate a credential for the kubelet is the best recommendation
/close
I’m sad, because I thought this was fixed.
/assign