kops: Kubelet service breaks after reboot of GCE nodes (permission denied)
- What kops version are you running? The command kops version will display this information.
Version 1.8.1
- What Kubernetes version are you running?
kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.8.7 and 1.8.8
- What cloud provider are you using?
GCE
- What commands did you run? What is the simplest way to reproduce this issue?
After the initial node boot, we’re able to run /home/kubernetes/bin/kubelet -h and the kubelet service starts up fine. However, if the node is rebooted (GCE maintenance or manually running sudo shutdown -r now), the kubelet service fails to start and we’re unable to run it manually (even with sudo). Both attempts fail with permission denied:
systemd[20259]: kubelet.service: Failed at step EXEC spawning /home/kubernetes/bin/kubelet: Permission denied
and
$ sudo /home/kubernetes/bin/kubelet
sudo: unable to execute /home/kubernetes/bin/kubelet: Permission denied
- What happened after the commands executed?
Both systemd and a manual sudo invocation fail with Permission denied.
- What did you expect to happen?
The kubelet service should start after the node is rebooted.
- Please provide your cluster manifest. Execute kops get --name my.example.com -oyaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
n/a
- Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
n/a
- Anything else we need to know?
We also have k8s clusters provisioned via kops in AWS and confirmed we are able to reboot nodes and afterwards they return to Ready state without issue.
Before rebooting, the instance has the following mount listed:
$ mount | grep kubernetes/bin
/dev/sda1 on /home/kubernetes/bin type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
and it does not show up after the reboot. I’m still digging into how kops and Container-Optimized OS manage these mounts, but I assume the failure is related to the missing mount. If the issue lies with Container-Optimized OS, apologies for opening this ticket, but any help you can provide to work around or resolve it would still be much appreciated.
I should mention the file is still “there”, and the permissions show anyone should be able to execute it:
$ ls -alh /home/kubernetes/bin/kubelet
-rwxr-xr-x 1 root root 132M Apr 12 00:38 /home/kubernetes/bin/kubelet
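One explanation consistent with these observations is that, once the dedicated mount is gone, the directory falls back to a parent filesystem mounted with noexec, which blocks execution regardless of the file mode bits, even for root. This is a hypothesis, not something confirmed in this report. A minimal sketch of how it could be checked, assuming findmnt from util-linux is available on the node; the helper names has_noexec and check_exec are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a comma-separated mount option
# string (as printed by mount or findmnt) contains the noexec flag.
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the filesystem that holds a path allows execution.
# Assumption: findmnt (util-linux) is installed on the node.
check_exec() {
    opts=$(findmnt -n -o OPTIONS --target "$1") || return 2
    if has_noexec "$opts"; then
        echo "$1: mounted noexec ($opts)"
    else
        echo "$1: exec allowed ($opts)"
    fi
}

# Example (run on the node, before and after a reboot):
#   check_exec /home/kubernetes/bin
```

Comparing the output before and after a reboot would show whether the exec-capable mount listed above has been replaced by a noexec one.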
About this issue
- State: closed
- Created 6 years ago
- Comments: 27 (1 by maintainers)
I was able to get the kubelet service started and the node back to Ready after running:
after finding https://github.com/kubernetes/kops/blob/bdf0d04b0aa757fb8bf470fb2667bdf6423a93fe/nodeup/pkg/model/directories.go#L42-L43
Going to try using a kops hook to either create a oneshot systemd unit to run those commands, or to add something to /etc/fstab and see how that goes.
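The oneshot-unit idea could be sketched roughly as follows. The exact commands referred to in this comment are not shown here, so the two mount invocations below (a bind mount of the directory onto itself, then a remount with exec) are an assumption, as are the unit name, the /bin/mount path, and the UNIT_DIR and BIN_DIR variables. Treat it as a starting point, not a confirmed fix:

```shell
#!/bin/sh
# Sketch: write a oneshot systemd unit that re-creates an exec-capable
# bind mount before kubelet starts. The unit name, the variables, and
# the mount commands are assumptions, not taken from kops.
BIN_DIR="${BIN_DIR:-/home/kubernetes/bin}"
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"
UNIT_NAME="kubelet-bin-remount.service"   # hypothetical name

write_unit() {
    # Unquoted heredoc so $BIN_DIR is expanded into the unit file.
    cat > "$UNIT_DIR/$UNIT_NAME" <<EOF
[Unit]
Description=Remount $BIN_DIR with exec (workaround sketch)
Before=kubelet.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/mount --bind $BIN_DIR $BIN_DIR
ExecStart=/bin/mount -o remount,exec $BIN_DIR

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root on the node):
#   write_unit && systemctl daemon-reload \
#     && systemctl enable --now "$UNIT_NAME"
```

On images where /etc is reset at boot, a unit file written this way may not survive a reboot by itself, which is one reason to have a kops hook or startup script recreate it on every boot.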