kubeadm: Regression - etcd datadir permissions not set on etcd grow
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version): 1.14+
Environment: N/A
What happened?
With the release of etcd 3.4.10, the datadir permissions now need to be 0700 or etcd won’t start. There was an issue (#1308) where perms were set on join before starting the etcd container as a security control, overriding the default behavior of creating a nonexistent directory with mode 0755. However, in a cleanup, that necessary os.MkdirAll call was removed. The omission went unnoticed for several releases since etcd didn’t complain, but with etcd-io/etcd#11798 (in 3.4.10), the new etcd member on the second node does not start.
I’m pretty sure this will break anyone on k8s 1.14 or newer who upgrades to etcd 3.4.10 or newer without first fixing the /var/lib/etcd perms.
What you expected to happen?
/var/lib/etcd (or whatever the var is set to) should be set to 0700. 😃
How to reproduce it (as minimally and precisely as possible)?
Join a second master node with an etcd 3.4.10 or newer runtime, then run ls -ld /var/lib/etcd on that node.
Anything else we need to know?
It’s worth explicitly noting that the first control plane node added works fine. Only the second and subsequent nodes, which are handled in a separate location in the code, exhibit the problem.
About this issue
- State: closed
- Created 4 years ago
- Comments: 26 (18 by maintainers)
I discussed with @spzala about this. We are thinking about providing a warning message instead of enforcing the file permission: https://github.com/etcd-io/etcd/pull/12242
updated the PR to only create the directory if it does not exist on init/join-control-plane, but not chmod it. https://github.com/kubernetes/kubernetes/pull/94102
/remove-priority important-soon
/priority backlog
lowering priority since the fix in etcd was applied and 1.20 will include 3.4.13+. (there is also discussion to backport the etcd version to 1.19)
actually, I think @jingyih is working on resolving the regression in etcd … not sure if kubeadm should work around it or wait for an etcd fix. I think kubernetes manifests will stay on 3.4.9 until this is resolved
ok, in the above PR i’ve added a chmod 700 in kubeadm init even if the directory already exists, just in case.
@neolit123 We are currently using kubeadm / k8s 1.18.8. We use an external etcd cluster which we provisioned ourselves on dedicated local-storage VMs. We added the etcd cluster to kubeadm via the kubeadm-config configmap. I was able to upgrade the etcd cluster after changing the directory permissions of /var/lib/etcd to 700.
ok, so originally a similar fix was added here in the function that creates the static pods for 1.14-pre: https://github.com/kubernetes/kubernetes/commit/836f413cf1096c9b020b20319d0767aee4f9b990#diff-c4574f3918f016aeb3b32f5d9cb62ed6
later that code was moved (by ereslibre): https://github.com/kubernetes/kubernetes/commit/981bf1930c73a7d95bbbd1dc9b3bfff122ad09a8
and then this refactor that you linked indeed omitted it: https://github.com/kubernetes/kubernetes/pull/73452 (that PR was very noisy and had 140+ comments, so we might have missed this).
so yes, we should not let the kubelet create the path with 755, and include the following:
at this line: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/cmd/phases/join/controlplanejoin.go#L133
thanks for reporting it.