talos: Cilium agents fail to start due to mount permissions with Cilium v1.12.0 (likely upstream issue)
Bug Report
Description
I created a new cluster without a CNI by adding --config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]' to talosctl gen config.
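For readability, the same patch can be kept in a shell variable and passed to talosctl gen config. This is only a sketch: the cluster name my-cluster and the endpoint https://master1.lan:6443 are assumptions for illustration and are not taken from the report.

```shell
# JSON patch from the report: disable kube-proxy and deploy no CNI.
PATCH=$(cat <<'EOF'
[
  {"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}},
  {"op": "add", "path": "/cluster/network/cni", "value": {"name": "none"}}
]
EOF
)

# Hypothetical cluster name and endpoint; substitute your own.
CLUSTER_NAME="my-cluster"
ENDPOINT="https://master1.lan:6443"

# Dry run: print the command for review. Remove the leading `echo`
# to actually generate the machine configs.
echo talosctl gen config "$CLUSTER_NAME" "$ENDPOINT" --config-patch "$PATCH"
```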
After running talosctl bootstrap, deploying Cilium with Helm using
helm install cilium cilium/cilium --namespace kube-system --set ipam.mode=kubernetes --set kubeProxyReplacement=strict --set k8sServiceHost="master1.lan" --set k8sServicePort="6443"
results in Cilium initialization never completing. While the operators start up, all workers end up in CrashLoopBackOff while trying to run the command
sh -ec
cp /usr/bin/cilium-mount /hostbin/cilium-mount;
nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
rm /hostbin/cilium-mount
which results in
mount-cgroup nsenter: failed to execute /opt/cni/bin/cilium-mount: Permission denied
This is despite the file permissions appearing to be correct:
$ talosctl list opt/cni/bin/ -l
NODE MODE UID GID SIZE(B) LASTMOD NAME
master1.lan drwxr-xr-x 0 0 26 Jul 20 13:38:19 .
master1.lan -rwxr-xr-x 0 0 3424256 Jul 20 14:33:04 cilium-mount
So it seems like something else (namespaced mounts?) is blocking this. Deploying Cilium did work with Talos v1.0, but I haven’t yet found the commit that broke it. Let me know how I can debug this further or what other logs I can look at.
Update: This is likely an upstream issue caused by insufficient privileges for running mount. It can be worked around by passing --set securityContext.privileged=true to Helm (which restores the pre-v1.12 behavior).
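Put together with the original install command, the workaround might look like the sketch below. It reuses the flags from the report unchanged and only appends the privileged override; it prints the command rather than running it, so it can be reviewed first.

```shell
# Flags from the original install, plus the privileged override.
set -- install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost="master1.lan" \
  --set k8sServicePort="6443" \
  --set securityContext.privileged=true

# Dry run: print the command for review. Replace this line with
# `helm "$@"` to perform the install.
echo "helm $*"
```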
Environment
- Talos version:
Client:
Tag: v1.1.1
SHA: 40a050c6
Built:
Go version: go1.18.4
OS/Arch: linux/amd64
Server:
NODE: master1.lan
Tag: v1.2.0-alpha.0-43-g56a757cc8
SHA: 56a757cc
Built:
Go version: go1.18.4
OS/Arch: linux/amd64
Enabled: RBAC
- Kubernetes version:
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.24.2
- Platform: Proxmox (nocloud)
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 17 (13 by maintainers)
Commits related to this issue
- fix: folder permissions of overlay mounted folders Set the correct permissions for the overlay mounted folders. This issue was identified from #5948 Signed-off-by: Noel Georgi <git@frezbo.dev> — committed to frezbo/talos by frezbo 2 years ago
- fix: folder permissions of overlay mounted folders Set the correct permissions for the overlay mounted folders. This issue was identified from #5948 Signed-off-by: Noel Georgi <git@frezbo.dev> (cher... — committed to smira/talos by frezbo 2 years ago
- https://github.com/siderolabs/talos/issues/5948 — committed to on2itsecurity/secure-k8s by nberlee 2 years ago
@twelho this is not a Talos issue. If you look at the diff between the 1.11.7 and 1.12.0 versions of the Cilium Helm chart, they changed the default value of securityContext.privileged from true to false. Even though Cilium adds the SYS_ADMIN capability, that is not enough to do mount operations; you’d also need to set privileged: true for the pod securityContext. This can be fixed by adding --set securityContext.privileged=true while doing a helm install. The Talos docs for Cilium should still work, as they’re pinned to Cilium version 1.11.2.
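One way to check the chart default described above is to pull securityContext.privileged out of the chart's values. The helper below is a rough sketch, not part of Helm or Cilium: privileged_default is a hypothetical name, and it assumes the values file has a top-level securityContext block containing a privileged key, as the comment implies.

```shell
# Print the value of securityContext.privileged from Helm values YAML on stdin.
# Hypothetical helper; assumes a top-level "securityContext:" block.
privileged_default() {
  awk '
    /^securityContext:/ { in_sc = 1; next }  # enter the top-level block
    /^[^ #]/            { in_sc = 0 }        # any other top-level key ends it
    in_sc && $1 == "privileged:" { print $2; exit }
  '
}

# Usage (assumes the chart repository was added under the name "cilium"):
#   helm show values cilium/cilium --version 1.11.7 | privileged_default
#   helm show values cilium/cilium --version 1.12.0 | privileged_default
```

Comparing the two outputs should show the change in default between the chart versions, provided the layout assumption holds for both.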
I assume it’s a Cilium bug, unless I’m missing some information. I was waiting to see if someone else also reports it, just to understand whether we missed something on the Talos side. v1.12.0 still doesn’t have any release notes, so that’s another thing I’m waiting on.