talos: Cilium agents fail to start due to mount permissions with Cilium v1.12.0 (likely upstream issue)

Bug Report

Description

I created a new cluster without CNI by adding --config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]' to talosctl gen config.

After running talosctl bootstrap, deploying Cilium with Helm using

helm install cilium cilium/cilium --namespace kube-system --set ipam.mode=kubernetes --set kubeProxyReplacement=strict --set k8sServiceHost="master1.lan" --set k8sServicePort="6443"

results in Cilium initialization never completing. While the operators start up, all workers end up in CrashLoopBackOff trying to run the command

sh
-ec
cp /usr/bin/cilium-mount /hostbin/cilium-mount;
nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
rm /hostbin/cilium-mount

which results in

mount-cgroup nsenter: failed to execute /opt/cni/bin/cilium-mount: Permission denied

This is despite the file permissions looking to be correct:

$ talosctl list opt/cni/bin/ -l
NODE          MODE         UID   GID   SIZE(B)   LASTMOD           NAME
master1.lan   drwxr-xr-x   0     0     26        Jul 20 13:38:19   .
master1.lan   -rwxr-xr-x   0     0     3424256   Jul 20 14:33:04   cilium-mount

So it seems like something else (namespaced mounts?) is blocking this. Deploying Cilium did work with Talos v1.0, but I haven’t yet found the commit that broke support. Let me know how I can debug this further or what other logs I can look at.

Update: This is likely an upstream issue caused by insufficient privileges for running mount. It can be worked around by passing --set securityContext.privileged=true to Helm, which restores the pre-v1.12 behavior.
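
For reference, a minimal sketch of the install with the workaround applied: the same flags as the helm install command above, plus the extra --set. The echo at the end is my own addition so the command can be reviewed before running it; it only prints the command and does not touch the cluster. I have only tried this flag against the v1.12.0 chart, so treat it as an assumption for other chart versions.

```shell
# Same flags as the original install, plus the privileged workaround.
set -- install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=master1.lan \
  --set k8sServicePort=6443 \
  --set securityContext.privileged=true

# Dry run: print the assembled command for review.
# To actually install, replace this line with: helm "$@"
echo helm "$@"
```

Keeping the arguments in one place makes it easy to drop the last --set again once the chart defaults are sorted out upstream.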

Environment

  • Talos version:
Client:
        Tag:         v1.1.1
        SHA:         40a050c6
        Built:       
        Go version:  go1.18.4
        OS/Arch:     linux/amd64
Server:
        NODE:        master1.lan
        Tag:         v1.2.0-alpha.0-43-g56a757cc8
        SHA:         56a757cc
        Built:       
        Go version:  go1.18.4
        OS/Arch:     linux/amd64
        Enabled:     RBAC
  • Kubernetes version:
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.24.2
  • Platform: Proxmox (nocloud)

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 17 (13 by maintainers)

Most upvoted comments

@twelho this is not a Talos issue. If you look at the diff between the 1.11.7 and 1.12.0 versions of the Cilium Helm chart, they changed the default value of securityContext.privileged from true to false. Even though Cilium adds the SYS_ADMIN capability, that is not enough for mount operations; you also need to set privileged: true in the pod securityContext. This can be fixed by adding --set securityContext.privileged=true to helm install. The Talos docs for Cilium should still work, as they are pinned to Cilium version 1.11.2.

Should this be considered a Cilium bug? v1.12.0 was technically released yesterday and is pointed to both by the docs and the Helm charts now, but the release is indeed missing from GitHub…

I assume it’s a Cilium bug, unless I’m missing some information. I was waiting to see whether someone else also reports it, just to understand if we missed something on the Talos side. v1.12.0 still doesn’t have any release notes, so that’s another thing I’m waiting on.