cilium: Kube-proxy replacement not working when running without privileges

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I’m running this setup:

  • Single node Kubernetes cluster based on Talos Linux (Talos Linux 1.2.3 with Kubernetes 1.25.0, installed without kube-proxy and without CNI)

Install Cilium using helm (this needs to be this complex as SYS_MODULE is not available on Talos Linux):

$ cat <<PR > postrend.sh
#!/bin/sh

set -e

cat <&0 > base.yaml
kubectl kustomize .
PR

$ chmod +x postrend.sh

$ cat <<PATCH > caps-patch.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cilium
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cilium-agent
          securityContext:
            capabilities:
              add:
                - CHOWN
                - KILL
                - NET_ADMIN
                - NET_RAW
                - IPC_LOCK
                - SYS_RESOURCE
                - PERFMON
                - BPF
                - DAC_OVERRIDE
                - FOWNER
                - SETGID
                - SETUID
                - SYS_ADMIN
      initContainers:
        - name: clean-cilium-state
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
                - SYS_RESOURCE
                - PERFMON
                - BPF
PATCH

$ cat <<KUST > kustomization.yaml
resources:
- base.yaml
patchesStrategicMerge:
- caps-patch.yaml
KUST

$ cat <<VALUES > values.yaml
k8sServiceHost: <your loadbalancer IP>
k8sServicePort: 6443
kubeProxyReplacement: strict
operator:
  replicas: 1
securityContext:
  extraCapabilities:
  - PERFMON
  - BPF
  privileged: false
VALUES

$ helm upgrade --install -n kube-system --version 1.12.2 -f values.yaml --post-renderer ./postrend.sh cilium cilium/cilium

In my case, CoreDNS reliably crashloops or otherwise fails because it cannot reach the Kubernetes API. This can be debugged further by launching a pod that runs something like curl https://10.96.0.1:443/ (the ClusterIP of the default/kubernetes Service).
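The check described above could be scripted roughly as follows. This is only a sketch: the function name check_service_ip, the pod name api-check, the curlimages/curl image and the /version path are assumptions chosen for illustration, not part of the original report. The KUBECTL variable is likewise a hypothetical convenience for printing the command instead of running it (KUBECTL=echo).

```shell
#!/bin/sh
# Hypothetical helper: start a throwaway pod that curls a Service ClusterIP.
# Override KUBECTL (e.g. KUBECTL=echo) to print the command without running it.
KUBECTL="${KUBECTL:-kubectl}"

check_service_ip() {
    # $1: ClusterIP to probe; defaults to the default/kubernetes Service IP
    # mentioned in the report.
    ip="${1:-10.96.0.1}"
    # --rm removes the pod afterwards; --restart=Never runs it as a bare pod.
    # -k skips TLS verification, --max-time bounds the wait on a dead path.
    "$KUBECTL" run api-check --rm -i --restart=Never \
        --image=curlimages/curl -- \
        curl -sk --max-time 5 "https://${ip}:443/version"
}
```

Calling check_service_ip 10.96.0.1 should print a JSON version document if the ClusterIP is being translated. A timeout would be consistent with the symptom described here, although it does not prove the cause by itself.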

Funny thing is that when I change securityContext.privileged to true and upgrade the chart, the problem vanishes.

Inter-Pod communication works in both situations, even between nodes, as long as you connect to the Pod IP directly and not through a Service.
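Since only Service traffic is affected, it may help to look at what the agent itself reports about kube-proxy replacement and its Service table. The wrapper below is a minimal sketch: the function name cilium_agent and the KUBECTL override are hypothetical, and it assumes the DaemonSet and container names from the patch above (cilium and cilium-agent in kube-system).

```shell
#!/bin/sh
# Hypothetical wrapper: run a cilium CLI subcommand inside one agent pod.
# Override KUBECTL (e.g. KUBECTL=echo) to print the command without running it.
KUBECTL="${KUBECTL:-kubectl}"

cilium_agent() {
    # "ds/cilium" lets kubectl pick one pod of the DaemonSet; on a single-node
    # cluster that is the only agent.
    "$KUBECTL" exec -n kube-system ds/cilium -c cilium-agent -- cilium "$@"
}
```

For example, cilium_agent status --verbose shows the kube-proxy replacement details, and cilium_agent service list shows the Services the agent has programmed. Comparing both outputs between privileged: false and privileged: true might narrow down where the two configurations diverge.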

Cilium Version

v1.12.2

Kernel Version

5.15.68-talos

Kubernetes Version

v1.25.0

Sysdump

cilium-sysdump-20221006-140816.zip

Relevant log output

No response

Anything else?

I’ve upgraded to 1.13.0-rc1 without effect.

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 38 (34 by maintainers)

Most upvoted comments

Strict mode fails for me on that setup too (rpi4, Talos 1.3.2), but with Cilium 1.13.0-rc4 and IPv6 enabled. The agent fails with:

could not load module ip_tables: exit status 1

Disabling IPv6 makes it work again.

Edit: agent is running in privileged mode.

@rio This will be fixed by https://github.com/cilium/cilium/pull/23953

@aanm do the logs provided above give any hint? Is any other information needed?