cilium: kubeadm cluster not working on Ubuntu 22.04 and RHEL 9

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

A kubeadm cluster works on Ubuntu 20.04, but somehow does not work on Ubuntu 22.04 or RHEL 9 when using the Cilium CNI; Calico works fine on the same hosts.

It looks like CoreDNS's upstream DNS lookups fail with the log output below, and pods cannot reach the external network (a minimal in-cluster check that reproduces this is sketched after the logs).

root@k8s:~# kubectl get pod -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   cilium-m5twm                      1/1     Running   0          57s
kube-system   cilium-operator-66d7f84f9-9wlq4   1/1     Running   0          57s
kube-system   cilium-operator-66d7f84f9-ng84m   0/1     Pending   0          57s
kube-system   coredns-6d4b75cb6d-8b4zv          0/1     Running   0          72s
kube-system   coredns-6d4b75cb6d-htmnm          0/1     Running   0          72s
kube-system   etcd-k8s                          1/1     Running   0          84s
kube-system   kube-apiserver-k8s                1/1     Running   0          84s
kube-system   kube-controller-manager-k8s       1/1     Running   0          84s
kube-system   kube-proxy-n5ckh                  1/1     Running   0          72s
kube-system   kube-scheduler-k8s                1/1     Running   0          84s
root@k8s:~# cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       1 errors, 1 warnings
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Deployment        cilium-operator    Desired: 2, Ready: 1/2, Available: 1/2, Unavailable: 1/2
DaemonSet         cilium             Desired: 1, Ready: 1/1, Available: 1/1
Containers:       cilium             Running: 1
                  cilium-operator    Running: 1, Pending: 1
Cluster Pods:     2/2 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.11.5@sha256:79e66c3c2677e9ecc3fd5b2ed8e4ea7e49cf99ed6ee181f2ef43400c4db5eef0: 1
                  cilium-operator    quay.io/cilium/operator-generic:v1.11.5@sha256:8ace281328b27d4216218c604d720b9a63a8aec2bd1996057c79ab0168f9d6d8: 2
Errors:           cilium-operator    cilium-operator                    1 pods of Deployment cilium-operator are not ready
Warnings:         cilium-operator    cilium-operator-66d7f84f9-ng84m    pod is pending
root@k8s:~# kubectl logs coredns-6d4b75cb6d-5lwzr -n kube-system
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[ERROR] plugin/errors: 2 6564561870664525310.7184499561496433766. HINFO: read udp 10.0.0.116:59569->my-dns-ip:53: i/o timeout
[ERROR] plugin/errors: 2 6564561870664525310.7184499561496433766. HINFO: read udp 10.0.0.116:59138->my-dns-ip:53: i/o timeout
[ERROR] plugin/errors: 2 6564561870664525310.7184499561496433766. HINFO: read udp 10.0.0.116:43059->my-dns-ip:53: i/o timeout
[ERROR] plugin/errors: 2 6564561870664525310.7184499561496433766. HINFO: read udp 10.0.0.116:39485->my-dns-ip:53: i/o timeout
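
For reference, a minimal in-cluster check (the pod name and busybox image here are arbitrary choices) shows the same behavior; on the affected hosts the lookup times out:

root@k8s:~# kubectl run dnstest --rm -it --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default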

Could anybody give me an idea of what's going on?

Cilium Version

1.11.5

Kernel Version

root@k8s:~# uname -a
Linux k8s 5.15.0-30-generic #31-Ubuntu SMP Thu May 5 10:00:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

1.24.1

Sysdump

cilium-sysdump-20220609-165113.zip

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 24 (11 by maintainers)

Most upvoted comments

I think the proper way to override all rp_filter settings is net.ipv4.conf.*.rp_filter = 0 (note the *). I just did a test and it works; see also one of my earlier notes, https://github.com/cilium/cilium/issues/18131#issuecomment-988160016. I was bitten by this myself after upgrading to Ubuntu 22.04.
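
Concretely, a single drop-in file is enough (a sketch; the filename is arbitrary, it just has to sort after whatever file sets rp_filter to 2, and systemd-sysctl understands the * glob):

# cat /etc/sysctl.d/99-override-rp-filter.conf
net.ipv4.conf.*.rp_filter = 0
# systemctl restart systemd-sysctl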

Ubuntu 22.04 decided to place these settings in two places. Not very happy about it.

# ls -ld /etc/sysctl.d/ /usr/lib/sysctl.d/
drwxr-xr-x 2 root root 4096 Jul 15 04:57 /etc/sysctl.d/
drwxr-xr-x 2 root root 4096 Jul 15 04:57 /usr/lib/sysctl.d/

Problem:

# sysctl -a | grep '\.rp_filter'
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.eth0.rp_filter = 2
net.ipv4.conf.lo.rp_filter = 2

Solution (delete the existing rp_filter lines from both sysctl directories, then write a drop-in that pins every current rp_filter key to 0):

# sed -i -e '/net.ipv4.conf.*.rp_filter/d' $(grep -ril '\.rp_filter' /etc/sysctl.d/ /usr/lib/sysctl.d/)
# sysctl -a | grep '\.rp_filter' | awk '{print $1" = 0"}' > /etc/sysctl.d/1000-cilium.conf
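
The generated /etc/sysctl.d/1000-cilium.conf then looks roughly like this (the per-interface keys depend on the host's NICs):

net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
net.ipv4.conf.lo.rp_filter = 0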

Reload:

# sysctl --system

Profit:

# sysctl -a | grep '\.rp_filter'
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
net.ipv4.conf.lo.rp_filter = 0

Scripted k3s install just works now and my pods are actually running.

root@k3s-server-7dea43cc01b7:~# kubectl get pods -A
NAMESPACE       NAME                                            READY   STATUS    RESTARTS   AGE
ingress-nginx   ingress-nginx-controller-54d587fbc6-7l8kj       1/1     Running   0          43s
ingress-nginx   ingress-nginx-controller-54d587fbc6-hpgbn       1/1     Running   0          43s
kube-system     cilium-d2jw8                                    1/1     Running   0          75s
kube-system     cilium-jmw89                                    1/1     Running   0          66s
kube-system     cilium-kctbs                                    1/1     Running   0          19s
kube-system     cilium-operator-5d67fc458d-vktn8                1/1     Running   0          88s
kube-system     cilium-qrnbw                                    1/1     Running   0          38s
kube-system     cilium-wg2dp                                    1/1     Running   0          53s
kube-system     coredns-d76bd69b-jc746                          1/1     Running   0          88s
kube-system     svclb-ingress-nginx-controller-64ce6aa3-62cph   2/2     Running   0          32s
kube-system     svclb-ingress-nginx-controller-64ce6aa3-fkztf   2/2     Running   0          43s
kube-system     svclb-ingress-nginx-controller-64ce6aa3-h8lmr   2/2     Running   0          43s
kube-system     svclb-ingress-nginx-controller-64ce6aa3-t449f   2/2     Running   0          18s

@xiaods That’s a different issue from the one mentioned by @protosam. @xiaods, please open a new GitHub issue to track it.