cilium: CoreDNS fails to connect to kube-api via the kubernetes service

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

As initially reported here, I’m unable to get DNS working because CoreDNS fails to connect to the Kubernetes API. I think it might be a regression introduced by upgrading to $latest.

The whole post follows as a copy-paste:

Cluster information:

Kubernetes version: v1.28.1
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: Ubuntu 22.04.3 LTS
CNI and version: Cilium 1.14.1
CRI and version: containerd 1.6.22

Today, after upgrading to 1.28.1, I realized that my test cluster is unable to get CoreDNS ready:

$ k get po -A | grep core
kube-system   coredns-5dd5756b68-hchqq            0/1     Running   0             57m
kube-system   coredns-5dd5756b68-r768b            0/1     Running   0             57m

Upon inspecting the logs, there seems to be a connectivity issue between CoreDNS and the kube-apiserver:

$ k -n kube-system logs coredns-5dd5756b68-hchqq | tail -5 | tail -2
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.96.0.1:443/version": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
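
Since Hubble Relay is up (see the cilium status output further down), one quick triage step is to watch the flows towards the service VIP; this is only a sketch using standard hubble CLI flags. If the socket LB were working, the flows would typically already show a backend address (a node IP on port 6443) rather than 10.96.0.1:443:

$ cilium hubble port-forward &
$ hubble observe --to-ip 10.96.0.1 --follow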

cilium connectivity test seems to run into the same issue:

$ cilium connectivity test
ℹ️  Monitor aggregation detected, will skip some flow validation steps
⌛ [kubernetes] Waiting for deployment cilium-test/client to become ready...
⌛ [kubernetes] Waiting for deployment cilium-test/client2 to become ready...
⌛ [kubernetes] Waiting for deployment cilium-test/echo-same-node to become ready...
⌛ [kubernetes] Waiting for deployment cilium-test/echo-other-node to become ready...
⌛ [kubernetes] Waiting for CiliumEndpoint for pod cilium-test/client-78f9dffc84-g5z5l to appear...
⌛ [kubernetes] Waiting for CiliumEndpoint for pod cilium-test/client2-59b578d4bb-jttvw to appear...
⌛ [kubernetes] Waiting for pod cilium-test/client-78f9dffc84-g5z5l to reach DNS server on cilium-test/echo-same-node-54cc4f75b8-xt4cf pod...
⌛ [kubernetes] Waiting for pod cilium-test/client2-59b578d4bb-jttvw to reach DNS server on cilium-test/echo-same-node-54cc4f75b8-xt4cf pod...
⌛ [kubernetes] Waiting for pod cilium-test/client-78f9dffc84-g5z5l to reach DNS server on cilium-test/echo-other-node-5b87f6f4f4-cdmtl pod...
⌛ [kubernetes] Waiting for pod cilium-test/client2-59b578d4bb-jttvw to reach DNS server on cilium-test/echo-other-node-5b87f6f4f4-cdmtl pod...
⌛ [kubernetes] Waiting for pod cilium-test/client-78f9dffc84-g5z5l to reach default/kubernetes service...
connectivity test failed: timeout reached waiting for lookup for kubernetes.default from pod cilium-test/client-78f9dffc84-g5z5l to succeed (last error: context deadline exceeded)
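
The same symptom can be reproduced manually from one of the test clients, which takes DNS out of the picture and exercises only the ClusterIP path (the connectivity-test client image ships curl, so this should work as-is; a timeout here points at the service path rather than name resolution):

$ kubectl -n cilium-test exec deploy/client -- curl -sk --max-time 5 https://10.96.0.1/version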

Accessing the kube-api from outside the cluster works fine, as demonstrated by kubectl working. 😉

Cilium status seems OK.

$ cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

Deployment             hubble-relay       Desired: 1, Ready: 1/1, Available: 1/1
Deployment             hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Deployment             cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet              cilium             Desired: 4, Ready: 4/4, Available: 4/4
Containers:            cilium             Running: 4
                       hubble-relay       Running: 1
                       hubble-ui          Running: 1
                       cilium-operator    Running: 2
Cluster Pods:          8/8 managed by Cilium
Helm chart version:    1.14.1
Image versions         cilium             quay.io/cilium/cilium:v1.14.1@sha256:edc1d05ea1365c4a8f6ac6982247d5c145181704894bb698619c3827b6963a72: 4
                       hubble-relay       quay.io/cilium/hubble-relay:v1.13.2: 1
                       hubble-ui          quay.io/cilium/hubble-ui:v0.11.0@sha256:bcb369c47cada2d4257d63d3749f7f87c91dde32e010b223597306de95d1ecc8: 1
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.11.0@sha256:14c04d11f78da5c363f88592abae8d2ecee3cbe009f443ef11df6ac5f692d839: 1
                       cilium-operator    quay.io/cilium/operator-generic:v1.14.1@sha256:e061de0a930534c7e3f8feda8330976367971238ccafff42659f104effd4b5f7: 2

There are no network policies I can find to blame.

$ k get ciliumnetworkpolicies.cilium.io -A
No resources found
$ k get networkpolicies.networking.k8s.io -A
No resources found

There are endpoints that I believe should be implicitly targeted by the service:

$ k get endpointslices.discovery.k8s.io 
NAME         ADDRESSTYPE   PORTS   ENDPOINTS                                      AGE
kubernetes   IPv4          6443    192.168.100.10,192.168.100.11,192.168.100.12   140d
$ k get svc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2023-09-01T08:00:11Z"
    labels:
      component: apiserver
      provider: kubernetes
    name: kubernetes
    namespace: default
    resourceVersion: "2726902"
    uid: 5e7c32c9-ab89-47e4-8940-db010c2ffc4d
  spec:
    clusterIP: 10.96.0.1
    clusterIPs:
    - 10.96.0.1
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 6443
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
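
Assuming Cilium is handling ClusterIP services here (kube-proxy replacement / socket LB), the next thing worth comparing is the datapath's own view of that service from inside one of the agent pods; the container name cilium-agent matches the standard Helm chart:

$ kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium service list | grep 10.96.0.1
$ kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium bpf lb list | grep -A1 '10.96.0.1:443'

If 10.96.0.1:443 maps to the 6443 backends in both outputs, the service table itself is fine and the problem sits elsewhere in the datapath.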

I don’t believe I have any funny business in the CoreDNS config:

$ k get all -A -l k8s-app=kube-dns
NAMESPACE     NAME                           READY   STATUS    RESTARTS        AGE
kube-system   pod/coredns-5dd5756b68-hchqq   0/1     Running   1 (6m39s ago)   4h46m
kube-system   pod/coredns-5dd5756b68-r768b   0/1     Running   1 (6m38s ago)   4h46m

NAMESPACE     NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-system   service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   4h46m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   0/2     2            0           4h46m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-5dd5756b68   2         2         0       4h46m

$ k -n kube-system describe cm coredns 
Name:         coredns
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
Corefile:
----
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}


BinaryData
====

Events:  <none>
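
Nothing unusual there: the kubernetes plugin serves cluster.local, and the readiness probe (the ready plugin, port 8181 by default in the kubeadm deployment) keeps the pod at 0/1 until that plugin has synced with the API server, which matches the "Still waiting on: kubernetes" log line. The readiness endpoint can be checked directly from any host or pod that can reach the pod network (the pod IP below is a placeholder):

$ k -n kube-system get po coredns-5dd5756b68-hchqq -o jsonpath='{.status.podIP}{"\n"}'
$ curl -s http://<coredns-pod-ip>:8181/ready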

There is a DNS service listening in the container, but it does not seem to hold any data, probably because it cannot connect to the API:

$ kubectl -n kube-system debug -it pod/coredns-5dd5756b68-hchqq --image=nicolaka/netshoot --target=coredns
coredns-5dd5756b68-hchqq  ~  ss -lnp | grep :53
udp   UNCONN 0      0                  *:53               *:*    users:(("coredns",pid=1,fd=12))
tcp   LISTEN 0      4096               *:53               *:*    users:(("coredns",pid=1,fd=11))
coredns-5dd5756b68-hchqq  ~  dig @localhost kubernetes.default

; <<>> DiG 9.18.13 <<>> @localhost kubernetes.default
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29162
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 7fdf8d625c0b48eb (echoed)
;; QUESTION SECTION:
;kubernetes.default.            IN      A

;; Query time: 0 msec
;; SERVER: ::1#53(localhost) (UDP)
;; WHEN: Fri Sep 01 13:23:08 UTC 2023
;; MSG SIZE  rcvd: 59
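
Note that kubernetes.default on its own is outside the cluster.local zone served by the kubernetes plugin, so dig (which does not apply search domains by default) hands it to the forwarder, and the NXDOMAIN above comes from the upstream resolver rather than from CoreDNS's cluster data. Querying the fully qualified name is the more telling test; while the plugin has no API connection it is expected to fail (likely with SERVFAIL) rather than resolve:

$ dig @localhost kubernetes.default.svc.cluster.local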

I can access the API from the pod on the external IP, but not on the service IP:

 coredns-5dd5756b68-hchqq  ~  ping kubernetes              
ping: kubernetes: Try again

 coredns-5dd5756b68-hchqq  ~  ping k8s       
PING k8s.kubenet (192.168.100.5) 56(84) bytes of data.
64 bytes from k8s.kubenet (192.168.100.5): icmp_seq=1 ttl=62 time=0.139 ms
64 bytes from k8s.kubenet (192.168.100.5): icmp_seq=2 ttl=62 time=0.147 ms
^C
--- k8s.kubenet ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1023ms
rtt min/avg/max/mdev = 0.139/0.143/0.147/0.004 ms

 coredns-5dd5756b68-hchqq  ~  curl -k https://k8s:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

 coredns-5dd5756b68-hchqq  ~  curl -k https://10.96.0.1:443 
curl: (28) Failed to connect to 10.96.0.1 port 443 after 130812 ms: Couldn't connect to server
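
To separate "VIP translation is broken" from "routing to the control plane is broken", it is also worth curling one of the EndpointSlice addresses directly from the same pod. Any HTTP response, even a 403 like the one above, proves the backend path works; only a timeout would be interesting (the endpoint IP is taken from the EndpointSlice listed earlier):

$ curl -k --max-time 5 https://192.168.100.10:6443/version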

What am I missing?

Cilium Version

cilium-cli: v0.15.7 compiled with go1.21.0 on linux/amd64
cilium image (default): v1.14.1
cilium image (stable): v1.14.1
cilium image (running): 1.14.1

Kernel Version

6.4.11-200.fc38.x86_64

Kubernetes Version

Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.1

Sysdump

cilium-sysdump-20230903-172137.zip

Relevant log output

$ k get po -A | grep core
kube-system   coredns-5dd5756b68-hchqq            0/1     Running   0             57m
kube-system   coredns-5dd5756b68-r768b            0/1     Running   0             57m

$ k -n kube-system logs coredns-5dd5756b68-hchqq | tail -5 | tail -2
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.96.0.1:443/version": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"

Anything else?

https://discuss.kubernetes.io/t/coredns-fails-connecting-to-kube-api-via-kubernetes-service/

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Comments: 32 (19 by maintainers)

Most upvoted comments

/cc @aojea Which k8s versions will the regression fix be available in?

1.28.2, which is supposed to be released next week.

Tried upgrading to 1.28.2 today:

$ k get no
NAME        STATUS   ROLES           AGE    VERSION
k8scp       Ready    control-plane   153d   v1.28.2
k8scp2      Ready    control-plane   153d   v1.28.2
k8scp3      Ready    control-plane   153d   v1.28.2
k8sworker   Ready    <none>          153d   v1.28.2

Upgrading alone does not seem to fix the DNS:

$ k get po -A -l k8s-app=kube-dns -o wide
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE        NOMINATED NODE   READINESS GATES
kube-system   coredns-5dd5756b68-wxfwm   0/1     Running   0          5m45s   10.0.1.124   k8scp2      <none>           <none>
kube-system   coredns-5dd5756b68-zbt2c   0/1     Running   0          5m45s   10.0.0.39    k8sworker   <none>           <none>

Neither does deleting the old pods:

$ k delete po -A -l k8s-app=kube-dns
pod "coredns-5dd5756b68-wxfwm" deleted
pod "coredns-5dd5756b68-zbt2c" deleted
$ k get po -A -l k8s-app=kube-dns -o wide
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
kube-system   coredns-5dd5756b68-fkcv5   0/1     Running   0          66s   10.0.3.145   k8scp3   <none>           <none>
kube-system   coredns-5dd5756b68-z5skz   0/1     Running   0          66s   10.0.2.127   k8scp    <none>           <none>

But a cluster reboot after the upgrade seems to get everything back in shape:

$ clush -w @k8s,lb reboot
$ sleep 120
$ k get po -A -l k8s-app=kube-dns -o wide
NAMESPACE     NAME                       READY   STATUS    RESTARTS       AGE     IP           NODE        NOMINATED NODE   READINESS GATES
kube-system   coredns-5dd5756b68-fkcv5   1/1     Running   1 (104s ago)   5m28s   10.0.3.170   k8scp3      <none>           <none>
kube-system   coredns-5dd5756b68-hjws8   1/1     Running   0              24s     10.0.0.68    k8sworker   <none>           <none>
$ cilium connectivity test
(...)
✅ All 42 tests (306 actions) successful, 13 tests skipped, 0 scenarios skipped.

For completeness, the BPF state post-upgrade:

root@k8scp3:~# cat /proc/$(ps aux | grep coredns | head -n1 | awk '{print $2}')/cgroup
0::/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0fbef6a2_439a_4e3d_b861_c07dc2f2e567.slice/cri-containerd-8c0b79f8288484ac45ea2c35fe5c66eee446fe6bd2ef24086c90ab632994f9c1.scope

root@k8scp3:~# bpftool cgroup tree | grep -A1 8c0b79f8288484ac45ea2c35fe5c66eee446fe6bd2ef240
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0fbef6a2_439a_4e3d_b861_c07dc2f2e567.slice/cri-containerd-8c0b79f8288484ac45ea2c35fe5c66eee446fe6bd2ef24086c90ab632994f9c1.scope
    466      device          multi

Thanks. Managed to reproduce it locally with @jspaleta’s approach:

./contrib/scripts/kind.sh "" 2 "" "kindest/node:v1.28.0" "none" "ipv4"
cilium install
docker restart kind-control-plane kind-worker kind-worker2

It is important to note that Cilium’s kube-proxy replacement is installed.
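
For anyone reproducing this, whether the socket LB is actually in play can be confirmed from the agent itself; the exact wording of the status lines varies between versions, so treat the grep patterns as approximations:

$ kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium status | grep -i kubeproxyreplacement
$ kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium status --verbose | grep -i 'socket lb'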

A relevant bit from the pwru output (pwru --output-tuple 'host $CORE_DNS_POD_IP'):

0xffff918e963e0ee8     19        [<empty>]             ip_local_out 10.244.1.143:58040->10.96.0.1:443(tcp)
0xffff918e963e0ee8     19        [<empty>]           __ip_local_out 10.244.1.143:58040->10.96.0.1:443(tcp)
0xffff918e963e0ee8     19        [<empty>]                ip_output 10.244.1.143:58040->10.96.0.1:443(tcp)

This suggests that the socket LB translation didn’t happen. The BPF LB map is properly populated on the CoreDNS pod’s node:

root@kind-worker:/home/cilium# cilium bpf lb list
SERVICE ADDRESS     BACKEND ADDRESS (REVNAT_ID) (SLOT)
10.96.0.1:443       172.18.0.4:6443 (3) (1)
                    0.0.0.0:0 (3) (0) [ClusterIP, non-routable]
...

The next suspect is the cgroup BPF programs (used by the socket LB) being attached to the wrong cgroup root. CoreDNS is in the following cgroup:

> sudo cat /proc/$(ps aux | grep coredns | head -n1 | awk '{print $2}')/cgroup
0::/system.slice/docker-51314633cf3e7e7c1d6265978d19f0c0d456b02c4d74dcde9a4ff2e7107e1fc0.scope/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod1e780c35_26eb_4b9a_ada8_ea32bc0f680b.slice/cri-containerd-41e621d81a680dd398e029ddb737c294be4505d22e5af6891d03a50911cf01a3.scope

Meanwhile bpftool cgroup tree gives:

/sys/fs/cgroup/system.slice/docker-51314633cf3e7e7c1d6265978d19f0c0d456b02c4d74dcde9a4ff2e7107e1fc0.scope/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod2472f208_e443_4e02_8300_ecd005f91ff2.slice/cri-containerd-ec66d8070ee0b7ee00f512316d7811220de2f996d29c70557fcfe8e46576ad6e.scope
    12117    cgroup_device   multi
    12777    cgroup_inet4_connect multi           cil_sock4_connect
    12778    cgroup_inet6_connect multi           cil_sock6_connect
    12783    cgroup_inet4_post_bind multi           cil_sock4_post_bind
    12776    cgroup_inet6_post_bind multi           cil_sock6_post_bind
    12782    cgroup_udp4_sendmsg multi           cil_sock4_sendmsg
    12785    cgroup_udp6_sendmsg multi           cil_sock6_sendmsg
    12779    cgroup_udp4_recvmsg multi           cil_sock4_recvmsg
    12784    cgroup_udp6_recvmsg multi           cil_sock6_recvmsg
    12780    cgroup_inet4_getpeername multi           cil_sock4_getpeername
    12781    cgroup_inet6_getpeername multi           cil_sock6_getpeername

Indeed, the socket LB programs are attached to the wrong cgroup root, so the service translation does not take place and there is no connectivity to the kube-apiserver.
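
A compact way to check for this condition on an affected node is to resolve the CoreDNS container’s cgroup and ask bpftool which programs are effective there; if no cil_sock*_connect shows up, connect-time service translation cannot happen for that pod. The pgrep pattern and paths below are illustrative:

# derive the cgroup of the (oldest) coredns process on this node
COREDNS_PID=$(pgrep -o coredns)
CGPATH=/sys/fs/cgroup$(awk -F'::' 'NR==1 {print $2}' /proc/"$COREDNS_PID"/cgroup)
# 'effective' also lists programs inherited from ancestor cgroups
bpftool cgroup show "$CGPATH" effective | grep cil_sock \
  || echo "no socket LB programs effective for this cgroup"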

So what is the solution here? I’m hitting the same issue on 1.27.6, and my pod CIDR is 192.168.0.0/16. CoreDNS gives me the same output as well. Is the solution just to update kubeadm?

Thanks @azzid for testing it. I had the same problem discussed here on a bare-metal cluster of two nodes, and after rebooting the control plane everything stopped working. I updated Kubernetes to 1.28.2, then rebooted, and all is back to normal.

@brb, I will re-run the experiment in approx. 4 hours! Thanks!

Okay, so it’s probably something about CoreDNS and Cilium 1.14 on Kubernetes 1.28, thanks!

Thanks for this detailed issue @azzid.

Unfortunately, Cilium 1.14 only supports Kubernetes 1.27; it looks like there might be something in the 1.28 upgrade that’s not working properly.

Could you try either Cilium 1.14 on Kubernetes 1.27, or Cilium main (you could also use the 1.15.0-pre.0 tag) on Kubernetes 1.28, and see if you get the same issue?