openyurt: [BUG] Coredns cannot resolve node hostname

What happened: I deployed metrics-server on the cloud node. It keeps reporting the following errors:

E1203 12:49:23.192743       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-219:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-219"
E1203 12:49:23.192760       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-221:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-221"
E1203 12:49:23.192766       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-224:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-224"
E1203 12:49:23.192769       1 scraper.go:139] "Failed to scrape node" err="Get \"https://center:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="center"
E1203 12:49:23.192746       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-222:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-222"
E1203 12:49:23.192801       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-218:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-218"
E1203 12:49:23.192802       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-223:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-223"
E1203 12:49:23.193890       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-220:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-220"
E1203 12:49:23.193916       1 scraper.go:139] "Failed to scrape node" err="Get \"https://dell2015:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="dell2015"
E1203 12:49:23.193923       1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-225:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-225"
I1203 12:49:23.445026       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I1203 12:49:33.445641       1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"

After enabling the log plugin in CoreDNS and checking its logs, I found that CoreDNS could not resolve the node hostnames:

[INFO] 10.244.0.21:57363 - 10071 "AAAA IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000101561s
[INFO] 10.244.0.21:49140 - 15804 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000197454s
[INFO] 10.244.0.21:52591 - 40725 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000170565s
[INFO] 10.244.0.21:46343 - 27268 "A IN node-222.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000144675s
[INFO] 10.244.0.21:53605 - 21188 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00017991s
[INFO] 10.244.0.21:56493 - 14043 "A IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000145095s
[INFO] 10.244.0.21:57767 - 20232 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000104592s
[INFO] 10.244.0.21:55905 - 46769 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000100891s
[INFO] 10.244.0.21:38400 - 21470 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000154556s
[INFO] 10.244.0.21:42241 - 28115 "AAAA IN node-223.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00014793s
[INFO] 10.244.0.21:46009 - 15495 "AAAA IN node-225.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000150244s
[INFO] 10.244.0.21:43989 - 42034 "A IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000086667s
[INFO] 10.244.0.21:37473 - 36930 "AAAA IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000160677s
[INFO] 10.244.0.21:38626 - 9816 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00009503s
[INFO] 10.244.0.21:57427 - 45436 "A IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000181907s
[INFO] 10.244.0.21:42602 - 2082 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00021916s
[INFO] 10.244.0.21:48372 - 64152 "AAAA IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000215355s
[INFO] 10.244.0.21:38931 - 17188 "A IN node-220.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000149272s
[INFO] 10.244.0.21:47704 - 5818 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000100259s
[INFO] 10.244.0.21:43007 - 5861 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00007362s
[INFO] 10.244.0.21:56270 - 62782 "A IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000167426s
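
Every name in the log excerpt above carries a cluster search suffix (for example `node-221.kube-system.svc.cluster.local.`). The sketch below shows how a stub resolver would expand a short name such as `node-221` before trying the bare name. It assumes the querying pod runs in kube-system with the default ClusterFirst DNS policy, so the search list and `ndots:5` used here are assumptions, not values read from this cluster. `candidate_names` is a hypothetical helper, not part of any library.

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # Already absolute: the search list is skipped entirely.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{suffix}." for suffix in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, bare name last.
    return expanded + [absolute]


# Assumed search list for a kube-system pod (ClusterFirst); the real list
# may also contain domains inherited from the node's /etc/resolv.conf.
SEARCH = ["kube-system.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for fqdn in candidate_names("node-221", SEARCH):
    print(fqdn)
```

Under these assumptions the suffixed names are tried first and the bare `node-221.` query comes last, so NXDOMAIN answers for the suffixed names alone do not show whether the bare name resolves.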

I have already mounted the yurt-tunnel-nodes ConfigMap into the coredns DaemonSet:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  ...
  name: coredns
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - args:
            - '-conf'
            - /etc/coredns/Corefile
          image: 'registry.aliyuncs.com/google_containers/coredns:1.8.4'
          ...
          volumeMounts:
            - mountPath: /etc/edge       # here
              name: edge
              readOnly: true
            - mountPath: /etc/coredns
              name: config-volume
              readOnly: true
      ...
      volumes:
        - configMap:
            defaultMode: 420
            name: yurt-tunnel-nodes     # here
          name: edge
        - configMap:
            defaultMode: 420
            items:
              - key: Corefile
                path: Corefile
            name: coredns
          name: config-volume
  ...

I also added the hosts plugin to the Corefile in the coredns ConfigMap:

---
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        log {
        }
        ready
        hosts /etc/edge/tunnel-nodes {    # here
            reload 300ms
            fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods verified
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  resourceVersion: '1363115'

The yurt-tunnel-nodes ConfigMap is shown below; 10.107.2.246 is the ClusterIP of x-tunnel-server-internal-svc:

---
apiVersion: v1
data:
  tunnel-nodes: "10.107.2.246\tdell2015\n10.107.2.246\tnode-218\n10.107.2.246\tnode-219\n10.107.2.246\tnode-220\n10.107.2.246\tnode-221\n10.107.2.246\tnode-222\n10.107.2.246\tnode-223\n10.107.2.246\tnode-224\n10.107.2.246\tnode-225\n172.26.146.181\tcenter"
kind: ConfigMap
metadata:
  annotations: {}
  name: yurt-tunnel-nodes
  namespace: kube-system
  resourceVersion: '1296168'
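
To make the contents of `tunnel-nodes` easier to read, here is a minimal sketch that decodes the string from the ConfigMap above into a name-to-IP table and looks names up by exact match. Exact matching is how I understand the CoreDNS hosts plugin to treat entries in a hosts file; treat that as an assumption. `parse_hosts` and `lookup` are hypothetical helpers written for illustration, not CoreDNS code.

```python
# The tunnel-nodes value copied from the ConfigMap above.
TUNNEL_NODES = (
    "10.107.2.246\tdell2015\n10.107.2.246\tnode-218\n10.107.2.246\tnode-219\n"
    "10.107.2.246\tnode-220\n10.107.2.246\tnode-221\n10.107.2.246\tnode-222\n"
    "10.107.2.246\tnode-223\n10.107.2.246\tnode-224\n10.107.2.246\tnode-225\n"
    "172.26.146.181\tcenter"
)


def parse_hosts(text):
    """Parse /etc/hosts-style text into {fully qualified name: ip}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name.lower().rstrip(".") + "."] = ip
    return table


def lookup(table, qname):
    """Exact-match lookup; returns None when the name is not listed."""
    return table.get(qname.lower().rstrip(".") + ".")


table = parse_hosts(TUNNEL_NODES)
print(lookup(table, "node-221."))                               # listed
print(lookup(table, "node-221.kube-system.svc.cluster.local."))  # not listed
```

With this table only the bare hostnames are present, so a query for a name with a `svc.cluster.local` suffix would miss the table and fall through to the next plugin.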

I believe all of this configuration is correct. So why does CoreDNS return NXDOMAIN when resolving the node hostnames?

What you expected to happen: CoreDNS resolves each node hostname to the ClusterIP of x-tunnel-server-internal-svc.
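
One way to check this expectation is a small script run from a pod that uses cluster DNS. The sketch below is only an illustration: `check_resolution` is a hypothetical helper, the expected addresses are copied from the yurt-tunnel-nodes ConfigMap above, and it assumes a Python interpreter is available in the pod.

```python
import socket

# Expected answers, taken from the yurt-tunnel-nodes ConfigMap.
TUNNEL_IP = "10.107.2.246"
EXPECTED = {f"node-{n}": TUNNEL_IP for n in range(218, 226)}
EXPECTED["dell2015"] = TUNNEL_IP
EXPECTED["center"] = "172.26.146.181"


def check_resolution(expected, resolve=socket.gethostbyname):
    """Return {name: what we got} for every name that did not match."""
    failures = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            got = f"error: {exc}"
        if got != want:
            failures[name] = got
    return failures


if __name__ == "__main__":
    for name, got in sorted(check_resolution(EXPECTED).items()):
        print(f"{name}: expected {EXPECTED[name]}, got {got}")
```

The `resolve` parameter is injectable so the comparison logic can be exercised without a live cluster; an empty result means every hostname resolved to the expected address.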

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • OpenYurt version: 1.1
  • Kubernetes version (use kubectl version): 1.22.8
  • OS (e.g: cat /etc/os-release): Ubuntu 22.04.1 LTS
  • Kernel (e.g. uname -a): 5.15.0-46-generic
  • Install tools: Manually Setup
  • Others:

others

/kind bug

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (14 by maintainers)

Most upvoted comments

@rambohe-ch Thanks a lot! It works. I successfully deployed yurthub on the master node and the service topology function works well. My metrics-server is also working fine. Thanks again for your answer!