openyurt: [BUG] Coredns cannot resolve node hostname
What happened: I deployed metrics-server on the cloud node, and it keeps reporting the following errors:
E1203 12:49:23.192743 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-219:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-219"
E1203 12:49:23.192760 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-221:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-221"
E1203 12:49:23.192766 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-224:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-224"
E1203 12:49:23.192769 1 scraper.go:139] "Failed to scrape node" err="Get \"https://center:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="center"
E1203 12:49:23.192746 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-222:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-222"
E1203 12:49:23.192801 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-218:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-218"
E1203 12:49:23.192802 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-223:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-223"
E1203 12:49:23.193890 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-220:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-220"
E1203 12:49:23.193916 1 scraper.go:139] "Failed to scrape node" err="Get \"https://dell2015:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="dell2015"
E1203 12:49:23.193923 1 scraper.go:139] "Failed to scrape node" err="Get \"https://node-225:10250/stats/summary?only_cpu_and_memory=true\": context deadline exceeded" node="node-225"
I1203 12:49:23.445026 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I1203 12:49:33.445641 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
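For context, metrics-server builds each scrape URL directly from the node's hostname, so those hostnames must be resolvable from the metrics-server pod. A minimal sketch (the URL format is copied from the error lines above):

```python
# Rebuild the scrape URL metrics-server uses, as seen in the error log above.
# The node name must resolve via cluster DNS for the request to succeed.
def scrape_url(node: str) -> str:
    return f"https://{node}:10250/stats/summary?only_cpu_and_memory=true"

print(scrape_url("node-219"))
# https://node-219:10250/stats/summary?only_cpu_and_memory=true
```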
When I enabled logging in CoreDNS and checked the logs, I found that CoreDNS could not resolve the hostnames:
[INFO] 10.244.0.21:57363 - 10071 "AAAA IN node-221.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000101561s
[INFO] 10.244.0.21:49140 - 15804 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000197454s
[INFO] 10.244.0.21:52591 - 40725 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000170565s
[INFO] 10.244.0.21:46343 - 27268 "A IN node-222.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000144675s
[INFO] 10.244.0.21:53605 - 21188 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00017991s
[INFO] 10.244.0.21:56493 - 14043 "A IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000145095s
[INFO] 10.244.0.21:57767 - 20232 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000104592s
[INFO] 10.244.0.21:55905 - 46769 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000100891s
[INFO] 10.244.0.21:38400 - 21470 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000154556s
[INFO] 10.244.0.21:42241 - 28115 "AAAA IN node-223.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00014793s
[INFO] 10.244.0.21:46009 - 15495 "AAAA IN node-225.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000150244s
[INFO] 10.244.0.21:43989 - 42034 "A IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000086667s
[INFO] 10.244.0.21:37473 - 36930 "AAAA IN node-218.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000160677s
[INFO] 10.244.0.21:38626 - 9816 "A IN node-225.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00009503s
[INFO] 10.244.0.21:57427 - 45436 "A IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000181907s
[INFO] 10.244.0.21:42602 - 2082 "AAAA IN node-223.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.00021916s
[INFO] 10.244.0.21:48372 - 64152 "AAAA IN dell2015.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000215355s
[INFO] 10.244.0.21:38931 - 17188 "A IN node-220.kube-system.svc.cluster.local. udp 56 false 512" NXDOMAIN qr,aa,rd 149 0.000149272s
[INFO] 10.244.0.21:47704 - 5818 "A IN node-218.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000100259s
[INFO] 10.244.0.21:43007 - 5861 "AAAA IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.00007362s
[INFO] 10.244.0.21:56270 - 62782 "A IN dell2015.svc.cluster.local. udp 44 false 512" NXDOMAIN qr,aa,rd 137 0.000167426s
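One detail visible in the log: CoreDNS never receives the bare hostnames; every query already carries one of the pod's DNS search suffixes (kube-system.svc.cluster.local, svc.cluster.local, ...). That is standard resolver behavior in Kubernetes pods, where ndots:5 makes short names try the search domains before the absolute name. A sketch of how the candidate names are generated (the exact search list is an assumption inferred from the suffixes above):

```python
# Expand a short hostname with the pod's resolv.conf search domains,
# the way the resolver does before trying the name as-is.
SEARCH = ["kube-system.svc.cluster.local", "svc.cluster.local", "cluster.local"]

def candidates(host: str) -> list[str]:
    # With ndots:5, a name with fewer than 5 dots tries the search
    # suffixes first, then the bare name as an absolute query.
    return [f"{host}.{d}." for d in SEARCH] + [f"{host}."]

print(candidates("node-221"))
```

The first candidate, node-221.kube-system.svc.cluster.local., is exactly the query name in the NXDOMAIN lines above.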
In fact, I have already mounted the yurt-tunnel-nodes configmap into the CoreDNS DaemonSet:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  ...
  name: coredns
  ...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - args:
            - '-conf'
            - /etc/coredns/Corefile
          image: 'registry.aliyuncs.com/google_containers/coredns:1.8.4'
          ...
          volumeMounts:
            - mountPath: /etc/edge # here
              name: edge
              readOnly: true
            - mountPath: /etc/coredns
              name: config-volume
              readOnly: true
      ...
      volumes:
        - configMap:
            defaultMode: 420
            name: yurt-tunnel-nodes # here
          name: edge
        - configMap:
            defaultMode: 420
            items:
              - key: Corefile
                path: Corefile
            name: coredns
          name: config-volume
...
And I added the hosts plugin to the Corefile in the CoreDNS configmap:
---
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        log {
        }
        ready
        hosts /etc/edge/tunnel-nodes { # here
            reload 300ms
            fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods verified
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  resourceVersion: '1363115'
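For what it's worth, the hosts plugin matches query names against the file entries as exact FQDNs: an entry for node-221 answers a query for node-221., but not for node-221.kube-system.svc.cluster.local., which falls through to the kubernetes plugin and gets NXDOMAIN because no such Service exists. A simplified model of that exact-match lookup (not the real plugin code):

```python
# Simplified model of the CoreDNS hosts-plugin lookup: entries are exact FQDNs.
HOSTS = {"node-221.": "10.107.2.246", "dell2015.": "10.107.2.246"}

def lookup(qname: str):
    # No suffix stripping: a search-domain-qualified name misses here
    # and falls through to the next plugin in the chain.
    return HOSTS.get(qname)

print(lookup("node-221."))                                # 10.107.2.246 (hit)
print(lookup("node-221.kube-system.svc.cluster.local."))  # None (fallthrough)
```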
And the yurt-tunnel-nodes configmap is shown below, where 10.107.2.246 is the ClusterIP of x-tunnel-server-internal-svc:
---
apiVersion: v1
data:
  tunnel-nodes: "10.107.2.246\tdell2015\n10.107.2.246\tnode-218\n10.107.2.246\tnode-219\n10.107.2.246\tnode-220\n10.107.2.246\tnode-221\n10.107.2.246\tnode-222\n10.107.2.246\tnode-223\n10.107.2.246\tnode-224\n10.107.2.246\tnode-225\n172.26.146.181\tcenter"
kind: ConfigMap
metadata:
  annotations: {}
  name: yurt-tunnel-nodes
  namespace: kube-system
  resourceVersion: '1296168'
I think all of these configurations are correct. So why does CoreDNS return NXDOMAIN when resolving the node hostnames?
What you expected to happen: CoreDNS should resolve the node hostnames to the ClusterIP of x-tunnel-server-internal-svc.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- OpenYurt version: 1.1
- Kubernetes version (use kubectl version): 1.22.8
- OS (e.g. cat /etc/os-release): Ubuntu 22.04.1 LTS
- Kernel (e.g. uname -a): 5.15.0-46-generic
- Install tools: Manually Setup
- Others:
/kind bug
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 23 (14 by maintainers)
@rambohe-ch Thanks a lot! It works. I successfully deployed yurthub on the master node and the service topology function works well. My metrics-server is also working fine. Thanks again for your answer!