kubernetes: dnsPolicy in hostNetwork not working as expected
What happened:
In Kubernetes 1.17, pods running with hostNetwork: true are not able to get DNS answers from the CoreDNS Service, especially when using the strongly recommended dnsPolicy: ClusterFirstWithHostNet.
Also, I noticed that the CoreDNS Service is not always reachable from the host itself.
What you expected to happen:
The CoreDNS Service is reachable from within a pod in the host network, especially when using dnsPolicy: ClusterFirstWithHostNet. Also, the CoreDNS Service is reachable from the host, as it was in Kubernetes 1.15.
How to reproduce it (as minimally and precisely as possible):
```
# kubectl -n kube-system get svc
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   97d

# dig @10.96.0.10 kubernetes.io

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @10.96.0.10 kubernetes.io
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
```
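From the host, the Service ClusterIP does not answer at all. As a quick sanity check (a sketch, assuming kube-proxy in iptables or IPVS mode; adjust for your setup), you can confirm on the node that kube-proxy has actually programmed the kube-dns Service, which separates a kube-proxy problem from an overlay/packet-path problem:

```sh
# iptables mode: the ClusterIP should appear in the KUBE-SERVICES rules.
iptables-save | grep 10.96.0.10

# IPVS mode: the ClusterIP should show up as a virtual server with CoreDNS pod IPs as backends.
ipvsadm -Ln | grep -A 3 10.96.0.10

# If the rules are present but `dig @10.96.0.10` still times out,
# the packet path (CNI/overlay) is the more likely culprit than kube-proxy.
```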
```
# cat dns-pods-in-host-network.yaml
# kubectl apply -f dns-pods-in-host-network.yaml
```

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: cluster-first
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  hostNetwork: true
  dnsPolicy: ClusterFirst
---
apiVersion: v1
kind: Pod
metadata:
  name: cluster-first-with-hostnet
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
  dnsPolicy: ClusterFirstWithHostNet
  hostNetwork: true
```
```
root@master:/tmp# kubectl exec -ti cluster-first -- nslookup kubernetes.io
Server:    1.1.1.1
Address:   1.1.1.1#53

Non-authoritative answer:
Name:    kubernetes.io
Address: 147.75.40.148

root@master:/tmp# kubectl exec -ti cluster-first-with-hostnet -- nslookup kubernetes.io
;; connection timed out; no servers could be reached

command terminated with exit code 1
```
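The asymmetry above matches the documented dnsPolicy semantics: for a hostNetwork pod, ClusterFirst effectively falls back to the node's own resolv.conf (hence the 1.1.1.1 answer), while ClusterFirstWithHostNet points the pod at the cluster DNS Service IP (10.96.0.10), which is exactly the address that times out. A quick way to see this (output will of course vary per cluster) is to compare the resolv.conf each pod gets:

```sh
# Falls back to the node's resolver(s), e.g. 1.1.1.1:
kubectl exec cluster-first -- cat /etc/resolv.conf

# Points at the kube-dns ClusterIP (10.96.0.10) plus the cluster search domains:
kubectl exec cluster-first-with-hostnet -- cat /etc/resolv.conf
```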
Anything else we need to know?: I noticed this on three small clusters running Kubernetes 1.17, each with 1 master and 2 or 3 nodes. Most of them were upgraded from earlier Kubernetes versions (e.g. 1.13 -> 1.14 -> 1.15 -> 1.16 -> 1.17).
Environment:
- Kubernetes version (use kubectl version): 1.17
- Cloud provider or hardware configuration: BareMetal, mostly running on VMware
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a): Linux eins 4.15.0-74-generic #83~16.04.1-Ubuntu SMP Wed Dec 18 04:56:23 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm
- Network plugin and version (if this is a network-related bug): flannel
Did find a workaround: switching flannel from vxlan to host-gw (https://github.com/coreos/flannel/issues/1245#issuecomment-582612891) and then restarting the flannel pods:

```sh
kubectl delete pods -l app=flannel -n kube-system
```
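For reference, a sketch of where that backend switch lives on a stock flannel deployment (assuming the upstream manifest's kube-flannel-cfg ConfigMap and its net-conf.json key; adjust names for your installation, and note that host-gw requires the nodes to share an L2 segment). The pod deletion above then restarts flannel with the new backend:

```sh
# Change the flannel backend from vxlan to host-gw in its network config:
kubectl -n kube-system edit configmap kube-flannel-cfg

# In net-conf.json, change:
#   "Backend": { "Type": "vxlan" }
# to:
#   "Backend": { "Type": "host-gw" }
```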
Seeing the same thing when trying to run kiam on 1.17; I've seen the issue from at least 1.17.0-rc.2 through 1.17.3, but wasn't sure at the time where the issue was.
Ticket I logged with the kiam folks: uswitch/kiam#378
Ticket I logged with the kops folks: kubernetes/kops#8562
As this seems networking related, I will note that we are running the Canal CNI and kube-proxy in IPVS mode.
(Correction, we were on Canal, not Calico, which means the mentioned Flannel issue is likely at the root of it…)
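If it helps with triage, a hedged way to double-check what is actually in play (assuming a kubeadm-managed kube-proxy ConfigMap and the stock Canal manifest's canal-config ConfigMap; names differ on other installers):

```sh
# kube-proxy mode ("ipvs", or empty/"iptables" for the default):
kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'

# Which backend Canal's flannel is configured with (vxlan vs host-gw):
kubectl -n kube-system get configmap canal-config -o yaml | grep -A 5 net-conf.json
```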
If I do a kubectl run and it ends up on the same node: zero issues. I will try pointing the pod's DNS at a CoreDNS pod IP directly and see if that connects.
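For anyone who wants to run the same check, a sketch of querying CoreDNS by pod IP instead of through the Service (this bypasses kube-proxy and exercises only the pod-to-pod path; k8s-app=kube-dns is the default CoreDNS label and may differ in customized setups):

```sh
# Find the CoreDNS pod IPs:
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# From the affected hostNetwork pod (or the node), query a pod IP directly:
dig @<coredns-pod-ip> kubernetes.default.svc.cluster.local

# Working pod-IP queries combined with a timing-out Service IP (10.96.0.10)
# point at the ClusterIP path (kube-proxy / overlay), not at CoreDNS itself.
```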