kubernetes: KubeDNS not working inside a pod when its containers are on the same node as the kube-dns containers.

Hi!

I'd like to report an issue. I can't resolve DNS names inside a pod when its containers are on the same node as the kube-dns containers.

I’m running a Kubernetes cluster in Vagrant (1 master node, 2 minions):

[vagrant@master0 ~]$ kubectl cluster-info
Kubernetes master is running at http://localhost:8080
KubeDNS is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns

[vagrant@master0 ~]$ kubectl get nodes
NAME                               LABELS                                                    STATUS    AGE
minion0.k8s-vagrant-kbelyaev.tld   kubernetes.io/hostname=minion0.k8s-vagrant-kbelyaev.tld   Ready     20h
minion1.k8s-vagrant-kbelyaev.tld   kubernetes.io/hostname=minion1.k8s-vagrant-kbelyaev.tld   Ready     19h

[vagrant@master0 ~]$ kubectl get rc --namespace=kube-system
CONTROLLER    CONTAINER(S)   IMAGE(S)                                             SELECTOR                      REPLICAS   AGE
kube-dns-v9   etcd           gcr.io/google_containers/etcd:2.0.9                  k8s-app=kube-dns,version=v9   1          20h
              kube2sky       gcr.io/google_containers/kube2sky:1.11
              skydns         gcr.io/google_containers/skydns:2015-10-13-8c72f8c
              healthz        gcr.io/google_containers/exechealthz:1.0

[vagrant@master0 ~]$ kubectl get pods --namespace=kube-system
NAME                READY     STATUS    RESTARTS   AGE
kube-dns-v9-cfrsn   4/4       Running   0          20h

[vagrant@master0 ~]$ kubectl get pods --namespace=kube-system -o yaml | grep -i ip
    hostIP: 172.26.1.101
    podIP: 172.18.32.2

I'm using flanneld with the host-gw backend for the pod network. The configuration is the same on both Kubernetes nodes (minion0 and minion1):

[vagrant@master0 ~]$ etcdctl get /coreos.com/network/config
{ "Network": "172.18.0.0/16", "Backend": { "Type": "host-gw"} }

[root@minion1 vagrant]# ps aux | grep flanneld
root      4584  0.0  0.1 119900  1164 ?        Ssl  Feb17   0:02 /usr/bin/flanneld -etcd-endpoints=http://172.26.1.10:4001 -etcd-prefix=/coreos.com/network -alsologtostderr=false -iface=eth1 -ip-masq=false -subnet-file=/run/flannel/subnet.env

Docker is running with the following configuration on both nodes:

[vagrant@minion0 ~]$ ps aux | grep docker
root      4941  0.3  3.2 891556 33404 ?        Ssl  Feb18   5:14 /usr/bin/docker daemon --selinux-enabled --bip=172.18.32.1/24 --ip-masq=true --mtu=1500 --insecure-registry=kubernetes-registry:5000 --insecure-registry=registry0:5000 --insecure-registry=registry0.k8s-vagrant-kbelyaev.tld:5000 --insecure-registry=172.26.1.11:5000

Kube-proxy is running in iptables proxy mode with the same configuration on both nodes:

[root@minion1 vagrant]# cat /etc/kubernetes/proxy
###
# kubernetes proxy config

# default config should be adequate

# Add your own!
KUBE_PROXY_ARGS=" --kubeconfig=/etc/kubernetes/proxy.kubeconfig --proxy-mode=iptables"

Then I start two pods:

[vagrant@master0 ~]$ kubectl run nginx0 --image=nginx
replicationcontroller "nginx0" created

[vagrant@master0 ~]$ kubectl run nginx1 --image=nginx
replicationcontroller "nginx1" created

[vagrant@master0 ~]$ kubectl get pod nginx0-ohau3 -o yaml | grep -i ip
  hostIP: 172.26.1.102
  podIP: 172.18.98.4
[vagrant@master0 ~]$ kubectl get pod nginx1-rt8w7 -o yaml | grep -i ip
  hostIP: 172.26.1.101
  podIP: 172.18.32.3

After that, I check whether kube-dns is working properly.

On the nginx0-ohau3 pod:

[vagrant@master0 ~]$ kubectl exec nginx0-ohau3 -i -t -- bash

root@nginx0-ohau3:/# cat /etc/resolv.conf
nameserver 10.254.0.10
nameserver 10.0.2.3
search default.svc.cluster.local svc.cluster.local cluster.local machinezone.com k8s-vagrant-kbelyaev.tld. k8s-vagrant-kbelyaev.tld
options ndots:5

root@nginx0-ohau3:/# nslookup ya.ru
Server:     10.254.0.10
Address:    10.254.0.10#53

Non-authoritative answer:
Name:   ya.ru
Address: 213.180.193.3
Name:   ya.ru
Address: 93.158.134.3
Name:   ya.ru
Address: 213.180.204.3

root@nginx0-ohau3:/# nslookup kubernetes
Server:     10.254.0.10
Address:    10.254.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.254.0.1

On the nginx1-rt8w7 pod:

[vagrant@master0 ~]$ kubectl exec  nginx1-rt8w7 -i -t -- bash

root@nginx1-rt8w7:/# cat /etc/resolv.conf
nameserver 10.254.0.10
nameserver 10.0.2.3
search default.svc.cluster.local svc.cluster.local cluster.local machinezone.com k8s-vagrant-kbelyaev.tld. k8s-vagrant-kbelyaev.tld
options ndots:5

root@nginx1-rt8w7:/# nslookup ya.ru
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
Server:     10.0.2.3
Address:    10.0.2.3#53

Non-authoritative answer:
Name:   ya.ru
Address: 213.180.193.3
Name:   ya.ru
Address: 213.180.204.3
Name:   ya.ru
Address: 93.158.134.3

root@nginx1-rt8w7:/# nslookup kubernetes
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
;; reply from unexpected source: 172.18.32.2#53, expected 10.254.0.10#53
Server:     10.0.2.3
Address:    10.0.2.3#53

** server can't find kubernetes: NXDOMAIN

As you can see, DNS works in one pod but not in the other.

I started investigating, and here is what I found. The only difference between these two pods is that they run on different Kubernetes nodes: the pod with the DNS issue, nginx1-rt8w7, is on the same node as the kube-dns-v9-cfrsn pod.

Then I looked further. Here is the iptables configuration on minion0 (172.26.1.101):

# Generated by iptables-save v1.4.21 on Thu Feb 18 15:03:39 2016
*nat
:PREROUTING ACCEPT [21:1711]
:INPUT ACCEPT [1:44]
:OUTPUT ACCEPT [5:300]
:POSTROUTING ACCEPT [5:300]
:DOCKER - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-SEP-IZ67SGGWIVQQKMYU - [0:0]
:KUBE-SEP-K6RKWMJPU7MWWPPT - [0:0]
:KUBE-SEP-PD5DZXVECUQRVBZ3 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-6N4SJQIF3IX3FORG - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.18.32.0/24 ! -o docker0 -j MASQUERADE
-A POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4d415351 -j MASQUERADE
-A KUBE-SEP-IZ67SGGWIVQQKMYU -s 172.18.32.2/32 -m comment --comment "kube-system/kube-dns:dns" -j MARK --set-xmark 0x4d415351/0xffffffff
-A KUBE-SEP-IZ67SGGWIVQQKMYU -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 172.18.32.2:53
-A KUBE-SEP-K6RKWMJPU7MWWPPT -s 172.18.32.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j MARK --set-xmark 0x4d415351/0xffffffff
-A KUBE-SEP-K6RKWMJPU7MWWPPT -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 172.18.32.2:53
-A KUBE-SEP-PD5DZXVECUQRVBZ3 -s 172.26.1.10/32 -m comment --comment "default/kubernetes:" -j MARK --set-xmark 0x4d415351/0xffffffff
-A KUBE-SEP-PD5DZXVECUQRVBZ3 -p tcp -m comment --comment "default/kubernetes:" -m tcp -j DNAT --to-destination 172.26.1.10:443
-A KUBE-SERVICES -d 10.254.0.1/32 -p tcp -m comment --comment "default/kubernetes: cluster IP" -m tcp --dport 443 -j KUBE-SVC-6N4SJQIF3IX3FORG
-A KUBE-SERVICES -d 10.254.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.254.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-6N4SJQIF3IX3FORG -m comment --comment "default/kubernetes:" -j KUBE-SEP-PD5DZXVECUQRVBZ3
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-K6RKWMJPU7MWWPPT
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-IZ67SGGWIVQQKMYU
COMMIT
# Completed on Thu Feb 18 15:03:39 2016
# Generated by iptables-save v1.4.21 on Thu Feb 18 15:03:39 2016
*filter
:INPUT ACCEPT [371116:338884989]
:FORWARD ACCEPT [169:13474]
:OUTPUT ACCEPT [425763:52679907]
:DOCKER - [0:0]
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
COMMIT
# Completed on Thu Feb 18 15:03:39 2016

Here is what happens when I make a DNS request inside nginx1-rt8w7:
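
(The TRACE output below comes from raw-table TRACE rules; the exact rules used aren't shown in the report, but they would look roughly like this sketch, which traces DNS packets crossing docker0:)

# sketch: these rules are assumed, not taken from the original report
iptables -t raw -A PREROUTING -i docker0 -p udp --dport 53 -j TRACE
iptables -t raw -A OUTPUT -p udp --dport 53 -j TRACE
# matching trace lines then appear in the kernel log (/var/log/messages on this host)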

#IPTABLES TRACE
[root@minion0 vagrant]# tail -f /var/log/messages | grep TRACE
Feb 19 11:26:59 localhost kernel: TRACE: raw:PREROUTING:policy:3 IN=docker0 OUT= MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=10.254.0.10 LEN=82 TOS=0x00 PREC=0x00 TTL=64 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: nat:PREROUTING:rule:1 IN=docker0 OUT= MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=10.254.0.10 LEN=82 TOS=0x00 PREC=0x00 TTL=64 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: nat:KUBE-SERVICES:rule:2 IN=docker0 OUT= MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=10.254.0.10 LEN=82 TOS=0x00 PREC=0x00 TTL=64 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: nat:KUBE-SVC-TCOU7JCQXEZGVUNU:rule:1 IN=docker0 OUT= MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=10.254.0.10 LEN=82 TOS=0x00 PREC=0x00 TTL=64 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: nat:KUBE-SEP-IZ67SGGWIVQQKMYU:rule:2 IN=docker0 OUT= MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=10.254.0.10 LEN=82 TOS=0x00 PREC=0x00 TTL=64 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: filter:FORWARD:rule:1 IN=docker0 OUT=docker0 MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=172.18.32.2 LEN=82 TOS=0x00 PREC=0x00 TTL=63 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: filter:DOCKER:return:1 IN=docker0 OUT=docker0 MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=172.18.32.2 LEN=82 TOS=0x00 PREC=0x00 TTL=63 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: filter:FORWARD:rule:4 IN=docker0 OUT=docker0 MAC=02:42:73:f4:a9:56:02:42:ac:12:20:03:08:00 SRC=172.18.32.3 DST=172.18.32.2 LEN=82 TOS=0x00 PREC=0x00 TTL=63 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62
Feb 19 11:26:59 localhost kernel: TRACE: nat:POSTROUTING:policy:3 IN= OUT=docker0 SRC=172.18.32.3 DST=172.18.32.2 LEN=82 TOS=0x00 PREC=0x00 TTL=63 ID=59396 DF PROTO=UDP SPT=44021 DPT=53 LEN=62

#TCPDUMP
[root@minion0 vagrant]# tcpdump -i docker0 -vvv -s 0 -l -n port 53 | grep kubernetes
tcpdump: listening on docker0, link-type EN10MB (Ethernet), capture size 65535 bytes
    172.18.32.3.44021 > 10.254.0.10.domain: [bad udp cksum 0xd76c -> 0xf581!] 50370+ A? kubernetes.default.svc.cluster.local. (54)
    172.18.32.3.44021 > 172.18.32.2.domain: [bad udp cksum 0x9879 -> 0x3475!] 50370+ A? kubernetes.default.svc.cluster.local. (54)
    172.18.32.2.domain > 172.18.32.3.44021: [bad udp cksum 0x9889 -> 0xe4a3!] 50370* q: A? kubernetes.default.svc.cluster.local. 1/0/0 kubernetes.default.svc.cluster.local. [30s] A 10.254.0.1 (70)

As you can see, the response from the DNS server is not un-DNAT'ed, so its source is never rewritten back to 10.254.0.10. The reply goes straight back over the docker0 bridge without passing through the host's iptables and keeps the kube-dns pod's IP (172.18.32.2) as its source. To fix this I added the following iptables rule:

iptables -t nat -I POSTROUTING -s 172.18.32.0/24 -d 172.18.32.0/24 -j MASQUERADE

With it, all packets going between containers on the same Kubernetes node are masqueraded.
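
(A broader variant of the same workaround, discussed in the comments below, is kube-proxy's --masquerade-all flag. A sketch using the KUBE_PROXY_ARGS file shown earlier; whether this is appropriate is exactly the open question:)

# /etc/kubernetes/proxy (sketch)
KUBE_PROXY_ARGS=" --kubeconfig=/etc/kubernetes/proxy.kubeconfig --proxy-mode=iptables --masquerade-all=true"

With that flag, kube-proxy marks all Service traffic for SNAT, so replies have to come back through the host's conntrack and get rewritten on the way back.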

But I'm not sure this MASQUERADE fix is a good idea. What would you suggest? Is this a bug in Kubernetes, or have I misconfigured something?

I'm sure this problem concerns every case where a pod accesses a Kubernetes Service whose backing pod runs on the same node as the requesting pod.
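
A quick way to confirm this from the broken pod is to compare a lookup sent straight to the kube-dns pod IP with one sent via the Service IP (a sketch reusing the IPs above):

root@nginx1-rt8w7:/# nslookup kubernetes.default.svc.cluster.local 172.18.32.2   # straight to the kube-dns pod IP: should answer normally
root@nginx1-rt8w7:/# nslookup kubernetes.default.svc.cluster.local 10.254.0.10   # via the Service IP: fails with "reply from unexpected source"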

Thanks, Konstantin

Most upvoted comments

For CentOS, I fixed this issue using: echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables

The problem is that I had done this before, it was working, and for some reason it reverted to 0 after a while. I will have to keep monitoring.
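
(If the value keeps reverting, persisting it in sysctl configuration should at least survive reboots; a sketch, assuming a systemd-based distro. Note the sysctl only exists while the br_netfilter module is loaded, see the next comment:)

echo 'net.bridge.bridge-nf-call-iptables = 1' > /etc/sysctl.d/99-kubernetes.conf
sysctl --system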

I'm running Kubernetes 1.8.0 on Ubuntu 16. To get rid of the "reply from unexpected source" error you have to run: modprobe br_netfilter (see http://ebtables.netfilter.org/documentation/bridge-nf.html)
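
(To make the module load itself persistent across reboots, something like the following should work on systemd-based systems; a sketch, not from the original comment:)

echo br_netfilter > /etc/modules-load.d/br_netfilter.conf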

There are some sysctls you need to make sure are set. Currently they are set to the right defaults by the "kubenet" wrapper, but they may need to be set manually for flannel…

"net/bridge/bridge-nf-call-iptables" is one. The other is the more general "hairpin mode". Or you may be missing a MASQUERADE rule needed by flannel?
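
(For reference, a sketch of how those three things could be checked on a node; the bridge name assumes the docker0 setup from the original report:)

sysctl net.bridge.bridge-nf-call-iptables              # should be 1; only present while br_netfilter is loaded
cat /sys/class/net/docker0/brif/*/hairpin_mode         # hairpin mode per bridge port
iptables -t nat -S POSTROUTING | grep -i masquerade    # flannel/docker MASQUERADE rules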

On Fri, Feb 3, 2017 at 6:41 PM, laverite notifications@github.com wrote:

Hi @thockin (https://github.com/thockin) - I tested connectivity between various pods, both inter- and intra-host, and that all seems fine when using direct pod IPs. I think what I am seeing is actually a more general problem with routing via Service IPs: when a request goes to a Service IP that fronts a pod on the same host as the originating request, it fails. If I make the same request to the underlying pod IP directly (bypassing the Service IP), it works (this seems to happen with other Service IPs as well, such as kube-dashboard). TRACE logs seem to indicate that the DNAT iptables rule matches, but the packet doesn't actually seem to be forwarded to the pod after that. I am using flannel on bare metal (kernel 4.9.3) with k8s v1.5.2.


I'm also having this issue on bare metal with v1.4.7. I see users enabling --masquerade-all, but I also see @thockin saying that should never be needed, so I'm not clear whether that's an appropriate resolution. Is it?

[Update] Setting --masquerade-all did not work for me on v1.4.7. Is anyone else still having this problem?

I'm having this exact issue, but my Kubernetes version is 1.2.3.

How do I set options for kube-proxy? I tried doing the following: kubectl exec {kube-proxy-pod} --namespace=kube-system -- kube-proxy --masquerade-all=true

But that appeared to give a bunch of errors:

W0504 19:10:14.323373    8567 server.go:170] Neither --kubeconfig nor --master was specified.  Using default API client.  This might not work.
E0504 19:10:14.325017    8567 server.go:340] Can't get Node "gke-cluster-2-default-pool-21a94b68-lzr4", assuming iptables proxy: Get http://localhost:8080/api/v1/nodes/gke-cluster-2-default-pool-21a94b68-lzr4: dial tcp 127.0.0.1:8080: connection refused
I0504 19:10:14.326030    8567 server.go:200] Using iptables Proxier.
I0504 19:10:14.326091    8567 proxier.go:208] missing br-netfilter module or unset br-nf-call-iptables; proxy may not work as intended
I0504 19:10:14.326115    8567 server.go:213] Tearing down userspace rules.
I0504 19:10:14.336758    8567 conntrack.go:36] Setting nf_conntrack_max to 262144
I0504 19:10:14.336807    8567 conntrack.go:41] Setting conntrack hashsize to 65536
I0504 19:10:14.339752    8567 conntrack.go:46] Setting nf_conntrack_tcp_timeout_established to 86400
E0504 19:10:14.339860    8567 server.go:293] Starting health server failed: listen tcp 127.0.0.1:10249: bind: address already in use
E0504 19:10:14.341020    8567 event.go:202] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8080: connection refused' (may retry after sleeping)
E0504 19:10:19.340153    8567 server.go:293] Starting health server failed: listen tcp 127.0.0.1:10249: bind: address already in use
E0504 19:10:21.990312    8567 event.go:202] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8080: connection refused' (may retry after sleeping)
E0504 19:10:24.340418    8567 server.go:293] Starting health server failed: listen tcp 127.0.0.1:10249: bind: address already in use
E0504 19:10:29.340703    8567 server.go:293] Starting health server failed: listen tcp 127.0.0.1:10249: bind: address already in use
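
(kube-proxy flags cannot be changed by exec'ing into the running pod like this; they have to be set wherever kube-proxy is launched on each node — e.g. a systemd environment file such as the /etc/kubernetes/proxy shown earlier, or the kube-proxy static pod manifest / DaemonSet spec, depending on how the cluster was installed. A sketch for the systemd case, reusing the file from the original report:)

# on each node: add --masquerade-all=true to KUBE_PROXY_ARGS in /etc/kubernetes/proxy, then
systemctl restart kube-proxy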

@nelsonfassis Does it stick if you put it into /etc/sysctl.conf?