coredns: plugin/rewrite: queries fail in Rails

When I rewrite certain entries inside the Kubernetes cluster, DNS resolution in the Rails app starts failing as follows:

irb(main):012:0> Resolv::DNS.new().getaddress("staging.mydomain.com")
Resolv::ResolvError: DNS result has no information for staging.mydomain.com
	from /usr/local/lib/ruby/2.3.0/resolv.rb:386:in `getaddress'
	from (irb):12
	from /usr/local/bundle/gems/railties-4.2.10/lib/rails/commands/console.rb:110:in `start'
	from /usr/local/bundle/gems/railties-4.2.10/lib/rails/commands/console.rb:9:in `start'
	from /usr/local/bundle/gems/railties-4.2.10/lib/rails/commands/commands_tasks.rb:68:in `console'
	from /usr/local/bundle/gems/railties-4.2.10/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
	from /usr/local/bundle/gems/railties-4.2.10/lib/rails/commands.rb:17:in `<top (required)>'
	from bin/rails:9:in `require'
	from bin/rails:9:in `<main>'
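
For context on why Resolv reports "no information" even when dig later shows answers: Ruby's Resolv only extracts resource records whose owner name matches the name it queried (following CNAME chains). A minimal sketch of that matching, built with Resolv::DNS::Message and the names from this issue (the load-balancer name stands in for an un-reverted answer):

```ruby
require 'resolv'

# Sketch: if the answer's owner name still carries the upstream
# (rewritten) name, Resolv finds no record for the name it asked for.
msg = Resolv::DNS::Message.new(0)
msg.add_question('staging.mydomain.com', Resolv::DNS::Resource::IN::A)
# Answer owner name was NOT rewritten back to the query name:
msg.add_answer('aws-loadbalancer-id.us-east-1.elb.amazonaws.com', 6,
               Resolv::DNS::Resource::IN::A.new('52.202.38.144'))

wanted  = Resolv::DNS::Name.create('staging.mydomain.com')
matched = []
msg.each_answer { |name, _ttl, data| matched << data.address.to_s if name == wanted }
p matched  # => [] -- no usable records, hence Resolv::ResolvError
```

This is why the `answer name` rewrite discussed below matters: without it the response is valid DNS, but Resolv discards every record in it.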

Corefile:

.:53 {
        errors
        health
        log
        autopath @kubernetes
        rewrite {
            name regex staging.mydomain.com aws-loadbalancer-id.us-east-1.elb.amazonaws.com
            answer name aws-loadbalancer-id.us-east-1.elb.amazonaws.com staging.mydomain.com
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
        loadbalance
}
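
One aside on the rewrite stanza: in `regex` mode the unescaped dots are wildcards that match any character, so the pattern is looser than the literal name. Escaping them keeps the rules exact; a sketch of the same stanza, not tested against this cluster:

```
rewrite {
    name regex staging\.mydomain\.com aws-loadbalancer-id.us-east-1.elb.amazonaws.com
    answer name aws-loadbalancer-id\.us-east-1\.elb\.amazonaws\.com staging.mydomain.com
}
```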

Versions:

/app # rails -v
Rails 4.2.10
/app # ruby -v
ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux-musl]

The container is based on Alpine Linux v3.4. At the same time, dig produces completely expected output:

; <<>> DiG 9.11.3 <<>> staging.mydomain.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2946
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;staging.mydomain.com.		IN	A

;; ANSWER SECTION:
staging.mydomain.com.	6	IN	A	52.202.38.144
staging.mydomain.com.	6	IN	A	54.236.98.217
staging.mydomain.com.	6	IN	A	34.199.165.122

;; Query time: 1 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Tue Aug 21 17:08:12 UTC 2018
;; MSG SIZE  rcvd: 161

@greenpau @johnbelamaric

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 52 (9 by maintainers)

Most upvoted comments

That’s an interesting question, and you have pointed me in the right direction. I didn’t use any overlay network; I simply relied on kops to bootstrap the cluster and assumed it would set up networking and routes appropriately. That was not the case. What made me think this was a CoreDNS problem is that kube-dns still worked with my 1-node setup, probably because it was running in pods on the same node, while the CoreDNS pods run on the master. Once I added an overlay network, everything magically started to work. @greenpau @johnbelamaric thanks for your time, and sorry for disturbing you.

iptables-save is even better, it gets all the tables.

On Fri, Sep 21, 2018 at 11:45 AM Paul G. notifications@github.com wrote:

I can see them on the host, but not in CoreDNS container.

@b0ric https://github.com/b0ric , the next step is capturing the following commands on the host:

  • iptables -L -n
  • iptables -L -n -t nat
  • iptables -L -n -t mangle

I would say it is not a CoreDNS issue …

— You are receiving this because you were mentioned. Reply to this email directly, or view it on GitHub: https://github.com/coredns/coredns/issues/2041#issuecomment-423635358

Hi guys, I guess I’m facing the same or a very similar issue to @osnagovskyi’s.

I noticed that DNS query packets get lost (though sometimes they are randomly proxied to AWS DNS), and here’s what I’ve found out.

$ ruby --version
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]

inside the container

$ cat /etc/resolv.conf 
nameserver 100.64.0.10
options ndots:1
$ dig udemy.com @100.64.0.10 

; <<>> DiG 9.10.3-P4-Debian <<>> udemy.com @100.64.0.10
;; global options: +cmd
;; connection timed out; no servers could be reached

To look at the link-level communication I ran tcpdump on the node (this output corresponds to the dig command above):

$ tcpdump -vvv -X -n "dst port 53 or src host 100.96.0.4 or src port 53"

18:23:10.236729 IP (tos 0x0, ttl 63, id 20263, offset 0, flags [none], proto UDP (17), length 66)
    100.96.1.24.47255 > 100.96.0.4.53: [bad udp cksum 0xca1b -> 0xe395!] 15332+ [1au] A? udemy.com. ar: . OPT UDPsize=4096 (38)
	0x0000:  4500 0042 4f27 0000 3f11 62a8 6460 0118  E..BO'..?.b.d`..
	0x0010:  6460 0004 b897 0035 002e ca1b 3be4 0120  d`.....5....;...
	0x0020:  0001 0000 0000 0001 0575 6465 6d79 0363  .........udemy.c
	0x0030:  6f6d 0000 0100 0100 0029 1000 0000 0000  om.......)......
	0x0040:  0000                                     ..
18:23:15.236828 IP (tos 0x0, ttl 63, id 21239, offset 0, flags [none], proto UDP (17), length 66)
    100.96.1.24.47255 > 100.96.0.4.53: [bad udp cksum 0xca1b -> 0xe395!] 15332+ [1au] A? udemy.com. ar: . OPT UDPsize=4096 (38)
	0x0000:  4500 0042 52f7 0000 3f11 5ed8 6460 0118  E..BR...?.^.d`..
	0x0010:  6460 0004 b897 0035 002e ca1b 3be4 0120  d`.....5....;...
	0x0020:  0001 0000 0000 0001 0575 6465 6d79 0363  .........udemy.c
	0x0030:  6f6d 0000 0100 0100 0029 1000 0000 0000  om.......)......
	0x0040:  0000                                     ..
18:23:20.237046 IP (tos 0x0, ttl 63, id 22381, offset 0, flags [none], proto UDP (17), length 66)
    100.96.1.24.47255 > 100.96.0.4.53: [bad udp cksum 0xca1b -> 0xe395!] 15332+ [1au] A? udemy.com. ar: . OPT UDPsize=4096 (38)
	0x0000:  4500 0042 576d 0000 3f11 5a62 6460 0118  E..BWm..?.Zbd`..
	0x0010:  6460 0004 b897 0035 002e ca1b 3be4 0120  d`.....5....;...
	0x0020:  0001 0000 0000 0001 0575 6465 6d79 0363  .........udemy.c
	0x0030:  6f6d 0000 0100 0100 0029 1000 0000 0000  om.......)......
	0x0040:  0000                                     ..
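
As a sanity check, the UDP payload of the first query above can be decoded offline with Ruby's Resolv to confirm the request itself is well-formed (hex copied from the dump with the IP/UDP headers stripped; the trailing record is the EDNS OPT pseudo-RR):

```ruby
require 'resolv'

# UDP payload of the first captured query (38 bytes).
hex = '3be4 0120 0001 0000 0000 0001 0575 6465 6d79 0363' \
      ' 6f6d 0000 0100 0100 0029 1000 0000 0000 0000'
payload = [hex.delete(' ')].pack('H*')

msg = Resolv::DNS::Message.decode(payload)
p msg.id  # => 15332, matching "15332+" in the tcpdump line
msg.each_question { |name, typeclass| p [name.to_s, typeclass] }
```

So the query leaving the pod is a normal recursive A lookup for udemy.com; the problem is purely that no reply ever comes back from 100.96.0.4.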

So, as you can see, I didn’t get a response from CoreDNS at all. Needless to say, there were no entries in the CoreDNS log in its container. On the other hand, when I send requests directly to AWS DNS (my cluster is on AWS), here’s the response I get:

$ dig udemy.com @10.233.0.2  

; <<>> DiG 9.10.3-P4-Debian <<>> udemy.com @10.233.0.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55875
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;udemy.com.			IN	A

;; ANSWER SECTION:
udemy.com.		60	IN	A	104.18.132.108
udemy.com.		60	IN	A	104.18.133.108
udemy.com.		60	IN	A	104.18.134.108
udemy.com.		60	IN	A	104.18.130.108
udemy.com.		60	IN	A	104.18.131.108

;; Query time: 15 msec
;; SERVER: 10.233.0.2#53(10.233.0.2)
;; WHEN: Wed Sep 19 18:53:02 UTC 2018
;; MSG SIZE  rcvd: 118
$ tcpdump -vvv -X -n "dst port 53 or src host 10.233.0.2 or src port 53"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
18:53:02.202831 IP (tos 0x0, ttl 63, id 43746, offset 0, flags [none], proto UDP (17), length 66)
    10.233.104.150.42635 > 10.233.0.2.53: [bad udp cksum 0x7ea9 -> 0xa2b4!] 55875+ [1au] A? udemy.com. ar: . OPT UDPsize=4096 (38)
	0x0000:  4500 0042 aae2 0000 3f11 525f 0ae9 6896  E..B....?.R_..h.
	0x0010:  0ae9 0002 a68b 0035 002e 7ea9 da43 0120  .......5..~..C..
	0x0020:  0001 0000 0000 0001 0575 6465 6d79 0363  .........udemy.c
	0x0030:  6f6d 0000 0100 0100 0029 1000 0000 0000  om.......)......
	0x0040:  0000                                     ..
18:53:02.217990 IP (tos 0x0, ttl 255, id 42425, offset 0, flags [none], proto UDP (17), length 146)
    10.233.0.2.53 > 10.233.104.150.42635: [udp sum ok] 55875 q: A? udemy.com. 5/0/1 udemy.com. [1m] A 104.18.132.108, udemy.com. [1m] A 104.18.133.108, udemy.com. [1m] A 104.18.134.108, udemy.com. [1m] A 104.18.130.108, udemy.com. [1m] A 104.18.131.108 ar: . OPT UDPsize=4096 (118)
	0x0000:  4500 0092 a5b9 0000 ff11 9737 0ae9 0002  E..........7....
	0x0010:  0ae9 6896 0035 a68b 007e 1d4f da43 8180  ..h..5...~.O.C..
	0x0020:  0001 0005 0000 0001 0575 6465 6d79 0363  .........udemy.c
	0x0030:  6f6d 0000 0100 01c0 0c00 0100 0100 0000  om..............
	0x0040:  3c00 0468 1284 6cc0 0c00 0100 0100 0000  <..h..l.........
	0x0050:  3c00 0468 1285 6cc0 0c00 0100 0100 0000  <..h..l.........
	0x0060:  3c00 0468 1286 6cc0 0c00 0100 0100 0000  <..h..l.........
	0x0070:  3c00 0468 1282 6cc0 0c00 0100 0100 0000  <..h..l.........
	0x0080:  3c00 0468 1283 6c00 0029 1000 0000 0000  <..h..l..)......
	0x0090:  0000                                     ..

Also, here’s what my CoreDNS configmap looks like:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n    errors\n    log\n    health\n    kubernetes cluster.local. in-addr.arpa ip6.arpa {\n      pods insecure\n      upstream\n      fallthrough in-addr.arpa ip6.arpa\n    }\n    prometheus :9153\n $
  creationTimestamp: 2018-09-18T17:15:41Z
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
  name: coredns
  namespace: kube-system
  resourceVersion: "110118"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 788ac487-bb66-11e8-9a12-02dae17e26ba

[ Quoting notifications@github.com in “Re: [coredns/coredns] plugin/rewrit…” ]

@miekg, I am still working through this. During a rewrite, the name is set via rr.Header().Name (or something like that … typing on my phone 😄 ). I think we would need to add a size recalculation in the rewrite’s reverter functions.

This shouldn’t matter, and if it does, it’s a bug in miekg/dns (I think; I should set aside some time to have a proper look).

@greenpau, he changed the config.

Correct, I removed the answer name rewriting …

i.e. he removed the answer name rewrite when doing the tcpdump. The initial tests were done with the answer name rewrite, as described in the original issue description.

If you revisit that again, you can see that the answer is being rewritten, per the dig output. If it were not working, the names in the response would have been aws-loadbalancer-id.us-east-1.elb.amazonaws.com. Yet Ruby borks on the response.