coredns: TC flag not set for ExternalName kind service when the payload is large
As explained in this article: https://easoncao.com/coredns-resolution-truncation-issue-on-kubernetes-kube-dns/
Ideal behavior: the response should contain not just the CNAME but also all the IPs that belong to the CNAME target. If the server (CoreDNS) cannot fit all the IPs within the 512-byte UDP payload limit, it should send a partial payload with the TC (Truncated) flag set. Once the client receives this truncated partial response, it retries the same DNS query over TCP.
Since CoreDNS fails to set the TC flag, and since the AA (Authoritative Answer) flag is set, clients neither upgrade the request to TCP nor fall back on a recursive strategy of sending a follow-up A query for the CNAME target. Setting AA is arguably acceptable, given that CoreDNS has a dual responsibility: it is the authoritative server for local Kubernetes service names and a recursive resolver for external internet domains.
The only concern here is that CoreDNS responds with just the CNAME without even setting the TC flag.
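For reference, a compliant server answering over UDP marks the shortened reply with the TC bit, which dig renders like this (an illustrative header; the id and counts are placeholders):

```
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
;; flags: qr aa tc rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0
```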
Tested potential workarounds:
- Force the clients (pods) to use TCP only when talking to the CoreDNS pods. This can be configured with the use-vc resolver option, as below:
```yaml
template:
  metadata:
    labels:
      app: customer-dns
  spec:
    dnsConfig:
      options:
        - name: use-vc
    containers:
      - name: customer-dns
        image: nginx:latest
        ports:
          - containerPort: 8080
        resources:
          limits:
            cpu: 300m
            memory: 500Mi
          requests:
            cpu: 200m
            memory: 500Mi
```
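Note that use-vc is honored by the glibc stub resolver; musl-based images (e.g. Alpine) ignore this resolv.conf option, so this workaround assumes glibc-based containers.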
- As a win-win, the customer could use NodeLocal DNSCache, since the communication from the NodeLocal DNSCache to the CoreDNS pods is TCP by default, while the communication from the pod to the NodeLocal DNSCache pod can be either TCP or UDP (see the Corefile excerpt below).
As per my testing, implementing NodeLocal DNSCache solved the problem; the issue is now masked, since the NodeLocal DNSCache implementation does not trigger the unexpected behavior.
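The "TCP by default" behavior comes from the stock NodeLocal DNSCache Corefile, which forwards cluster-domain queries to the in-cluster DNS service with force_tcp. An excerpt along the lines of the upstream template (__PILLAR__CLUSTER__DNS__ is substituted with the kube-dns service IP at deploy time):

```
cluster.local:53 {
    cache {
        success 9984 30
        denial 9984 5
    }
    # Upstream lookups toward the in-cluster DNS service go over TCP,
    # so the answer NodeLocal DNSCache receives is never truncated.
    forward . __PILLAR__CLUSTER__DNS__ {
        force_tcp
    }
}
```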
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 28 (21 by maintainers)
@TheRealGoku, another simpler workaround would be to add force_tcp to the forward plugin (it should work for the same reason node-local-dns works: the backend resolution is done over TCP).
FYI, I believe I’ve discovered the root cause of the inconsistent TC bit I mentioned above. In a nutshell, when we do an upstream lookup and that result was truncated, the answer to the client needs to have the TC bit set. It’s an easy fix for template, less easy for backend users (kubernetes and etcd), and kind of a horrible nightmare for file and friends. The inconsistency stems from whether or not the total response of CNAME + truncated A records exceeds the max length. In some cases it will, in which case the response gets truncated and marked truncated. In other cases it will not, in which case the response is not truncated and not marked truncated (even though the set of A records is in fact truncated).
I’m working on a fix and should open a PR early next week.
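For reference, a minimal sketch of that force_tcp workaround in the cluster CoreDNS Corefile (illustrative, not the issue author's actual config; the upstream given to forward is whatever your Corefile already uses):

```
.:53 {
    errors
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # force_tcp makes CoreDNS perform upstream lookups over TCP,
    # so the upstream answer is never truncated at 512 bytes.
    forward . /etc/resolv.conf {
        force_tcp
    }
    cache 30
}
```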
I can confirm it is the same behavior for ExternalName in CoreDNS-1.8.3 on EKS 1.20; the issue still persists. I can also conclude that this issue does not occur for direct queries to a domain with a large reply payload, only for ExternalName type services.
Environment:
- Kubernetes version: EKS 1.20
- CoreDNS version: CoreDNS-1.8.3
Steps:
To make sure we have control over both small and large DNS payloads, I created a private hosted zone with entries like the ones below and attached it to the same VPC as the EKS cluster.
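The record sets themselves are not reproduced in the thread; a plausible sketch (the names reappear in the pcap descriptions below, but the values and record counts are assumptions):

```
; private hosted zone playwithtc.com, attached to the EKS cluster's VPC
lesspayload.playwithtc.com.  300  IN  A  10.0.0.1     ; a single A record -> reply fits in 512 bytes
morepayload.playwithtc.com.  300  IN  A  10.0.1.1     ; first of many A records...
morepayload.playwithtc.com.  300  IN  A  10.0.1.2     ; ...enough that the reply exceeds 512 bytes over UDP
```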
Now create ExternalName type services referring to the names above; in a real-life scenario these could be elastically scaled Redis cluster domains.

```
kubectl apply -f lesspayload.yaml
kubectl apply -f morepayload.yaml
kubectl apply -f nginx-deployment.yaml
```
File - lesspayload.yaml
File - morepayload.yaml
File - nginx-deployment.yaml
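None of the three manifests are reproduced in the thread. The two service files were presumably minimal ExternalName services pointing at the hosted-zone records above (a sketch under that assumption, not the author's exact files), and nginx-deployment.yaml was presumably the client deployment whose pod template appears in the use-vc workaround earlier:

```yaml
# lesspayload.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: lesspayload
spec:
  type: ExternalName
  externalName: lesspayload.playwithtc.com
---
# morepayload.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: morepayload
spec:
  type: ExternalName
  externalName: morepayload.playwithtc.com
```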
For lesspayload.default.svc.cluster.local.
For curl -v morepayload.default.svc.cluster.local.
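The two probes from inside the client pod (the captured outputs are not reproduced here; the comments summarize the behavior observed in the pcaps below):

```
# Small payload: CNAME + A records fit in the UDP reply, curl resolves and connects.
curl -v lesspayload.default.svc.cluster.local.

# Large payload: the UDP reply carries only the CNAME and the TC flag is not set,
# so the client never retries over TCP and name resolution fails.
curl -v morepayload.default.svc.cluster.local.
```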
Attachment: https://github.com/TheRealGoku/coredns-tc-behavior
coredns-behavior-non_working.pcap - this capture was taken in parallel on the client pod side while the curl requests to “lesspayload.default.svc.cluster.local.” and “morepayload.default.svc.cluster.local.” were made.
coredns-behavior-working.pcap - this capture was taken in parallel on the client pod side while the curl requests to “lesspayload.playwithtc.com” and “morepayload.playwithtc.com” were made.
From the above two pcaps it is evident that only in the ExternalName scenario with a large payload does CoreDNS fail to set the TC flag, sending just the CNAME in the response without any IP addresses.
In the other scenario (coredns-behavior-working.pcap), when querying the external domain directly despite its heavy payload, CoreDNS sets the TC flag as expected, and the client therefore retries the query over TCP.