coredns: Data is not synced when CoreDNS reconnects to the Kubernetes API server after a protracted disconnection
What happened:
Our project needs to deploy CoreDNS at a distance from the Kubernetes API server, over a network that is not reliable. We therefore tested whether CoreDNS picks up Service (svc) events that occur while the network is down, once the connection is restored. The result: when a Service was deleted during a long disconnection from the API server, CoreDNS still served the DNS record for that Service after reconnecting.

What you expected to happen:
If a Service is deleted while CoreDNS is disconnected from the API server, CoreDNS should no longer return a record for that Service after reconnecting.

How to reproduce it (as minimally and precisely as possible):
- Virtual machine a: runs the Kubernetes API server
- Virtual machine b: runs CoreDNS as a plain binary under systemd (a sketch unit file is shown below)
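For context, CoreDNS on machine b is started from a systemd unit roughly like the following; the binary path, Corefile path, and restart policy here are illustrative assumptions, not the exact unit from our environment:

```ini
# /etc/systemd/system/coredns.service (sketch; paths are placeholders)
[Unit]
Description=CoreDNS DNS server
After=network.target

[Service]
# Assumed install locations for this report
ExecStart=/usr/local/bin/coredns -conf /etc/coredns/Corefile
Restart=on-failure

[Install]
WantedBy=multi-user.target
```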
- Short network failure
  - In a, use kubectl to create a svc pointing at an existing pod
  - In b, use dig to resolve the svc:
    dig svc.namespace.service.cluster.local @debianIP -p 1053
    It successfully returns the svc VIP.
  - In b, shut down b's network
  - In a, use kubectl to delete svc
  - In a, use kubectl to create svc2
  - In b, restore b's network after 1 min
  - In b, use dig to resolve svc; it correctly returns no record
  - In b, use dig to resolve svc2; it successfully returns the svc2 VIP
- Long network failure
  - In a, use kubectl to create a svc
  - In b, use dig to resolve svc; it successfully returns the svc VIP
  - In b, shut down b's network
  - In a, use kubectl to delete svc
  - In a, use kubectl to create svc2
  - In b, restore b's network after 30 min
  - In b, use dig to resolve svc; it still returns the svc VIP
  - In b, use dig to resolve svc2; it successfully returns the svc2 VIP

Anything else we need to know?:
Could this be due to a design flaw in the ListWatch itself?
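To make the ListWatch question concrete, here is a minimal client-go sketch (not CoreDNS's actual plugin code) that watches Services against the same endpoint as the Corefile below. The resync period, handler bodies, and insecure rest.Config are illustrative assumptions; when the watch resumes after a long outage with an expired resourceVersion, the reflector is expected to relist and deliver the missed Delete event.

```go
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// Insecure HTTP endpoint, matching the Corefile's `endpoint` setting.
	cfg := &rest.Config{Host: "http://10.10.103.98:8080"}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// ListWatch over all Services, the same mechanism the kubernetes
	// plugin relies on via client-go informers.
	lw := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(), "services",
		metav1.NamespaceAll, fields.Everything())

	_, controller := cache.NewInformer(lw, &v1.Service{}, 30*time.Second,
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				svc := obj.(*v1.Service)
				fmt.Println("ADD", svc.Namespace+"/"+svc.Name)
			},
			DeleteFunc: func(obj interface{}) {
				// A delete missed during the outage should arrive here after a
				// relist, possibly as a cache.DeletedFinalStateUnknown tombstone.
				fmt.Printf("DELETE %v\n", obj)
			},
		})

	stop := make(chan struct{})
	defer close(stop)
	controller.Run(stop)
}
```

Running this on machine b across both the 1-minute and the 30-minute interruption should show whether the DELETE for svc is ever delivered in the long case.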
Environment:
- the version of CoreDNS:
- Corefile:
.:1053 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        endpoint http://10.10.103.98:8080
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 0
    }
    forward . /etc/resolv.conf
    loop
    reload
    loadbalance
}
- logs, if applicable:
- OS (e.g: cat /etc/os-release):
- Others:
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 32 (23 by maintainers)
@GsssC, I ran through the test above, and was unable to reproduce the described results. In short, I did the following with an instance of CoreDNS running locally on my machine connecting to a K8s cluster in a virtual machine:
I did this twice, with the same result. Is your failure scenario consistently reproducible in your environment? What version of CoreDNS are you using? And what version of Kubernetes?