coredns: Data is not synced when CoreDNS reconnects to the Kubernetes API server after a protracted disconnection
What happened:
Our project needs to deploy CoreDNS at a distance from the Kubernetes API server, over a network that is not reliable. We therefore tested whether CoreDNS picks up Service (svc) events that occur while the network is down, once the connection is restored. The result: when a Service was deleted during a long disconnection from the API server, CoreDNS still served the DNS record for that Service after reconnecting.

What you expected to happen:
If a Service is deleted while CoreDNS is disconnected from the API server, CoreDNS should no longer return a record for that Service after reconnecting.

How to reproduce it (as minimally and precisely as possible):
- Virtual machine a: runs the Kubernetes API server
- Virtual machine b: runs CoreDNS as a plain binary under systemd (a sketch unit file is shown below)
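For context, CoreDNS on machine b is started from a systemd unit roughly like the following; the binary path, Corefile path, and restart policy here are illustrative assumptions, not the exact unit from our environment:

```ini
# /etc/systemd/system/coredns.service (sketch; paths are placeholders)
[Unit]
Description=CoreDNS DNS server
After=network.target

[Service]
# Assumed install locations for this report
ExecStart=/usr/local/bin/coredns -conf /etc/coredns/Corefile
Restart=on-failure

[Install]
WantedBy=multi-user.target
```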
- Short network failure
  - In a, use kubectl to create a svc pointing at an existing pod
  - In b, use dig to resolve the svc:
    dig svc.namespace.service.cluster.local @debianIP -p 1053
    It successfully returns the svc VIP.
  - In b, shut down b's network
  - In a, use kubectl to delete svc
  - In a, use kubectl to create svc2
  - In b, restore b's network after 1 min
  - In b, use dig to resolve svc; it correctly returns no record
  - In b, use dig to resolve svc2; it successfully returns the svc2 VIP
- Long network failure
  - In a, use kubectl to create a svc
  - In b, use dig to resolve svc; it successfully returns the svc VIP
  - In b, shut down b's network
  - In a, use kubectl to delete svc
  - In a, use kubectl to create svc2
  - In b, restore b's network after 30 min
  - In b, use dig to resolve svc; it still returns the svc VIP
  - In b, use dig to resolve svc2; it successfully returns the svc2 VIP

Anything else we need to know?:
Could this be due to a design flaw in the ListWatch itself?
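To make the ListWatch question concrete, here is a minimal client-go sketch (not CoreDNS's actual plugin code) that watches Services against the same endpoint as the Corefile below. The resync period, handler bodies, and insecure rest.Config are illustrative assumptions; when the watch resumes after a long outage with an expired resourceVersion, the reflector is expected to relist and deliver the missed Delete event.

```go
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// Insecure HTTP endpoint, matching the Corefile's `endpoint` setting.
	cfg := &rest.Config{Host: "http://10.10.103.98:8080"}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// ListWatch over all Services, the same mechanism the kubernetes
	// plugin relies on via client-go informers.
	lw := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(), "services",
		metav1.NamespaceAll, fields.Everything())

	_, controller := cache.NewInformer(lw, &v1.Service{}, 30*time.Second,
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				svc := obj.(*v1.Service)
				fmt.Println("ADD", svc.Namespace+"/"+svc.Name)
			},
			DeleteFunc: func(obj interface{}) {
				// A delete missed during the outage should arrive here after a
				// relist, possibly as a cache.DeletedFinalStateUnknown tombstone.
				fmt.Printf("DELETE %v\n", obj)
			},
		})

	stop := make(chan struct{})
	defer close(stop)
	controller.Run(stop)
}
```

Running this on machine b across both the 1-minute and the 30-minute interruption should show whether the DELETE for svc is ever delivered in the long case.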
Environment:
- the version of CoreDNS:
- Corefile:
.:1053 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        endpoint http://10.10.103.98:8080
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 0
    }
    forward . /etc/resolv.conf
    loop
    reload
    loadbalance
}
- logs, if applicable:
- OS (e.g: cat /etc/os-release):
- Others:
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 32 (23 by maintainers)
@GsssC, I ran through the test above, and was unable to reproduce the described results. In short, I did the following with an instance of CoreDNS running locally on my machine connecting to a K8s cluster in a virtual machine:
I did this twice, with the same result. Is your failure scenario consistently reproducible in your environment? What version of CoreDNS are you using? And what version of Kubernetes?