kubernetes: Kubernetes DNS Specification does not recognize EndpointSlices, kube-dns & coredns diverge on behavior
What happened?
Creating an endpoint slice and service in a cluster relying upon kube-dns results in no DNS resolution for either the service name or the hostnames under it. The endpoint slice is not seen by kube-dns.
This impacts Google Kubernetes Engine up to the current rapid version 1.22.3-gke.1500, including with the NodeLocal DNSCache and/or CloudDNS, all of which I suspect rely upon kube-dns. Other cloud providers relying upon kube-dns may be impacted, but I am not sure which those would be.
What did you expect to happen?
I expected the Kubernetes DNS spec to recognize the discovery.k8s.io/v1 EndpointSlice resource as an alternative source for DNS records, given that the API has reached GA and is used by kube-proxy, coredns, and a growing number of components of the Kubernetes control plane. EndpointSlice was introduced in 1.16 and went GA in 1.21, yet kube-dns is still not a consumer of endpoint slices, and it appears to be difficult (?) to update the project to do so. See below.
How can we reproduce it (as minimally and precisely as possible)?
Apply this simple endpoint slice and service with `kubectl apply -f`:
```yaml
# endpointslice-test.yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  labels:
    endpointslice.kubernetes.io/managed-by: friel
    kubernetes.io/service-name: global
  name: global-endpoint-slice-test
  namespace: default
addressType: IPv4
endpoints:
- addresses:
  - 10.0.0.1
  conditions:
    ready: true
    serving: true
  hostname: node-0
- addresses:
  - 10.0.0.2
  conditions:
    ready: true
    serving: true
  hostname: node-1
ports: []
---
apiVersion: v1
kind: Service
metadata:
  labels:
    endpointslice.kubernetes.io/skip-mirror: "true"
  name: global
  namespace: default
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  publishNotReadyAddresses: true
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
Launch a container such as `alpine:latest`, which contains `nslookup` or similar, via:
```
$ kubectl -n default run "test-$RANDOM" -it --rm --image alpine:latest
If you don't see a command prompt, try pressing enter.
/ # nslookup global.default.svc.cluster.local
Server:         10.124.0.10
Address:        10.124.0.10:53

** server can't find global.default.svc.cluster.local: NXDOMAIN
** server can't find global.default.svc.cluster.local: NXDOMAIN
/ # # likewise for node-0.global.default.svc.cluster.local
/ # # likewise for node-1.global.default.svc.cluster.local
```
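Since the failing and succeeding cases differ only in the DNS implementation, it may help to first confirm which one a given cluster is actually running. A cluster-dependent sketch (the `k8s-app=kube-dns` label and `kube-system` namespace are common defaults, not guaranteed on every distribution):

```shell
# Cluster-dependent sketch: identify the in-cluster DNS implementation.
# Both kube-dns and CoreDNS deployments commonly carry the k8s-app=kube-dns
# label, so inspect the container images rather than the pod names.
kubectl -n kube-system get pods -l k8s-app=kube-dns \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```

Images containing `coredns` indicate the CoreDNS implementation; `k8s-dns-kube-dns` (plus dnsmasq/sidecar containers) indicates kube-dns.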
Expected behavior:
In a cluster running coredns, endpoint slices are discovered and resolved. Here’s the output from a cluster running coredns:
```
/ # nslookup global.default.svc.cluster.local
Server:         10.0.0.10
Address:        10.0.0.10:53

Name:   global.default.svc.cluster.local
Address: 10.0.0.1
Name:   global.default.svc.cluster.local
Address: 10.0.0.2
/ #
/ # nslookup node-0.global.default.svc.cluster.local
Server:         10.0.0.10
Address:        10.0.0.10:53

Name:   node-0.global.default.svc.cluster.local
Address: 10.0.0.1
/ #
/ # nslookup node-1.global.default.svc.cluster.local
Server:         10.0.0.10
Address:        10.0.0.10:53

Name:   node-1.global.default.svc.cluster.local
Address: 10.0.0.2
```
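For reference, the names coredns synthesizes above follow the spec’s scheme for headless services: `<service>.<namespace>.svc.<zone>` resolves to all ready endpoint addresses, and `<hostname>.<service>.<namespace>.svc.<zone>` resolves to the address of the endpoint carrying that `hostname`. A minimal sketch constructing those names (values taken from the manifest above; the zone `cluster.local` is the common default):

```shell
# Construct the DNS names the spec assigns to a headless service
# and its named endpoints. Values match the manifest above.
service=global
namespace=default
zone=cluster.local

svc_fqdn="${service}.${namespace}.svc.${zone}"
echo "${svc_fqdn}"              # resolves to every ready endpoint address
for host in node-0 node-1; do
  echo "${host}.${svc_fqdn}"    # resolves to that single endpoint's address
done
```

This prints exactly the three names queried in the transcripts above.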
Anything else we need to know?
The kubernetes/dns repo seems to be in a state of disrepair, which gives me some concerns about supply chain security. Dependencies are pinned to versions that are 3 to 4 years old, which may or may not be a problem in itself, but if someone were to discover a vulnerability in them there seems to be no agility to upgrade. One of the largest dependencies has no active maintainer. There is an outstanding PR in limbo because there was no clear path to upgrading dependencies to a more recent version of k8s.io/client-go.
Contents in this issue:
- Are copied from: kubernetes/dns#504
- Referencing: kubernetes/dns#505
Related issues and comments found:
- Most recent comments on #13358
- DNS service discovery spec here differs from the implementation promoted by Kubernetes project (coredns): https://github.com/kubernetes/dns/blob/master/docs/specification.md
Kubernetes version
n/a - up to 1.23.x. Issue is with kube-dns.
Cloud provider
Linode Kubernetes Engine, DigitalOcean, and Azure Kubernetes Service use coredns; the test case succeeds there.
OS version
n/a
Install tools
n/a
Container runtime (CRI) and version (if applicable)
n/a
Related plugins (CNI, CSI, …) and versions (if applicable)
n/a
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 25 (14 by maintainers)
Let’s be very clear, there are two things:
1. The spec in this repository is not connected to any implementation and should be updated to unambiguously describe the handling of EndpointSlice. I am also supportive of having a new home for the spec (probably in a KEP) that can be kept up to date. The reason it lives in the kubernetes/dns repo is somewhat historical. Bugs and updates should be made to the spec if we are seeing deficiencies. I don’t think it’s good to have no spec and depend on a given implementation for defining behavior.
2. The kube-dns implementation (not the spec) probably should be deprecated, although this will take work. We have to go through the full deprecation process.
@robscott I think this is the opposite of the situation that the current EndpointSliceMirroring controller solves: new slices aren’t mirrored back to legacy Endpoints 😦
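To illustrate the direction of that mirroring: the controller only copies legacy Endpoints resources into EndpointSlices (stamping them with its own managed-by label), never the reverse, so a manually created slice like the one above (`managed-by: friel`) produces no Endpoints object for kube-dns to see. A cluster-dependent check for which slices are mirror-managed:

```shell
# Cluster-dependent sketch: EndpointSlices created by the mirroring
# controller carry this well-known managed-by label value. Slices
# created by hand (e.g. managed-by: friel above) will not appear here,
# and no corresponding legacy Endpoints object is created for them.
kubectl get endpointslices --all-namespaces \
  -l endpointslice.kubernetes.io/managed-by=endpointslice-mirroring-controller.k8s.io
```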