kubernetes: Pods created via StatefulSet are not being given A records as expected

Hey, as the title says, I’m unable to resolve pods by their hostname when they’re created as part of a StatefulSet.

Environment:

  • Google Container Engine 1.9.6-gke.1 (I have tried 1.8.8 too)
  • Kubedns gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.8

Working with pods explicitly

This is an example proving that DNS is working as I expect it to, as per this guide:

apiVersion: v1
kind: Service
metadata:
  name: default-subdomain
spec:
  selector:
    name: busybox
  clusterIP: None
  ports:
  - name: foo # Actually, no port is needed.
    port: 1234
    targetPort: 1234
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox-0
  labels:
    name: busybox
spec:
  hostname: busybox-0
  subdomain: default-subdomain
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    name: busybox
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox-1
  labels:
    name: busybox
spec:
  hostname: busybox-1
  subdomain: default-subdomain
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    name: busybox

As you can see, I can go onto busybox-0 and hit busybox-1:

❯ k exec -it busybox-0 /bin/sh
/ # echo $HOSTNAME
busybox-0

/ # ping busybox-1.default-subdomain.default.svc.cluster.local
PING busybox-1.default-subdomain.default.svc.cluster.local (10.202.1.25): 56 data bytes
64 bytes from 10.202.1.25: seq=0 ttl=62 time=1.627 ms
64 bytes from 10.202.1.25: seq=1 ttl=62 time=1.447 ms

Not working with StatefulSet

When I modify the above example to use a StatefulSet to create the pods instead, the pods are not able to resolve each other (only themselves).

apiVersion: v1
kind: Service
metadata:
  name: default-subdomain
spec:
  selector:
    name: busybox
  clusterIP: None
  ports:
  - name: foo # Actually, no port is needed.
    port: 1234
    targetPort: 1234
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: busybox
spec:
  replicas: 2
  selector:
    matchLabels:
      app: busybox
  serviceName: default-subdomain
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - image: busybox
        command:
          - sleep
          - "3600"
        name: busybox

❯ k exec -it busybox-0 /bin/sh
/ # echo $HOSTNAME
busybox-0

/ # ping busybox-1.default-subdomain.default.svc.cluster.local
ping: bad address 'busybox-1.default-subdomain.default.svc.cluster.local'

However, as you can see, the pod is able to resolve itself by its FQDN:

/ # ping busybox-0.default-subdomain.default.svc.cluster.local
PING busybox-0.default-subdomain.default.svc.cluster.local (10.202.1.26): 56 data bytes
64 bytes from 10.202.1.26: seq=0 ttl=64 time=0.062 ms
64 bytes from 10.202.1.26: seq=1 ttl=64 time=0.049 ms

I can’t see anything in the kube-dns logs that looks suspicious!

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Should we close this then?

Please don’t. I have the exact issue. Kubernetes 1.10.3

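@Stono Your StatefulSet example doesn’t work as expected because your Service selector is name: busybox, but the labels in the pod spec template are app: busybox. The headless Service therefore selects none of the StatefulSet’s pods, so no DNS records are created for them.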
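
For reference, here is a minimal sketch of that fix applied to the manifests from the original report. Only the Service selector changes, so that it matches the app: busybox label in the StatefulSet template; the StatefulSet itself stays as posted. Changing the template labels and matchLabels to name: busybox instead should work equally well. I have not run this exact manifest, so treat it as a sketch.

```yaml
# Sketch: the Service from the original report with only the selector changed.
apiVersion: v1
kind: Service
metadata:
  name: default-subdomain
spec:
  selector:
    app: busybox   # was "name: busybox"; must match the StatefulSet template labels
  clusterIP: None
  ports:
  - name: foo      # placeholder port, kept from the original report
    port: 1234
    targetPort: 1234
```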

The following example works for me.

---
apiVersion: v1
kind: Service
metadata:
  name: headless
spec:
  selector:
    app: test
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test
  serviceName: headless
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: alpine
        command:
        - sleep
        - "3600"

Pods can resolve other pods through test-<index>.headless.default.svc.cluster.local. I tested this out on GKE v1.12.5-gke.10. Note the headless service doesn’t specify any ports. The dummy port issue was fixed in https://github.com/kubernetes/kubernetes/pull/67622, and it’s in the 1.12 release. For anything earlier, you’ll need to specify a dummy port.
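
For clusters older than 1.12, the headless Service above would presumably need a placeholder port added, along the lines of the original report. The sketch below assumes an arbitrary port name and number (borrowed from the example in the issue); nothing about these particular values is required.

```yaml
# Sketch for clusters older than 1.12: same headless Service plus a placeholder port.
apiVersion: v1
kind: Service
metadata:
  name: headless
spec:
  selector:
    app: test
  clusterIP: None
  ports:
  - name: dummy   # arbitrary name, assumed for illustration
    port: 1234    # arbitrary number, borrowed from the original report
```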

We originally found this issue because we were incorrectly trying to specify the subdomain in the pod spec template. Turns out the pod’s subdomain value is set from the StatefulSet’s serviceName! See pkg/controller/statefulset/stateful_set_utils.go#L188. The DNS documentation could probably have a section for StatefulSets explaining how that works. I’ll try and submit something for that.
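
To illustrate, a pod generated from the test StatefulSet above should end up with identity fields roughly like the excerpt below. This is a sketch of the expected result, not captured output, and it leaves out everything else in the pod spec; the real values can be checked with kubectl get pod test-0 -o yaml.

```yaml
# Illustrative excerpt of the expected spec of pod test-0 (not a complete manifest).
apiVersion: v1
kind: Pod
metadata:
  name: test-0
spec:
  hostname: test-0      # set by the StatefulSet controller to the pod name
  subdomain: headless   # copied from the StatefulSet's serviceName
```

With those two fields set and the headless Service selecting the pod, the name test-0.headless.default.svc.cluster.local follows the same hostname.subdomain pattern as the hand-written pods in the original report.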

Thanks!