kubernetes: Pods in StatefulSets cannot resolve each other by expected hostnames

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): Yes

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): StatefulSets, DNS (Found some issues, but all with resolutions I have tried)


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG

Kubernetes version (use kubectl version):

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
$ uname -a
Linux motest-7 3.10.0-514.6.2.el7.x86_64 #1 SMP Thu Feb 23 03:04:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux 
  • Install tools:
  • Others:

What happened: Created a StatefulSet with 2 replicas, and a headless Service to allow access to them. The Pods have the expected hostnames, but they are unable to resolve each other's hostnames.

Here are the StatefulSet and Service definitions:

apiVersion: v1
kind: Service
metadata:
  name: oklog-store
  labels:
    app: oklog-store
spec:
  ports:
    - name: store-api
      port: 12050
      targetPort: 12050
      protocol: TCP
    - name: store-cluster
      port: 12059
      targetPort: 12059
      protocol: TCP
  clusterIP: None
  selector:
    app: oklog-store
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: oklog-store
spec:
  serviceName: "oklog-store"
  replicas: 2
  template:
    metadata:
      labels:
        app: oklog-store
    spec:
      containers:
        - name: oklog-store
          image: docker.registry.lohs.geneity/dmiddlec/oklog-dmiddlec:test
          imagePullPolicy: Always
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - "/bin/sh"
            - "-c"
            - |
              exec oklog store -debug -cluster tcp://$POD_IP:12059 \
                -api tcp://0.0.0.0:12050 \
                -peer oklog-ingest-0.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
                -peer oklog-ingest-1.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
                -peer oklog-ingest-2.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
                -peer oklog-ingest-3.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
                -peer oklog-store-0.oklog-store.$POD_NAMESPACE.svc.cluster.local:12059 \
                -peer oklog-store-1.oklog-store.$POD_NAMESPACE.svc.cluster.local:12059
          ports:
            - name: store-api
              containerPort: 12050
            - name: store-cluster
              containerPort: 12059
      terminationGracePeriodSeconds: 10
      nodeSelector:
        oklog: not-active

Note: clusterIP: None is set (this was the resolution in all the other issues I found).

Inside oklog-store-0

root@oklog-store-0:/# hostname -f
oklog-store-0.oklog-store.ci.svc.cluster.local
root@oklog-store-0:/# nslookup oklog-store.ci.svc.cluster.local
Server:         192.168.0.10
Address:        192.168.0.10#53

Name:   oklog-store.ci.svc.cluster.local
Address: 172.16.81.9
Name:   oklog-store.ci.svc.cluster.local
Address: 172.16.29.13
root@oklog-store-0:/# nslookup oklog-store-1.oklog-store.ci.svc.cluster.local
Server:         192.168.0.10
Address:        192.168.0.10#53

** server can't find oklog-store-1.oklog-store.ci.svc.cluster.local: NXDOMAIN

root@oklog-store-0:/# nslookup oklog-store-1
Server:         192.168.0.10
Address:        192.168.0.10#53

** server can't find oklog-store-1: SERVFAIL
root@oklog-store-0:/# nslookup -q=SRV oklog-store.ci.svc.cluster.local
Server:         192.168.0.10
Address:        192.168.0.10#53

oklog-store.ci.svc.cluster.local        service = 10 50 0 12c78c01.oklog-store.ci.svc.cluster.local.
oklog-store.ci.svc.cluster.local        service = 10 50 0 4749ebd6.oklog-store.ci.svc.cluster.local.
root@oklog-store-0:/# nslookup 12c78c01.oklog-store.ci.svc.cluster.local
Server:         192.168.0.10
Address:        192.168.0.10#53

Name:   12c78c01.oklog-store.ci.svc.cluster.local
Address: 172.16.29.13

root@oklog-store-0:/# nslookup 4749ebd6.oklog-store.ci.svc.cluster.local
Server:         192.168.0.10
Address:        192.168.0.10#53

Name:   4749ebd6.oklog-store.ci.svc.cluster.local
Address: 172.16.81.9

Note: nslookup of the Service domain returns both records as expected.

Note: The Pods are available, just with unexpected (hash-based) hostnames.

What you expected to happen:

nslookup oklog-store-1.oklog-store.ci.svc.cluster.local to resolve to the correct IP address.

How to reproduce it (as minimally and precisely as possible): Use the given versions, and apply the .yaml above to create the StatefulSet and headless Service.

Anything else we need to know:

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 36 (2 by maintainers)

Most upvoted comments

SOLVED: The spec.serviceName field in the statefulSet manifest was wrong. It should match the metadata.name field in the headless service definition.
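A minimal sketch of the two fields that must match (the names here are illustrative, not taken from the issue):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-headless            # <-- this name ...
spec:
  clusterIP: None              # headless: required for per-pod DNS records
  selector:
    app: my-app
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: my-app
spec:
  serviceName: "my-headless"   # <-- ... must match serviceName here
  replicas: 2
```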

For anyone interested on debugging kube-dns; https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

For me, redis-cluster-1.redis-cluster works, but not redis-cluster-1.

@thedodd @kuznero @alahijani

I had the exact same issue with clusterIP: None set. It turned out my Service selector was wrong, i.e.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    name: nginx

should have been

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx

Might want to check that.

Solved

Turns out I had the FQDN pattern incorrect. (Not only can you get the Service and the Service selector wrong; you can also get this wrong.)

I was using: hidden-secondary-mongo-hidden-secondary-0.hidden-secondary.svc.cluster.local

Which is the pattern: <pod>-<N>.<namespace>.svc.cluster.local

When I should have used:

hidden-secondary-mongo-hidden-secondary-0.hidden-secondary-mongo-hidden-secondary.hidden-secondary.svc.cluster.local

Which is the pattern:

<pod>-<N>.<statefulSetName>.<namespace>.svc.cluster.local

The table found here can be tremendously helpful.

Suffice it to say, some names will become shorter.
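The pattern above can be sketched in shell with this issue's names (the helper variables are hypothetical, not part of the original report):

```shell
# Sketch: how a StatefulSet pod's stable DNS name is composed, using the
# names from this issue (StatefulSet oklog-store, namespace ci).
# Note: spec.serviceName and the headless Service's metadata.name must
# both equal $SERVICE for these records to exist.
STATEFULSET=oklog-store
SERVICE=oklog-store
NAMESPACE=ci
ORDINAL=1
FQDN="${STATEFULSET}-${ORDINAL}.${SERVICE}.${NAMESPACE}.svc.cluster.local"
echo "$FQDN"   # prints oklog-store-1.oklog-store.ci.svc.cluster.local
```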

I'm having similar issues with a StatefulSet and headless Service (clusterIP: None) on GKE 1.10.5-gke.3. I can get the pods to resolve at pod-0.service.namespace.svc.cluster.local, but the short name pod-0 only resolves from inside pod-0 itself, so I can't use the short hostnames except within the pods themselves.

You can resolve this issue by specifying a dnsConfig with a search entry in your Pod spec: rabbit-service.your_namespace.svc.cluster.local
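As a sketch, that dnsConfig would sit in the Pod template like this (rabbit-service and your_namespace are the commenter's placeholders; dnsConfig requires a Kubernetes version that supports Pod DNS config):

```yaml
spec:
  template:
    spec:
      dnsConfig:
        searches:
          - rabbit-service.your_namespace.svc.cluster.local
```

With that search domain in place, the short name rabbit-service-0 expands to the full per-pod record.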

@thedodd @kuznero

The service has to be headless (clusterIP: None) for the pod DNS entries to work. Check https://github.com/kubernetes/kubernetes/issues/39197

Hi, the environment variable HOSTNAME is the pod name, e.g. nifi-0,

which makes it impossible for pods of the same StatefulSet to resolve each other with just the HOSTNAME, e.g. nifi-1 can't access the nifi-0 pod by its HOSTNAME nifi-0.

It would be more helpful to make the HOSTNAME nifi-0.nifi-headless and nifi-1.nifi-headless,

where nifi-headless is their common headless Service name.

What version of Kubernetes fixed this? I used 1.11.5, and it works correctly with redis-cluster-1.redis-cluster.default.svc.cluster.local, but with just redis-cluster-1 it doesn't work.


The official example in this doc just does not work.

@MPV The metadata.name of the headless service also needs to match the spec.serviceName of the StatefulSet.

edit: apparently it does, based on the template I just read.

Just FYI an upgrade from 1.10.1 to 1.10.6 did not help.