kubernetes: Pods in StatefulSets cannot resolve each other by expected hostnames
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): Yes
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): StatefulSets, DNS (Found some issues, but all with resolutions I have tried)
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG
Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
- Kernel (e.g. uname -a):
$ uname -a
Linux motest-7 3.10.0-514.6.2.el7.x86_64 #1 SMP Thu Feb 23 03:04:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
What happened: Created a StatefulSet with 2 replicas, and a headless Service to allow access to them. The Pods have the expected hostnames, but they are unable to resolve each other's hostnames.
Here are the Service and StatefulSet definitions:
apiVersion: v1
kind: Service
metadata:
  name: oklog-store
  labels:
    app: oklog-store
spec:
  ports:
  - name: store-api
    port: 12050
    targetPort: 12050
    protocol: TCP
  - name: store-cluster
    port: 12059
    targetPort: 12059
    protocol: TCP
  clusterIP: None
  selector:
    app: oklog-store
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: oklog-store
spec:
  serviceName: "oklog-store"
  replicas: 2
  template:
    metadata:
      labels:
        app: oklog-store
    spec:
      containers:
      - name: oklog-store
        image: docker.registry.lohs.geneity/dmiddlec/oklog-dmiddlec:test
        imagePullPolicy: Always
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        command:
        - "/bin/sh"
        - "-c"
        - |
          exec oklog store -debug -cluster tcp://$POD_IP:12059 \
            -api tcp://0.0.0.0:12050 \
            -peer oklog-ingest-0.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
            -peer oklog-ingest-1.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
            -peer oklog-ingest-2.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
            -peer oklog-ingest-3.oklog-ingest.$POD_NAMESPACE.svc.cluster.local:12059 \
            -peer oklog-store-0.oklog-store.$POD_NAMESPACE.svc.cluster.local:12059 \
            -peer oklog-store-1.oklog-store.$POD_NAMESPACE.svc.cluster.local:12059
        ports:
        - name: store-api
          containerPort: 12050
        - name: store-cluster
          containerPort: 12059
      terminationGracePeriodSeconds: 10
      nodeSelector:
        oklog: not-active
Note: clusterIP: None is set (setting this seems to have been the resolution in all the other cases I found).
Inside oklog-store-0
root@oklog-store-0:/# hostname -f
oklog-store-0.oklog-store.ci.svc.cluster.local
root@oklog-store-0:/# nslookup oklog-store.ci.svc.cluster.local
Server: 192.168.0.10
Address: 192.168.0.10#53
Name: oklog-store.ci.svc.cluster.local
Address: 172.16.81.9
Name: oklog-store.ci.svc.cluster.local
Address: 172.16.29.13
root@oklog-store-0:/# nslookup oklog-store-1.oklog-store.ci.svc.cluster.local
Server: 192.168.0.10
Address: 192.168.0.10#53
** server can't find oklog-store-1.oklog-store.ci.svc.cluster.local: NXDOMAIN
root@oklog-store-0:/# nslookup oklog-store-1
Server: 192.168.0.10
Address: 192.168.0.10#53
** server can't find oklog-store-1: SERVFAIL
root@oklog-store-0:/# nslookup -q=SRV oklog-store.ci.svc.cluster.local
Server: 192.168.0.10
Address: 192.168.0.10#53
oklog-store.ci.svc.cluster.local service = 10 50 0 12c78c01.oklog-store.ci.svc.cluster.local.
oklog-store.ci.svc.cluster.local service = 10 50 0 4749ebd6.oklog-store.ci.svc.cluster.local.
root@oklog-store-0:/# nslookup 12c78c01.oklog-store.ci.svc.cluster.local
Server: 192.168.0.10
Address: 192.168.0.10#53
Name: 12c78c01.oklog-store.ci.svc.cluster.local
Address: 172.16.29.13
root@oklog-store-0:/# nslookup 4749ebd6.oklog-store.ci.svc.cluster.local
Server: 192.168.0.10
Address: 192.168.0.10#53
Name: 4749ebd6.oklog-store.ci.svc.cluster.local
Address: 172.16.81.9
Note: nslookup of the Service domain returns both records, as expected.
Note: the pods are reachable, just under unexpected hostnames (see the sketch below for where those hash-style names come from).
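The hash-style names in the SRV answers above are what kube-dns publishes when the endpoint addresses behind a headless Service carry no hostname field; the endpoints controller only fills that field in when a pod's spec.subdomain (which a StatefulSet sets from its spec.serviceName) matches the Service's name. One way to check is kubectl get endpoints oklog-store -o yaml. A sketch of what a healthy Endpoints object would look like, with illustrative IPs (this is an assumption about the expected state, not output from this cluster):

apiVersion: v1
kind: Endpoints
metadata:
  name: oklog-store
subsets:
- addresses:
  - ip: 172.16.81.9
    hostname: oklog-store-0   # if hostname is absent, kube-dns synthesizes a hash-style name instead
  - ip: 172.16.29.13
    hostname: oklog-store-1
  ports:
  - name: store-api
    port: 12050
  - name: store-cluster
    port: 12059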
What you expected to happen:
nslookup oklog-store-1.oklog-store.ci.svc.cluster.local to resolve to the correct IP address.
How to reproduce it (as minimally and precisely as possible):
Use the given versions, and use the given .yaml to load a StatefulSet and a 'headless' Service.
Anything else we need to know:
SOLVED: The spec.serviceName field in the StatefulSet manifest was wrong. It has to match the metadata.name field of the headless Service definition.
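In other words, these two fields have to agree. A minimal sketch of the matching fields, using the names from the manifests above:

apiVersion: v1
kind: Service
metadata:
  name: oklog-store             # <- this name...
spec:
  clusterIP: None               # ...on a headless Service...
  selector:
    app: oklog-store            # ...whose selector matches the pod labels...
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: oklog-store
spec:
  serviceName: "oklog-store"    # ...must equal spec.serviceName exactly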
For anyone interested in debugging kube-dns: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
For me, redis-cluster-1.redis-cluster works, but redis-cluster-1 alone does not.
@thedodd @kuznero @alahijani
I had the exact same issue with clusterIP: None set. Turned out my issue was that I had my service selector wrong, i.e. it didn't match the labels it should have.
Might want to check that.
Solved
Turns out I had the FQDN pattern incorrect. (Not only can you get the Service and the Service selector wrong, but you can also do this.)
I was using:
hidden-secondary-mongo-hidden-secondary-0.hidden-secondary.svc.cluster.local
which is the pattern:
<pod>-<N>.<namespace>.svc.cluster.local
when I should have used:
hidden-secondary-mongo-hidden-secondary-0.hidden-secondary-mongo-hidden-secondary.hidden-secondary.svc.cluster.local
which is the pattern:
<pod>-<N>.<statefulSetName>.<namespace>.svc.cluster.local
(strictly, the middle label is the headless Service's name, which must match the StatefulSet's spec.serviceName). The table found here can be tremendously helpful.
Suffice it to say, some names will become shorter.
I'm having similar issues with a StatefulSet and a headless service with clusterIP: None, on GKE 1.10.5-gke.3. I can get the pods to resolve via pod-0.service.namespace.svc.cluster.local, but pod-0 alone only resolves from inside pod-0 itself. So I can't use the short hostnames except within the pod itself.
You can resolve this issue by specifying a dnsConfig with a search entry in your pod spec, e.g. rabbit-service.your_namespace.svc.cluster.local, as sketched below.
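A minimal sketch of that approach; rabbit-service and your_namespace are placeholders carried over from the comment above, and the pod name and image are hypothetical (spec.dnsConfig needs a reasonably recent Kubernetes, having been introduced as alpha around 1.9):

apiVersion: v1
kind: Pod
metadata:
  name: rabbit-client           # placeholder name
spec:
  containers:
  - name: app
    image: your-image:tag       # placeholder image
  dnsConfig:
    searches:
    - rabbit-service.your_namespace.svc.cluster.local
# With that search domain, a bare lookup of rabbit-0 is retried as
# rabbit-0.rabbit-service.your_namespace.svc.cluster.local and resolves.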
@thedodd @kuznero
The service has to be headless (clusterIP: None) for the per-pod DNS entries to work. See https://github.com/kubernetes/kubernetes/issues/39197
Hi, the environment variable HOSTNAME is the pod name, e.g. nifi-0. That makes it impossible to resolve sibling pods of the same StatefulSet with just the HOSTNAME, e.g. nifi-1 can't access the nifi-0 pod by its HOSTNAME nifi-0.
It would be more helpful to make the HOSTNAME nifi-0.nifi-headless and nifi-1.nifi-headless,
where nifi-headless is their common headless service name; see the workaround sketched below.
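Until something like that exists, one workaround is to compose the FQDN yourself from the downward API. A sketch of a StatefulSet pod-template container snippet, assuming the headless Service is named nifi-headless; the startup command is a placeholder, not a real nifi invocation:

env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
command:
- "/bin/sh"
- "-c"
- |
  # Stable, resolvable name of this pod, e.g.
  # nifi-0.nifi-headless.default.svc.cluster.local
  SELF_FQDN="$POD_NAME.nifi-headless.$POD_NAMESPACE.svc.cluster.local"
  exec start-app --advertise "$SELF_FQDN"   # placeholder command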
Which version of Kubernetes fixed this? I'm using 1.11.5, and it works correctly when I use redis-cluster-1.redis-cluster.default.svc.cluster.local, but with just redis-cluster-1 it doesn't work.
The official example in this doc just doesn't work.
@MPV The metadata.name of the headless service also needs to match the spec.serviceName of the StatefulSet.
Edit: apparently it does, based on the template I just read.
Just FYI an upgrade from 1.10.1 to 1.10.6 did not help.