minikube: nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known

BUG REPORT

Environment:

Minikube version: v0.30.0

  • OS: Fedora 29
  • VM Driver: virtualbox, kvm2
  • ISO version: v0.30.0
  • Others:
    • kubernetes version: tested on v1.10.0, v1.13.0
    • tested with coredns and kube-dns minikube addons

What happened: NFS volume fails to mount due to a DNS error (Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known). This problem does not occur when deployed on GKE.

What you expected to happen: NFS volume is mounted without an error.

How to reproduce it (as minimally and precisely as possible):

  1. Start nfs-server:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google_containers/volume-nfs:0.8
        ports:
        - name: nfs
          containerPort: 2049
        - name: mountd
          containerPort: 20048
        - name: rpcbind
          containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /exports
          name: exports
      volumes:
      - name: exports
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111
  selector:
    role: nfs-server
  2. Start a service consuming the NFS volume (e.g. busybox); the commands to apply both manifests are shown after this manifest:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-busybox
spec:
  replicas: 1
  selector:
    name: nfs-busybox
  template:
    metadata:
      labels:
        name: nfs-busybox
    spec:
      containers:
      - image: busybox
        command:
          - sh
          - -c
          - 'while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done'
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
          - name: nfs
            mountPath: "/mnt"
      volumes:
      - name: nfs
        nfs:
          server: nfs-server.default.svc.cluster.local
          path: "/"
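
Both manifests can then be applied with kubectl (the file names are just placeholders for wherever the two manifests above are saved):

# hypothetical file names for the two manifests above
kubectl apply -f nfs-server.yaml
kubectl apply -f nfs-busybox.yaml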

Output of minikube logs (if applicable): kubectl describe pod nfs-busybox-... shows this error:

  Warning  FailedMount  4m    kubelet, minikube  MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/ab2e9ad4-f88b-11e8-8a56-4004c9e1505b/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs nfs-server.default.svc.cluster.local:/ /var/lib/kubelet/pods/ab2e9ad4-f88b-11e8-8a56-4004c9e1505b/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit: run-r23cae2998bf349df8046ac3c61bfe4e9.scope
mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known

This indicates a problem with DNS resolution of nfs-server.default.svc.cluster.local.

Note: the NFS volume is mounted successfully when the server is specified by its ClusterIP instead of by the domain name.
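
For reference, the working ClusterIP variant looks like this (using the ClusterIP that also shows up in the nslookup output further down; it can be read from the service directly):

kubectl get svc nfs-server -o jsonpath='{.spec.clusterIP}'
# -> 10.105.22.251

      volumes:
      - name: nfs
        nfs:
          server: 10.105.22.251   # ClusterIP instead of the DNS name
          path: "/"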

Anything else we need to know: The same problem was already reported for a previous version in #2218, but that issue was closed due to inactivity of the author and nobody seems to have really looked into it. There is a workaround, but it has to be applied every time a minikube VM is created.

When running kubectl exec -ti nfs-busybox-... -- nslookup nfs-server.default.svc.cluster.local:

Server:         10.96.0.10
Address:        10.96.0.10:53

Name:   nfs-server.default.svc.cluster.local
Address: 10.105.22.251

*** Can't find nfs-server.default.svc.cluster.local: No answer

Strangely, the service ClusterIP is present in the answer (when using kube-dns, the ClusterIP part is missing completely).

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 18
  • Comments: 32 (3 by maintainers)

Most upvoted comments

@willzhang If you are using NFS CSI driver v4.1.0 or v4.0.0, try changing the dnsPolicy of csi-nfs-controller and csi-nfs-node to ClusterFirstWithHostNet; it works for me.
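
A quick way to apply that change to an existing install (assuming the driver's default deployment into kube-system, with csi-nfs-controller as a Deployment and csi-nfs-node as a DaemonSet):

# patch both workloads so they resolve names via the cluster DNS while on the host network
kubectl patch deployment csi-nfs-controller -n kube-system --type merge \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"ClusterFirstWithHostNet"}}}}'
kubectl patch daemonset csi-nfs-node -n kube-system --type merge \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"ClusterFirstWithHostNet"}}}}'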

I was able to solve this problem by creating a service with a static clusterIP and then mounting to the IP instead of the service name. No DNS required. This is working nicely on Azure; I haven't tried elsewhere.

In my case, I’m using an HDFS NFS Gateway and chose 10.0.200.2 for the clusterIP

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hdfs
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: Service
metadata:
  name: hdfs-nfs
  labels:
    component: hdfs-nn
spec:
  type: ClusterIP
  clusterIP: 10.0.200.2
  ports:
    - name: portmapper
      port: 111
      protocol: TCP
    - name: nfs
      port: 2049
      protocol: TCP
    - name: mountd
      port: 4242
      protocol: TCP
  selector:
    component: hdfs-nn
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  capacity:
    storage: 3000Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - vers=3
    - proto=tcp
    - nolock
    - noacl
    - sync    
  nfs:
    server: 10.0.200.2
    path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdfs
spec:
  storageClassName: hdfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3000Gi

@tamalsaha Yes, I have seen it, but only a workaround has been posted for the issue, not an actual fix.

The problem is that the components responsible for NFS storage backends do not use the cluster-internal DNS, but instead try to resolve the NFS server with the DNS configuration of the worker node itself. One way to make this work would be to add a hosts-file entry on the worker nodes mapping nfs-server.default.svc.cluster.local to the nfs-server's IP address. But this is just a quick and dirty hack-around.
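
For illustration, using the ClusterIP from the nslookup output above, such an entry on the minikube node (e.g. via minikube ssh) would look like:

# appended to /etc/hosts on the node -- quick hack, lost whenever the VM is recreated
10.105.22.251 nfs-server.default.svc.cluster.local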

But it's just odd that this component is not able to use the cluster-internal DNS resolution. That would make much more sense and be more intuitive to use.

Apologies, I’m not a Minikube user but this is the most apt issue I’ve found for the problems that I’m having.

I’m experiencing these exact problems:

  • NFS-mounting by the internal domain (nfs-server.default.svc.cluster.local) doesn’t work during ContainerCreating phase
  • Using the service IP does work.
  • Setting up a busybox pod, and using nslookup in there resolves the domain just fine.

Based on my googling efforts so far, this seems to be a Kubernetes issue where the NFS is being set up before the container can reach coredns. Perhaps an initialization order problem?

Same problem when using csi-driver-nfs:

https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/example/nfs-provisioner/README.md

root@ubuntu:/data/kubevirt# kubectl describe pods nginx-nfs-example
Name:         nginx-nfs-example
Namespace:    default
Priority:     0
Node:         node1/192.168.72.31
Start Time:   Fri, 20 May 2022 18:01:08 +0800
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:           
IPs:          <none>
Containers:
  nginx:
    Container ID:   
    Image:          nginx
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v7h85 (ro)
      /var/www from pvc-nginx (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  pvc-nginx:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-nginx
    ReadOnly:   false
  kube-api-access-v7h85:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  12m (x4 over 12m)    default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   Scheduled         12m                  default-scheduler  Successfully assigned default/nginx-nfs-example to node1
  Warning  FailedMount       5m34s                kubelet            Unable to attach or mount volumes: unmounted volumes=[pvc-nginx], unattached volumes=[kube-api-access-v7h85 pvc-nginx]: timed out waiting for the condition
  Warning  FailedMount       110s (x13 over 12m)  kubelet            MountVolume.SetUp failed for volume "pv-nginx" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o nfsvers=4.1 nfs-server.default.svc.cluster.local:/ /var/lib/kubelet/pods/d534e8dc-6364-40c1-989e-4448d5e6ae3c/volumes/kubernetes.io~csi/pv-nginx/mount
Output: mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known
  Warning  FailedMount  62s (x4 over 10m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[pvc-nginx], unattached volumes=[pvc-nginx kube-api-access-v7h85]: timed out waiting for the condition

For anyone else running into this in general (not only with minikube), I've made a small image+daemonset that basically does the latter option mentioned above (a daemonset updating the host's /etc/systemd/resolved.conf).

~~Should work in most scenarios where the cloud provider isn’t doing something too too funky with their DNS config https://github.com/Tristan971/kube-enable-coredns-on-node~~

(It's a bit dirty/ad-hoc in its current state, but could be made to support more host setups.)
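
What it boils down to on each host is roughly the following (a sketch; 10.96.0.10 is the cluster DNS service IP from the nslookup output above):

# /etc/systemd/resolved.conf on the node
[Resolve]
DNS=10.96.0.10
Domains=~cluster.local

# then reload the resolver
systemctl restart systemd-resolved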

EDIT: Brian’s solution, right below, is the best current solution.

Well, I'm running into the same issue on EKS as well. By specifying the NFS server IP directly, it just works. Is this a known issue on EKS too, or should I move to EFS on AWS? 😦

For anyone else finding themselves in the same situation, who can’t use the ClusterIP service, I was also able to get it to work using the NFS CSI Driver like @fosmjo mentioned above. Apparently v4.4.0 defaults to the necessary dnsPolicy as well, so no need for configuration beyond their default helm chart. Figured I’d drop a full example for copy pasta.

Installed the helm chart from their repo:

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.4.0

I'm running NFS inside my cluster using the gp2 StorageClass to create an EBS-backed volume for my deployment; here's my template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
  namespace: storage
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: itsthenetwork/nfs-server-alpine:latest
        ports:
          - name: nfs
            containerPort: 2049
        securityContext:
          privileged: true
        volumeMounts:
          - mountPath: /nfs
            name: nfs-volume
        env:
          - name: SHARED_DIRECTORY
            value: /nfs
      volumes:
        - name: nfs-volume
          persistentVolumeClaim:
            claimName: nfs-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-service
  namespace: storage
spec:
  ports:
    - name: nfs
      port: 2049
  selector:
    role: nfs-server

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
  namespace: storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 2Gi

Lastly, create the StorageClass, PVC, and Deployment that will mount your NFS share:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nfs
          mountPath: /usr/share/nginx/html
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: nfs-pvc-nginx
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-service.storage.svc.cluster.local
  share: /

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-nginx
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 1Gi 
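
Once everything is applied, a quick sanity check is that the PVC binds and the nginx pods start with the volume mounted:

kubectl get pvc nfs-pvc-nginx   # should report STATUS Bound
kubectl get pods -l app=nginx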

From what I can tell, the only real solution to this would be for the k8s node itself to have access to the cluster's CoreDNS, which is responsible for resolving these names. However, in my experience most k8s nodes use their own DNS, independent of the cluster.