longhorn: [QUESTION] unable to mount NFS backup
Question
I have strange behaviour with an NFS backup store. I have two different Kubernetes clusters. From one of them I can back up volumes and list the backups on my NFS storage. But on the other cluster backups fail, and when I try to list the backups from the Longhorn UI, the following error is displayed:
```
error listing backups: error listing backup volumes: Failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.1.0/longhorn [backup ls --volume-only nfs://foo.com:/var/backups/nfs], output Cannot create mount directory /var/lib/longhorn-backupstore-mounts/foo_com/var/backups/nfs for NFS server , stderr, time="2021-02-07T21:24:35Z" level=error msg="Cannot create mount directory /var/lib/longhorn-backupstore-mounts/foo_com/var/backups/nfs for NFS server" , error exit status 1
```
I have been trying for several hours now but cannot figure out what is going wrong. I shut down the firewall and set the directory permissions to 777, but without success. I have no idea where else to look.
What does the message above tell me? On which server is the problem: on my NFS server or on my Kubernetes node?
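To narrow it down, I tried to reproduce locally what the engine seems to be doing first: creating the mount directory under `/var/lib/longhorn-backupstore-mounts` on the node. A minimal sketch (it uses a temporary base directory so it is safe to run anywhere; on a real node the base would be `/var/lib/longhorn-backupstore-mounts`, and `foo_com` is the sanitized server name from the error above):

```shell
#!/bin/sh
# Check whether the backupstore mount directory can be created at all.
# BASE is a scratch stand-in for /var/lib/longhorn-backupstore-mounts.
BASE="$(mktemp -d)/longhorn-backupstore-mounts"

if mkdir -p "$BASE/foo_com/var/backups/nfs" 2>/dev/null; then
  echo "mount directory OK"
else
  echo "cannot create mount directory"
fi
```

If this `mkdir -p` fails, the problem is on the node's local filesystem, before the NFS server is ever contacted.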
When I try to create a backup, the following error message is shown:
```
fail to backup snapshot: failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.1.0/longhorn [--url 10.244.4.137:10007 backup create --dest nfs://foo.com:/var/backups/nfs --label KubernetesStatus={"pvName":"pvc-360b2857-c0b1-4c61-b511-e948ff0258f3","pvStatus":"Bound","namespace":"myvolume-test","pvcName":"index","lastPVCRefAt":"","workloadsStatus":[{"podName":"imixs-office-workflow-7bffb6d598-v75lk","podStatus":"Running","workloadName":"imixs-office-workflow-7bffb6d598","workloadType":"ReplicaSet"}],"lastPodRefAt":""} 68430f1d-7de5-496a-ac34-ddaf10b3660b], output , stderr, time="2021-02-07T21:40:47Z" level=info msg="Backing up 68430f1d-7de5-496a-ac34-ddaf10b3660b on tcp://10.244.4.138:10150, to nfs://foo.com:/var/backups/nfs" time="2021-02-07T21:40:47Z" level=fatal msg="Error running create backup command: failed to create backup to nfs://foo.com:/var/backups/nfs for volume pvc-360b2857-c0b1-4c61-b511-e948ff0258f3: rpc error: code = Unknown desc = Cannot create mount directory /var/lib/longhorn-backupstore-mounts/foo_com/var/backups/nfs for NFS server" , error exit status 1
```
In the /etc/exports file on my NFS server I added all the IP addresses of my Kubernetes nodes. All IPs are public IPs.
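For reference, my /etc/exports entries look roughly like this (the node IPs here are placeholders, one entry per node):

```
# /etc/exports on the NFS server
/var/backups/nfs  203.0.113.10(rw,sync,no_subtree_check,no_root_squash)
/var/backups/nfs  203.0.113.11(rw,sync,no_subtree_check,no_root_squash)
```

After editing I re-exported with `exportfs -ra`. As far as I understand, `no_root_squash` is relevant because the Longhorn engine mounts and writes as root, but I am not certain that is the issue here.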
The very strange thing is that even on the cluster where I have the problems, some volumes can be backed up while others cannot.
The cluster has 5 nodes, and each volume has 3 replicas. It looks like volumes attached to certain nodes have the problem, while volumes on the other nodes do not.
It all sounds pretty confusing, but I can't describe it any better at the moment.
I am thankful for any kind of hint.
Environment:
- Longhorn version: 1.1.0
- Kubernetes version: 1.19.3
- Node config:
  - OS type and version: Debian
  - Underlying infrastructure: bare metal
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (8 by maintainers)
Verified with Longhorn master - 03/29/2021
Validation - Pass
The error message is logged as below in case the /var/lib/longhorn-backupstore-mounts directory is deleted and a file with the same name is created in the longhorn-manager pod.
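The failure mode described above can be reproduced in a scratch directory: when a regular file occupies the path where the mount base directory should be, `mkdir -p` fails, which is what the engine surfaces as "Cannot create mount directory ... for NFS server". A minimal sketch (the scratch path stands in for `/var/lib/longhorn-backupstore-mounts`):

```shell
#!/bin/sh
# Simulate the verified scenario: the mount base is deleted and a
# regular file with the same name is created in its place.
SCRATCH="$(mktemp -d)"
BASE="$SCRATCH/longhorn-backupstore-mounts"

touch "$BASE"   # a file now occupies the directory path

if ! mkdir -p "$BASE/foo_com/var/backups/nfs" 2>/dev/null; then
  echo "mkdir failed: $BASE is a file, not a directory"
fi
```

Removing the offending file and letting Longhorn recreate the directory resolves this particular variant of the error.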