longhorn: [QUESTION] unable to mount NFS backup
Question
I have strange behaviour with an NFS backup store. I have two different Kubernetes clusters. From one of them I can back up volumes and list the backups on my NFS storage. But on the other cluster backups fail, and when I try to list the backups from the Longhorn UI, the following error is displayed:
```
error listing backups: error listing backup volumes: Failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.1.0/longhorn [backup ls --volume-only nfs://foo.com:/var/backups/nfs], output Cannot create mount directory /var/lib/longhorn-backupstore-mounts/foo_com/var/backups/nfs for NFS server , stderr, time="2021-02-07T21:24:35Z" level=error msg="Cannot create mount directory /var/lib/longhorn-backupstore-mounts/foo_com/var/backups/nfs for NFS server" , error exit status 1
```
I have been trying for several hours now but cannot figure out what is going wrong. I shut down the firewall and set the directory permissions to 777, but without success. I have no idea where else to look.
What does the message above tell me? On which server is the problem: on my NFS server or on my Kubernetes node?
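To narrow it down, I tried to reproduce locally what the engine seems to be doing first: creating the mount directory under `/var/lib/longhorn-backupstore-mounts` on the node. A minimal sketch (it uses a temporary base directory so it is safe to run anywhere; on a real node the base would be `/var/lib/longhorn-backupstore-mounts`, and `foo_com` is the sanitized server name from the error above):

```shell
#!/bin/sh
# Check whether the backupstore mount directory can be created at all.
# BASE is a scratch stand-in for /var/lib/longhorn-backupstore-mounts.
BASE="$(mktemp -d)/longhorn-backupstore-mounts"

if mkdir -p "$BASE/foo_com/var/backups/nfs" 2>/dev/null; then
  echo "mount directory OK"
else
  echo "cannot create mount directory"
fi
```

If this `mkdir -p` fails, the problem is on the node's local filesystem, before the NFS server is ever contacted.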
When I try to create a backup, the following error message is shown:
```
fail to backup snapshot: failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.1.0/longhorn [--url 10.244.4.137:10007 backup create --dest nfs://foo.com:/var/backups/nfs --label KubernetesStatus={"pvName":"pvc-360b2857-c0b1-4c61-b511-e948ff0258f3","pvStatus":"Bound","namespace":"myvolume-test","pvcName":"index","lastPVCRefAt":"","workloadsStatus":[{"podName":"imixs-office-workflow-7bffb6d598-v75lk","podStatus":"Running","workloadName":"imixs-office-workflow-7bffb6d598","workloadType":"ReplicaSet"}],"lastPodRefAt":""} 68430f1d-7de5-496a-ac34-ddaf10b3660b], output , stderr, time="2021-02-07T21:40:47Z" level=info msg="Backing up 68430f1d-7de5-496a-ac34-ddaf10b3660b on tcp://10.244.4.138:10150, to nfs://foo.com:/var/backups/nfs" time="2021-02-07T21:40:47Z" level=fatal msg="Error running create backup command: failed to create backup to nfs://foo.com:/var/backups/nfs for volume pvc-360b2857-c0b1-4c61-b511-e948ff0258f3: rpc error: code = Unknown desc = Cannot create mount directory /var/lib/longhorn-backupstore-mounts/foo_com/var/backups/nfs for NFS server" , error exit status 1
```
In the /etc/exports file on my NFS server I added all the IP addresses of my Kubernetes nodes. All IPs are public IPs.
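For reference, my /etc/exports entries look roughly like this (the node IPs here are placeholders, one entry per node):

```
# /etc/exports on the NFS server
/var/backups/nfs  203.0.113.10(rw,sync,no_subtree_check,no_root_squash)
/var/backups/nfs  203.0.113.11(rw,sync,no_subtree_check,no_root_squash)
```

After editing I re-exported with `exportfs -ra`. As far as I understand, `no_root_squash` is relevant because the Longhorn engine mounts and writes as root, but I am not certain that is the issue here.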
The very strange thing is that even on the cluster where I have the problems, some volumes can be backed up while others cannot.
The cluster has 5 nodes, and each volume has 3 replicas. It looks like volumes attached to certain nodes have the problem, while volumes on the other nodes do not.
It all sounds pretty confusing, but I can't describe it any better at the moment.
I am thankful for any kind of hint.
Environment:
- Longhorn version: 1.1.0
- Kubernetes version: 1.19.3
- Node config:
  - OS type and version: Debian
  - Underlying infrastructure: bare metal
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (8 by maintainers)
Verified with Longhorn master - 03/29/2021
Validation - Pass
The error message is logged as below in case the /var/lib/longhorn-backupstore-mounts directory is deleted and a file with the same name is created in the longhorn-manager pod.
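The failure mode described above can be reproduced in a scratch directory: when a regular file occupies the path where the mount base directory should be, `mkdir -p` fails, which is what the engine surfaces as "Cannot create mount directory ... for NFS server". A minimal sketch (the scratch path stands in for `/var/lib/longhorn-backupstore-mounts`):

```shell
#!/bin/sh
# Simulate the verified scenario: the mount base is deleted and a
# regular file with the same name is created in its place.
SCRATCH="$(mktemp -d)"
BASE="$SCRATCH/longhorn-backupstore-mounts"

touch "$BASE"   # a file now occupies the directory path

if ! mkdir -p "$BASE/foo_com/var/backups/nfs" 2>/dev/null; then
  echo "mkdir failed: $BASE is a file, not a directory"
fi
```

Removing the offending file and letting Longhorn recreate the directory resolves this particular variant of the error.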