longhorn: [BUG] Backup NFS - Operation not permitted during mount
Describe the bug (🐛 if you encounter this issue)
Setting backupTarget to an NFS share results in an "Operation not permitted" error in the UI.
I am using a Synology NAS that only supports NFS up to v4.1:
# cat /proc/fs/nfsd/versions
+2 +3 +4 +4.1
ash-4.4# cat /etc/exports
/volume2/k8s-backup 10.0.80.0/21(rw,async,no_wdelay,no_root_squash,insecure_locks,sec=sys,anonuid=1025,anongid=100)
My Kubernetes nodes are within the subnet above.
To Reproduce
Manually execute the mount command from within a longhorn-manager pod:
$ mkdir -p /mnt/nfs
$ mount -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 <nas url>:/volume2/k8s-backup /mnt/nfs
mount.nfs4: Operation not permitted
Executing the same command directly on a k8s node works fine:
$ sudo mount -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 <nas url>:/volume2/k8s-backup /tmp/nas
$
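For anyone unsure how to get a shell in a longhorn-manager pod to run the commands above, a minimal sketch; the namespace and label selector assume a default Helm install, and the pod name is a placeholder:
$ kubectl -n longhorn-system get pods -l app=longhorn-manager
$ kubectl -n longhorn-system exec -it <longhorn-manager pod> -- bash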
Expected behavior
The NFS share should mount.
Environment
- Longhorn version: 1.4.2
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm / Flux
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: K3s
- Number of management nodes in the cluster: 1
- Number of worker nodes in the cluster: 4
- Node config
- OS type and version: Ubuntu 22.04
- CPU per node: 2
- Memory per node: 30GB
- Disk type(e.g. SSD/NVMe): NVMe
- Network bandwidth between the nodes: 1 GbE
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Proxmox
- Number of Longhorn volumes in the cluster: N/A
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 2
- Comments: 15 (6 by maintainers)
Hello @dotdiego, yes of course. Basically, your NFS server expects your client's source port to be in the privileged range; in simple terms, it must be under 1024.
So you must find a way to make sure your client (in this case, the Longhorn pod) issues its NFS requests from a source port between 1 and 1023.
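For context, this requirement usually comes from the secure export option, which is the default on Linux NFS servers. If you would rather relax the check on the NAS side, one alternative is adding insecure to the export; a sketch based on the exports line quoted above, re-exported afterwards with exportfs -ra:
# cat /etc/exports
/volume2/k8s-backup 10.0.80.0/21(rw,async,no_wdelay,no_root_squash,insecure,insecure_locks,sec=sys,anonuid=1025,anongid=100)
# exportfs -ra
On Synology DSM the same thing is typically exposed as a per-rule NFS permission option for allowing connections from non-privileged ports.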
In my case, my Kubernetes cluster needs to go through my gateway server to reach the NFS server. The good news for me is that the gateway is fully managed with Linux tools, so I can manipulate the traffic however I want. This is why I added this line on my gateway: ip daddr (nfs server ip) tcp dport 2049 snat to (my source nat ip):1-1023
After adding this, my gateway rewrites the original source port, which was 54850, to a random port between 1 and 1023.
Before the rule, my tcpdump was showing something like this: my-longhorn-pod-ip:54850 -> nfs_server-ip:2049 (this does not work).
After applying the rule on my gateway: my-longhorn-pod-ip:1022 -> nfs_server-ip:2049 (this works).
Hope it's clearer for you.
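For anyone wanting to reproduce that gateway rule, a fuller sketch in nft syntax, assuming an nftables-based gateway; the nat table and postrouting chain are created here for completeness, and the addresses are placeholders for your own setup:
# nft add table ip nat
# nft add chain ip nat postrouting '{ type nat hook postrouting priority srcnat; }'
# nft add rule ip nat postrouting ip daddr <nfs server ip> tcp dport 2049 snat to <source nat ip>:1-1023
NFSv4 traffic is TCP to destination port 2049, so matching dport 2049 is enough to catch the mount; the snat target rewrites the source address and constrains the source port to the privileged 1-1023 range.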
I just noticed that when the traffic goes from the pod to the NFS server, it uses an unprivileged source port (above 1023). On the k8s hosts it always uses a port below 1024 as the source. This is probably why the NFS server answers with Operation not permitted.
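To check this yourself, one way is to capture the traffic on the node while the mount runs; a small sketch, assuming eth0 is the interface facing the NFS server:
# tcpdump -nn -i eth0 'tcp and dst port 2049'
Run the mount once from inside the pod and once from the host, and compare the source ports shown in the two captures.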
…
After testing a rule on my gateway to rewrite the source port to one between 1 and 1023, ip daddr (nfs server ip) tcp dport 2049 snat to (my source nat ip):1-1023, the mount is working 😃
I would love to know why as well, but there are so many ways of doing things… Personally I use Cilium as the CNI without kube-proxy, so maybe the eBPF program rewrites the source port to a random higher range and that is what causes the issue.
Maybe it's worth adding this to the documentation here: https://longhorn.io/kb/troubleshooting-unable-to-mount-an-nfs-backup-target/