longhorn: [BUG] Backup NFS - Operation not permitted during mount

Describe the bug (🐛 if you encounter this issue)

Setting backupTarget to an NFS share produces an "Operation not permitted" error in the UI.

I am using a Synology NAS that only supports NFS up to v4.1:

# cat /proc/fs/nfsd/versions
+2 +3 +4 +4.1
ash-4.4# cat /etc/exports
/volume2/k8s-backup	10.0.80.0/21(rw,async,no_wdelay,no_root_squash,insecure_locks,sec=sys,anonuid=1025,anongid=100)

My Kubernetes nodes are within the subnet above.
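
As a sanity check from a node, the export list can be queried with showmount, assuming the NAS still answers the NFSv3 MOUNT protocol (it advertises +2 +3 above); <nas url> is a placeholder:

$ showmount -e <nas url>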

To Reproduce

Manually exec the mount command from within a longhorn-manager pod:

$ mkdir -p /mnt/nfs
$ mount -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 <nas url>:/volume2/k8s-backup /mnt/nfs
mount.nfs4: Operation not permitted
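
(To get a shell inside a longhorn-manager pod in the first place, something like the following works, assuming the default longhorn-system namespace and the app=longhorn-manager pod label:)

$ kubectl -n longhorn-system get pods -l app=longhorn-manager
$ kubectl -n longhorn-system exec -it <longhorn-manager-pod> -- bash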

Executing the same command directly on a k8s node works fine:

$ sudo mount -t nfs4 -o nfsvers=4.1,actimeo=1,soft,timeo=300,retry=2 <nas url>:/volume2/k8s-backup /tmp/nas
$

Expected behavior

The NFS share should mount.


Environment

  • Longhorn version: 1.4.2
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm / Flux
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: K3s
    • Number of management nodes in the cluster: 1
    • Number of worker nodes in the cluster: 4
  • Node config
    • OS type and version: Ubuntu 22.04
    • CPU per node: 2
    • Memory per node: 30GB
    • Disk type (e.g. SSD/NVMe): NVMe
    • Network bandwidth between the nodes: 1 GbE
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Proxmox
  • Number of Longhorn volumes in the cluster: N/A



Most upvoted comments

Hello @dotdiego, yes of course. Basically, your NFS server expects the client's source port to be in the privileged port range; in simple terms, it must be under 1024.

So you must somehow make sure your client (in this case, the Longhorn pod) sends its NFS requests from a source port between 1 and 1023.

In my case, my Kubernetes cluster needs to go through my gateway server to reach the NFS server. The good news for me is that the gateway is fully managed with Linux tools, so I can manipulate the traffic as I want. This is why I added this rule to my gateway: ip daddr (nfs server ip) tcp dport 2049 snat to (my source nat ip):1-1023

After adding this, my gateway rewrites the original source port (54850) to a random port between 1 and 1023.
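
For reference, spelled out as full nft commands this looks roughly like the following (placeholder addresses; this assumes a dedicated nat table and postrouting chain rather than whatever ruleset the gateway already has):

# one-time setup of a nat postrouting chain (skip if one already exists)
nft add table ip nat
nft 'add chain ip nat postrouting { type nat hook postrouting priority 100 ; }'
# rewrite the source port of NFS traffic into the privileged range
nft add rule ip nat postrouting ip daddr 192.0.2.10 tcp dport 2049 snat to 10.0.80.1:1-1023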

Before the rule, tcpdump output looked like this (not working): my-longhorn-pod-ip:54850 -> nfs_server-ip:2049

After adding the rule on my gateway (working): my-longhorn-pod-ip:1022 -> nfs_server-ip:2049
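
(For anyone who wants to verify this themselves, a capture along these lines on the gateway or node shows the source ports in use; <nas-ip> is a placeholder:)

$ tcpdump -nn -i any host <nas-ip> and tcp port 2049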

Hope it’s clearer for you.

I just noticed that traffic coming from the pod to the NFS server uses a source port from the unprivileged range (above 1023), while on the k8s hosts it always uses a source port below 1024. This is probably why the NFS server answers with Operation not permitted.

After testing a rule on my gateway to rewrite the source port to one between 1 and 1023 (ip daddr (nfs server ip) tcp dport 2049 snat to (my source nat ip):1-1023), the mount is working 😃


@ozid Cool! We were not aware that this is caused by the source-port usage in the k8s system. However, why does it work in most environments without this issue?

I would love to know why as well, but there are so many ways of doing things… Personally, I use Cilium as the CNI without kube-proxy, so maybe the eBPF program rewrites the source port to a random higher range, and this is what causes the issue.

Maybe it’s worth adding this to the documentation here: https://longhorn.io/kb/troubleshooting-unable-to-mount-an-nfs-backup-target/
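
For what it’s worth, a server-side alternative may be the standard exports(5) insecure option, which lifts the privileged source-port requirement (on Synology this is typically exposed through the DSM NFS rule settings rather than by editing /etc/exports directly; the line below is only a sketch):

/volume2/k8s-backup	10.0.80.0/21(rw,async,no_wdelay,no_root_squash,insecure,insecure_locks,sec=sys,anonuid=1025,anongid=100)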