longhorn: [BUG] Longhorn does not launch on systems with a non-standard path

Describe the bug

Linux distributions such as NixOS use a highly non-standard path (e.g. /run/wrappers/bin:/root/.nix-profile/bin:/etc/profiles/per-user/root/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bi). In NixOS’s case, this is to allow for use of the Nix package manager.

On these systems, Longhorn uses nsenter to enter the host namespace, and nsenter resolves commands using the path of the process that called it (the container) rather than the host’s. As a result, basic system utilities such as mount, as well as iscsiadm, can’t be found or called. During initial startup of the longhorn-manager pod, this produces Failed to execute: nsenter [--mount=/host/proc/*/ns/mnt mount], output nsenter: failed to execute mount: No such file or directory and Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host.
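The lookup failure described above can be illustrated with a minimal Go sketch. It assumes nsenter resolves the command through an execvp-style walk over the caller's PATH. The function resolve, the stand-in filesystem nixosHost, and the containerPath value are hypothetical names introduced for illustration only; none of them come from Longhorn.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolve mimics an execvp-style lookup: it walks the colon-separated
// pathEnv in order and returns the first candidate that exists.
// exists is injected so the sketch runs without touching a real filesystem.
func resolve(name, pathEnv string, exists func(string) bool) (string, bool) {
	for _, dir := range strings.Split(pathEnv, ":") {
		if dir == "" {
			continue // simplification: a real lookup treats an empty entry as the current directory
		}
		candidate := path.Join(dir, name)
		if exists(candidate) {
			return candidate, true
		}
	}
	return "", false
}

// nixosHost is a hypothetical host filesystem in which mount lives only
// under the NixOS system profile, not under /bin or /usr/bin.
func nixosHost(file string) bool {
	return file == "/run/current-system/sw/bin/mount"
}

// containerPath is an assumed conventional container PATH.
const containerPath = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

func main() {
	// With the container's PATH, mount is not found inside the host namespace.
	_, ok := resolve("mount", containerPath, nixosHost)
	fmt.Println("default container PATH finds mount:", ok)

	// Appending the NixOS-specific entry makes the same lookup succeed.
	found, ok := resolve("mount", containerPath+":/run/current-system/sw/bin", nixosHost)
	fmt.Println("extended PATH finds mount:", ok, found)
}
```

This is why the derived images and the PATH-based workarounds help: the mount namespace changes, but the search list still comes from the container's environment.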

I have built some extremely scrappy derived images based on the current stable release of Longhorn at https://github.com/duckfullstop/nixos-longhorn - these simply append the necessary NixOS-specific path entries to the container’s path, such that nsenter can find the requisite binaries properly. This is not a complete fix - there is, for some reason, a hardcoded lookup path in longhorn-manager/csi/nfs/nsenter.go that means the share-manager pod cannot properly instantiate ReadWriteMany Persistent Volumes. However, even without patching that, the containers with the updated path work perfectly, and Longhorn works flawlessly (excepting the ReadWriteMany issue).

The affected Longhorn components are longhorn-manager and longhorn-instance-manager, as these are the only two images that make nsenter calls.

I’m not 100% sure what the best and cleanest way to resolve this is, so I’m opening this as an issue instead of a load of PRs.

To Reproduce

  1. Bootstrap a NixOS Kubernetes cluster (see https://nixos.wiki/wiki/Kubernetes for rough instructions; I can post the configuration I’m using somewhere eventually, but a basic setup will experience the same issue)
  2. Ensure that openiscsi is available in the host environment with config.environment.systemPackages = [ pkgs.openiscsi ];
  3. Deploy the current Longhorn manifest to the cluster
  4. Be sad when longhorn-manager fails to come up

Expected behavior

longhorn-manager proceeds with initialisation.

Log

During startup, if not using containers with patched $PATH (log from longhorn-manager):

Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host
Failed to execute: nsenter [--mount=/host/proc/*/ns/mnt mount], output nsenter: failed to execute mount: No such file or directory

When attempting to use a ReadWriteMany Persistent Volume Claim with patched $PATH (log from longhorn-csi-plugin/longhorn-csi-plugin):

level=error msg="GRPC error: rpc error: code = Internal desc = Failed to create nsenter executor, err: unable to find required binary mount on host"

Environment:

  • Longhorn version: v1.1.0 / v1_20201216
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: NixOS v1.20.1 (custom derivation)
    • Number of management node in the cluster: 1
    • Number of worker node in the cluster: 2
  • Node config
    • OS type and version: NixOS 20.09
    • CPU per node: 8
    • Memory per node: 16G
    • Disk type(e.g. SSD/NVMe): SSD
    • Network bandwidth between the nodes: 1GB/s
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal

Additional context

The cleanest way that I can personally think of to resolve this is to add a variable to longhorn-manager and longhorn-instance-manager that allows for appending to the system path, and to update csi/nfs/nsenter.go to check against this path instead of the hardcoded choices.
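A rough Go sketch of the proposal might look like the following. Everything named here is an assumption for illustration: the environment variable LONGHORN_EXTRA_HOST_PATH does not exist in Longhorn, and defaultDirs merely stands in for the hardcoded list, whose actual contents in csi/nfs/nsenter.go are not reproduced.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// extraPathEnv is a hypothetical variable name; Longhorn does not define it.
const extraPathEnv = "LONGHORN_EXTRA_HOST_PATH"

// defaultDirs stands in for a hardcoded search list (assumed values).
var defaultDirs = []string{
	"/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin",
}

// searchDirs returns the default directories followed by the extra,
// colon-separated entries, dropping empty strings and duplicates while
// preserving order.
func searchDirs(extra string) []string {
	all := append([]string{}, defaultDirs...)
	all = append(all, strings.Split(extra, ":")...)

	seen := map[string]bool{}
	var dirs []string
	for _, d := range all {
		if d == "" || seen[d] {
			continue
		}
		seen[d] = true
		dirs = append(dirs, d)
	}
	return dirs
}

func main() {
	dirs := searchDirs(os.Getenv(extraPathEnv))
	// The joined value could be exported as PATH for nsenter calls, and the
	// same slice could drive the "required binary" check on the host.
	fmt.Println(strings.Join(dirs, ":"))
}
```

Using one list for both the nsenter environment and the binary existence check would keep the two from drifting apart, which is the mismatch behind the ReadWriteMany failure described earlier.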

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 28
  • Comments: 18 (3 by maintainers)

Most upvoted comments

Updated helmfile example:

repositories:
  - name: longhorn
    url: https://charts.longhorn.io
  - name: kyverno
    url: https://kyverno.github.io/kyverno
  - name: incubator
    url: https://charts.helm.sh/incubator
releases:
  - name: longhorn
    namespace: longhorn-system
    chart: longhorn/longhorn
    version: 1.3.2
  - name: kyverno
    namespace: kyverno
    chart: kyverno/kyverno
    version: 2.6.1
  - name: longhorn-admission-hooks
    namespace: longhorn-system
    chart: incubator/raw
    needs:
      - kyverno/kyverno
    values:
      - resources:
          - apiVersion: v1
            kind: ConfigMap
            metadata:
              name: longhorn-custom-path
              namespace: longhorn-system
            data:
              PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/run/wrappers/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
          - apiVersion: kyverno.io/v1
            kind: ClusterPolicy
            metadata:
              name: add-host-path-to-longhorn
              annotations:
                policies.kyverno.io/title: Add Environment Variables from ConfigMap
                policies.kyverno.io/subject: Pod
                policies.kyverno.io/category: Other
                policies.kyverno.io/description: >-
                  Longhorn invokes executables on the host system, and needs
                  to be aware of the host system's PATH. This modifies all
                  deployments such that the PATH is explicitly set to support
                  NixOS based systems.
            spec:
              rules:
                - name: add-env-vars
                  match:
                    resources:
                      kinds:
                        - Pod
                      namespaces:
                        - longhorn-system
                  mutate:
                    patchStrategicMerge:
                      spec:
                        initContainers:
                          - (name): "*"
                            envFrom:
                              - configMapRef:
                                  name: longhorn-custom-path
                        containers:
                          - (name): "*"
                            envFrom:
                              - configMapRef:
                                  name: longhorn-custom-path

I’m currently using this as a workaround:

systemd.tmpfiles.rules = [
  "L+ /usr/local/bin - - - - /run/current-system/sw/bin/"
];

Given that /usr/local/bin is not part of NixOS’ default PATH, I reckon there are no side effects.

We currently do not have the capacity to test against non-standard operating systems. For a list of our recommended operating systems, have a look here: https://longhorn.io/docs/1.1.0/best-practices/#software

We do want to have another look at this issue once we are closer to the 1.2 release.

I’ll happily take a look into putting some PRs together! I should have some time to delve into it at the end of this week, otherwise hopefully next week.

Will we receive a complete solution for this issue, or will the workaround (Helm) remain semi-permanent?

For me it was two things on NixOS. The most important was @joaojacome’s input, but I was also missing openiscsi on the nodes. So every node in my cluster now has this additional configuration:

systemd.tmpfiles.rules = [
  "L+ /usr/local/bin - - - - /run/current-system/sw/bin/"
];

services.openiscsi = {
  enable = true;
  name = "<some-name>";
};

So far this seems to work. I will post updates if I discover more. This also seems to relate to @gnufied’s question: nsenter is used for iscsiadm, or at least that is what my logs suggested as well.

Does anyone have a working version of the above for kustomize? And of course, having this issue fixed rather than worked around would be a major improvement.