longhorn: [BUG] data V2 engine displays incorrect usage in UI

Describe the bug (🐛 if you encounter this issue)

When I create 40 Block-mode V2 PVCs (10G each), ‘Storage Schedulable (Block)’ displays an incorrect “used” value.

master1 [~]# kubectl get statefulsets.apps 
NAME                READY   AGE
nginx-spdk-block    20/20   113m
nginx-spdk-block2   8/20    47m
[screenshot: Longhorn UI ‘Storage Schedulable (Block)’ panel showing the incorrect used value]

To Reproduce

  1. Set up Longhorn with the V2 data engine enabled
  2. Add block-type storage devices as Longhorn disks
  3. Create a StatefulSet whose volumeClaimTemplates request Block-mode V2 PVCs (a hedged command sketch follows this list)
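
A hedged sketch of the setup commands, assuming the 1.5.x setting and StorageClass parameter names (v2-data-engine, backendStoreDriver; the parameter was renamed in later releases), so treat it as illustrative rather than authoritative:

# Enable the V2 data engine (preview feature in 1.5.x)
kubectl -n longhorn-system patch settings.longhorn.io v2-data-engine \
  --type merge -p '{"value": "true"}'

# StorageClass matching the one referenced in the manifest below
# (replica count 2, V2 engine)
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-v2-data-engine
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
  backendStoreDriver: "v2"
EOF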

Expected behavior

The “used” value in ‘Storage Schedulable (Block)’ is displayed correctly in the UI.

Over-provisioning beyond the schedulable quota has no impact on the normal operation of Pods.

Support bundle for troubleshooting

Environment

  • Longhorn version: 1.5.1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher Catalog App
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: K3s
    • Number of management nodes in the cluster: 2
    • Number of worker nodes in the cluster: 0
  • Node config
    • OS type and version: k3os
    • Kernel version: 5.15.0-60-generic
    • CPU per node: 16
    • Memory per node: 20 GB
    • Disk type(e.g. SSD/NVMe/HDD): HDD
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM
  • Number of Longhorn volumes in the cluster: 42
  • Impacted Longhorn resources:
    • Volume names:

Additional context

I have two nodes; each has one 300G backend disk available to the V2 engine, and the V2 StorageClass uses a replica count of 2. I created two StatefulSets with 20 replicas each, both defining Block-mode PVCs of 10G in volumeClaimTemplates. Intuitively, the used space should exceed 420G, but the actual “used” value is only 57.2G.
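
The gap between provisioned and actually-written space can be inspected on the Longhorn node CRs; a hedged check, assuming the default longhorn-system namespace:

# Per-disk capacity vs. scheduled (provisioned) vs. available space
kubectl -n longhorn-system get nodes.longhorn.io -o yaml \
  | grep -E 'storageAvailable|storageMaximum|storageScheduled'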

The number of running Pods has consistently remained at just over 20:

master1 [~]# kubectl get statefulsets.apps 
NAME                READY   AGE
nginx-spdk-block    10/20   119m
nginx-spdk-block2   10/20   53m

After waiting for a period of time, none of the V2 volumes remain healthy:
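
Volume health can also be confirmed from the CLI (hedged, assuming the default namespace):

# The ROBUSTNESS column should read healthy; degraded/faulted indicates replica problems
kubectl -n longhorn-system get volumes.longhorn.io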

[screenshot: Longhorn UI volume list showing unhealthy V2 volumes]

The StatefulSet manifest is as follows:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-spdk-block
spec:
  serviceName: nginx-spdk-block  # required StatefulSet field; assumes a matching headless Service exists
  selector:
    matchLabels:
      app: nginx-spdk-block
  podManagementPolicy: Parallel
  replicas: 20
  volumeClaimTemplates:
  - metadata:
      name: html
    spec:
      volumeMode: Block
      accessModes:
        - ReadWriteOnce
      storageClassName: longhorn-v2-data-engine
      resources:
        requests:
          storage: 10Gi
  template:
    metadata:
      labels:
        app: nginx-spdk-block
    spec:
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      containers:
      - name: nginx
        image: fiotest:0.1.0 # Image contains fio 
        imagePullPolicy: Always
        securityContext:
          privileged: true 
        volumeDevices:
        - devicePath: "/dev/sdd"
          name: html
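
For reference, a hypothetical fio invocation against the raw block device inside one of the pods (the actual workload baked into fiotest:0.1.0 is not shown in this issue):

kubectl exec -it nginx-spdk-block-0 -- \
  fio --name=randwrite --filename=/dev/sdd --rw=randwrite \
      --bs=4k --size=1G --ioengine=libaio --direct=1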

About this issue

  • State: closed
  • Created 9 months ago
  • Comments: 19 (11 by maintainers)

Most upvoted comments

This ticket was filed in error, and I am a beginner; I apologize for any inconvenience caused. I misread this as an “excessive usage” issue because I could not create multiple V2 PVCs. Also, the UI displays the actual storage usage after thin provisioning.

To summarize the issues I encountered while using Longhorn 1.5.1:

  1. nvme list segmentation fault: the current workaround is to rebuild instance-manager, compiling and installing nvme-cli v1.16 during image creation (the instance-manager pod needs read-write access to /sys). Refer to https://github.com/longhorn/longhorn/issues/6795#issuecomment-1736624780 and https://github.com/longhorn/go-spdk-helper/commit/579110a4706cec040d05c8ff6f9433641f95c663
  2. “json: cannot unmarshal” error on the MaximumLBA type: the current workaround is to rebuild instance-manager with the fixed go-spdk-helper version in go.mod. Refer to https://github.com/longhorn/go-spdk-helper/commit/e5fe21b6067f1adaad483b72409fe05b849c7503
  3. Unable to create multiple V2 PVCs: the solution is to reserve larger hugepage memory on both the host and Longhorn’s instance-manager pod so that more V2 PVCs can be created (a hedged host-side sketch follows this list). Refer to https://github.com/longhorn/longhorn/discussions/6493#discussioncomment-6819522
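
A hedged host-side sketch for item 3; the page count is illustrative (the Longhorn V2 preview docs reserve 1024 x 2MiB pages = 2GiB by default), and the pod-side limit is covered in the linked discussion:

# Reserve 2MiB hugepages on each node running the V2 data engine
sysctl -w vm.nr_hugepages=2048
echo 'vm.nr_hugepages=2048' >> /etc/sysctl.conf  # persist across reboots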

I want to express my sincere thanks to @DamiaSan and @derekbit for their assistance.