democratic-csi: Odd iSCSI read performance, at or around 1MB block size (TrueNAS Core)

After figuring out the issue on #238 it seems there is an odd performance issue on iSCSI LUNs presented to OpenShift; at least on a Windows 10 VM running in OpenShift Virtualization.

The environment for the performance test is:

  • Dell R730xd OpenShift worker node (Bare Metal), 2x 40Gbps ConnectX-3 Pro NICs in a bond (LACP)
  • Dell R730xd TrueNAS-13.0-U2 (Bare Metal) with 24x SAMSUNG MZILS1T9HEJH0D3 SAS SSDs in RAID10 (with no tunables set)
  • Arista 7050QX switch
  • iSCSI presented to OpenShift with democratic-csi helm chart 0.13.5

I have tried a number of storage class configuration combinations with inconclusive results, but there is always read performance degradation somewhere above a 512KB block size, with “Disk Active Time” pegged at 100% and very high average disk response time (sometimes > 6000ms). CPUs are relatively idle during this time with 4 vCPUs provisioned. I have not tried MPIO with multiple targets, even though I know it is best practice for iSCSI links; the LACP policy is set to layer 2 hash and is primarily there for redundancy and ease of use, since TrueNAS hosts NFS as well. Additionally, the MTU on the network is 1500, not 9K.
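
For anyone wanting to reproduce this outside of the Windows guest, below is a rough sketch of a sequential-read block-size sweep that could be run with fio from a Linux initiator directly against the LUN. The device path and block sizes are illustrative only; the numbers in this issue came from benchmarks inside the Windows 10 VM, not from this exact command.

# Sweep sequential-read block sizes against the raw iSCSI device (read-only, so non-destructive)
for bs in 256k 512k 1M 2M 4M; do
  fio --name=seqread-$bs --filename=/dev/sdX --rw=read --bs=$bs \
      --direct=1 --ioengine=libaio --iodepth=32 \
      --runtime=30 --time_based --readonly --group_reporting
done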

(screenshot: csi-perf-issue-ext4-lz4-16k-false-512)

There IS this, which looks suspicious and is already merged into OpenZFS 2.1.6:
https://github.com/openzfs/zfs/discussions/13448?sort=new
https://github.com/openzfs/zfs/pull/13452

But TrueNAS Core (13.0-U2) currently runs the following (and rightfully so, as 2.1.6 dropped only 15 days ago):

truenas# zpool -V
zfs-2.1.5-1
zfs-kmod-v2022081800-zfs_27f9f911a

Adjusting zfetch_array_rd_sz seems to have no effect, as the PR author points out.
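
For reference, on TrueNAS Core (FreeBSD) that module parameter should surface as a sysctl; a minimal sketch of inspecting/bumping it for a test run, assuming the usual vfs.zfs.zfetch.array_rd_sz mapping (the new value is only an example and does not persist across reboots):

truenas# sysctl vfs.zfs.zfetch.array_rd_sz            # default is 1048576 (1MB)
truenas# sysctl vfs.zfs.zfetch.array_rd_sz=67108864   # example: raise to 64MB for testing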

Then again, I have no idea if the issue is in TrueNAS, or in something OpenShift does while leveraging the CSI driver to present storage (ext4/xfs/block), potentially with a filesystem cache in some intermediary layer… though the issue persists with volumeMode: Block. VirtIO drivers are loaded in the guest VM:

(screenshot)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 44 (12 by maintainers)

Most upvoted comments

This is on the iX radar. I think 13.0-U3 should be out relatively soon, and maybe the included 2.1.6 will help.

I’d like to chat about spdk with you if you’re willing…

Editing the config in an attempt to get a bit more performance:

      extentDisablePhysicalBlocksize: true  # Not sure if this is required for OCP, though it is for ESXi
      extentBlocksize: 4096

(screenshots)
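
A quick way to sanity-check what the initiator actually sees after changing the extent settings is to look at the logical/physical sector sizes reported on the OpenShift worker node (sketch only; sdX stands in for whichever device the iSCSI session attaches):

# On the worker node, after a fresh login to the target
lsblk -o NAME,LOG-SEC,PHY-SEC,SIZE /dev/sdX
cat /sys/block/sdX/queue/logical_block_size /sys/block/sdX/queue/physical_block_size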

More testing tonight… good news and bad news:

Good news!
The read performance issue is NOT PRESENT on TrueNAS SCALE!

Bad news
TrueNAS SCALE doesn’t have nearly the performance of Core with like-for-like driver/zvol settings… but this isn’t new information, and it is something I know iX Systems is working hard on!

Confirmed test on:
Gigabyte R272-Z32-00 (Rev. 100), TrueNAS-SCALE-22.02.4 (Bare Metal), with 12x INTEL SSDPE2KX010T8 NVMe SSDs in RAID10 (with no tunables set)

(screenshots)

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  annotations:
    cdi.kubevirt.io/storage.pod.restarts: '0'
    cdi.kubevirt.io/storage.import.backingFile: '[Synology_iSCSI] win10/win10.vmdk'
    pv.kubernetes.io/bind-completed: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: org.democratic-csi.scale-iscsi
    cdi.kubevirt.io/storage.import.endpoint: 'https://vcsa.localdomain/sdk'
    cdi.kubevirt.io/storage.preallocation.requested: 'false'
    cdi.kubevirt.io/storage.import.secretName: win10-vm-86001-jtxfv
    cdi.kubevirt.io/storage.import.source: vddk
    cdi.kubevirt.io/storage.import.uuid: 420184a6-1058-ca76-6a45-3f41664260e9
    cdi.kubevirt.io/storage.condition.running.message: 'Import Complete; VDDK: {"Version":"7.0.3","Host":""}'
    cdi.kubevirt.io/storage.pod.retainAfterCompletion: 'true'
    cdi.kubevirt.io/storage.condition.running: 'false'
    cdi.kubevirt.io/storage.import.importPodName: importer-win10-vm-86001-55j8j
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.kubernetes.io/storage-provisioner: org.democratic-csi.scale-iscsi
    cdi.kubevirt.io/storage.pod.phase: Succeeded
    cdi.kubevirt.io/storage.condition.running.reason: Completed
    cdi.kubevirt.io/storage.import.vddk.thumbprint: '07:62:CF:6E:5E:CB:B7:CC:3F:7D:CC:AC:79:1B:EE:57:2A:DC:64:AA'
    cdi.kubevirt.io/storage.pod.vddk.initimageurl: >-
      image-registry.openshift-image-registry.svc.cluster.local:5000/benchmark/vddk:7032
    cdi.kubevirt.io/storage.pod.vddk.version: 7.0.3
  resourceVersion: '133548380'
  name: win10-vm-86001-55j8j
  uid: 4b3f222e-1c0c-4365-acee-8fe6f61a7902
  creationTimestamp: '2022-10-19T02:37:01Z'
  ...
  namespace: benchmark
  ownerReferences:
    - apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      name: win10-vm-86001-55j8j
      uid: 4e9da7af-4b40-43be-96d8-fea49de9f405
      controller: true
      blockOwnerDeletion: true
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.11.0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: '68719476736'
  volumeName: pvc-4b3f222e-1c0c-4365-acee-8fe6f61a7902
  storageClassName: truenas-scale-iscsi
  volumeMode: Block
status:
  phase: Bound
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 64Gi

StorageProfile in the openshift-cnv namespace; volumeMode: Block makes XFS irrelevant on the Migration Toolkit import (as seen in the PVC above).

apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  creationTimestamp: '2022-10-19T02:02:10Z'
  generation: 3
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.11.0
    cdi.kubevirt.io: ''
  ...
      manager: virt-cdi-controller
      operation: Update
      time: '2022-10-19T02:03:08Z'
  name: truenas-scale-iscsi
  ownerReferences:
    - apiVersion: cdi.kubevirt.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: CDI
      name: cdi-kubevirt-hyperconverged
      uid: 093a98f3-ba0d-4b56-bd4b-18d645e2bc2a
  resourceVersion: '133415837'
  uid: 7e5d583e-6afe-4f3d-9a4f-f35a08eb36f1
spec:
  claimPropertySets:
    - accessModes:
        - ReadWriteOnce
      volumeMode: Block
status:
  claimPropertySets:
    - accessModes:
        - ReadWriteOnce
      volumeMode: Block
  provisioner: org.democratic-csi.scale-iscsi
  storageClass: truenas-scale-iscsi
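
For completeness: if CDI does not select Block mode automatically for a provisioner, the claimPropertySets can be forced on the StorageProfile. A hedged sketch of doing that for the profile above (verify the field against your CDI version before relying on it):

oc patch storageprofile truenas-scale-iscsi --type=merge -p \
  '{"spec":{"claimPropertySets":[{"accessModes":["ReadWriteOnce"],"volumeMode":"Block"}]}}'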

Helm Command

[dave@lenovo ocp]$ helm upgrade --install --create-namespace --values truenas-scale-iscsi.yaml --namespace democratic-csi --set node.rbac.openshift.privileged=true --set node.driver.localtimeHostPath=false --set controller.rbac.openshift.privileged=true truenas-scale-iscsi democratic-csi/democratic-csi
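
After the upgrade, a quick check that the driver registered and its pods came up (standard oc commands; pod names will differ per install):

oc -n democratic-csi get pods -o wide
oc get csidriver org.democratic-csi.scale-iscsi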

truenas-scale-iscsi.yaml (Notice using SSH driver, not API driver)

csiDriver:
  name: "org.democratic-csi.scale-iscsi"

storageClasses:
- name: truenas-scale-iscsi
  defaultClass: false
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
  allowVolumeExpansion: true
  parameters:
    fsType: xfs
  mountOptions: []
  secrets:
    provisioner-secret:
    controller-publish-secret:
    node-stage-secret:
    node-publish-secret:
    controller-expand-secret:


driver:
  config:
    driver: freenas-iscsi   # freenas-api-iscsi
    instance_id:
    httpConnection:
      protocol: http
      host: 172.16.x.x
      port: 80
      username: root
      password: *********
      allowInsecure: true
    sshConnection:
      host: 172.16.x.x
      port: 22
      username: root
      password: **********
    zfs:
      datasetParentName: tank/k8s/iscsi/v
      detachedSnapshotsDatasetParentName: tank/k8s/iscsi/s
      zvolCompression: lz4
      zvolDedup: off
      zvolEnableReservation: false
      zvolBlocksize: 16K
    iscsi:
      targetPortal: "172.16.x.x:3260"
      targetPortals: []
      # leave empty to omit usage of -I with iscsiadm
      interface:
      namePrefix: csi-
      nameSuffix: "-cluster"
      targetGroups:
        - targetGroupPortalGroup: 1
          targetGroupInitiatorGroup: 1
          targetGroupAuthType: None
      extentInsecureTpc: true
      extentXenCompat: false
      extentDisablePhysicalBlocksize: false
      extentBlocksize: 512
      extentRpm: "SSD"
      extentAvailThreshold: 0
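
Since block sizes are the main suspect here, it can also be worth confirming on the TrueNAS side what the provisioned zvols actually got. A sketch, with the dataset path matching datasetParentName above and the pvc name as a placeholder:

# On the TrueNAS box, for one of the CSI-created zvols
zfs get volblocksize,compression,refreservation tank/k8s/iscsi/v/pvc-xxxxxxxx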

(screenshots)

[dave@lenovo kubestr_0.4.35_Linux_amd64]$ ./kubestr fio -s freenas-iscsi-csi
PVC created kubestr-fio-pvc-rqs6z
W1018 12:58:05.664722  375380 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "kubestr-fio" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "kubestr-fio" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "kubestr-fio" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "kubestr-fio" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Pod created kubestr-fio-pod-v8xcx
Running FIO test (default-fio) on StorageClass (freenas-iscsi-csi) with a PVC of Size (100Gi)
Elapsed time- 23.39038057s
FIO test results:
  
FIO version - fio-3.30
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
read:
  IOPS=4571.409668 BW(KiB/s)=18302
  iops: min=2799 max=6208 avg=4578.633301
  bw(KiB/s): min=11198 max=24832 avg=18314.800781

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=2264.954346 BW(KiB/s)=9076
  iops: min=1084 max=3108 avg=2266.500000
  bw(KiB/s): min=4336 max=12432 avg=9066.133789

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
read:
  IOPS=5471.136719 BW(KiB/s)=700842
  iops: min=2942 max=7154 avg=5476.666504
  bw(KiB/s): min=376576 max=915712 avg=701023.187500

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=2783.699219 BW(KiB/s)=356850
  iops: min=1592 max=3666 avg=2787.833252
  bw(KiB/s): min=203776 max=469248 avg=356846.281250

Disk stats (read/write):
  sdc: ios=170037/85361 merge=2509/1338 ticks=1900170/1892015 in_queue=3792185, util=99.450226%
  -  OK

Need a higher block size than the kubestr defaults to hit the problem range…
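
kubestr accepts a custom fio job file, so a follow-up run at the block sizes where the problem shows up could look something like this (a sketch assuming the -f/--fiofile and -z/--size flags of this kubestr version; the job parameters are illustrative):

# 1m-read.fio
[global]
ioengine=libaio
direct=1
bs=1M
iodepth=64
size=2G
runtime=60
time_based

[read_bw_1m]
rw=read

./kubestr fio -s freenas-iscsi-csi -z 100Gi -f 1m-read.fio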

Final Throughput: (screenshot)

Final IOPS (Ouch 0 IOPS!!! But at least the bench finished!): (screenshot)

@travisghansen thank you very much; happy to facilitate them getting hands-on with the environment… it is a very stock, latest Core box and OCP 4.11.

Is there a way to set fsType to just zfs? Unless I’m misunderstanding something, why must there be an intermediary filesystem?

That fs only kicks in if the volume mode in k8s is Filesystem vs Block. If the volume mode is Block it will be ignored entirely.
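
In other words, with the StorageClass above the fsType: xfs parameter only matters for a claim like the first one below; a raw-device claim like the second (which is what the VM disks in this issue use) never gets a filesystem layered on top. Both snippets are minimal sketches, not pulled from this cluster:

# fsType from the StorageClass applies: the LUN is formatted as xfs and mounted into the pod
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: data-fs
spec:
  storageClassName: truenas-scale-iscsi
  accessModes: [ReadWriteOnce]
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
---
# fsType is ignored: the LUN is handed to the pod/VM as a raw block device
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: data-block
spec:
  storageClassName: truenas-scale-iscsi
  accessModes: [ReadWriteOnce]
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi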