democratic-csi: TrueNAS CORE 13.0 NFS controller failing on http content-type header

I am never able to run the controller pod without hitting the following HTTP error:

```
I0717 16:40:39.078506       1 feature_gate.go:245] feature gates: &{map[]}
I0717 16:40:39.078763       1 csi-provisioner.go:139] Version: v3.1.0
I0717 16:40:39.078789       1 csi-provisioner.go:162] Building kube configs for running in cluster...
I0717 16:40:39.081207       1 connection.go:154] Connecting to unix:///csi-data/csi.sock
I0717 16:40:39.088495       1 common.go:111] Probing CSI driver for readiness
I0717 16:40:39.088718       1 connection.go:183] GRPC call: /csi.v1.Identity/Probe
I0717 16:40:39.088748       1 connection.go:184] GRPC request: {}
I0717 16:40:42.404623       1 connection.go:186] GRPC response: {}
I0717 16:40:42.404978       1 connection.go:187] GRPC error: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); malformed header: missing HTTP content-type
E0717 16:40:42.405271       1 csi-provisioner.go:197] CSI driver probe failed: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); malformed header: missing HTTP content-type
```
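A 502 with a missing `Content-Type` header usually comes from the proxy in front of the TrueNAS middleware rather than from democratic-csi itself, so one way to narrow it down is to hit the TrueNAS v2.0 API directly from a cluster node. This is a sketch; the host and API key are placeholders matching the redacted config below:

```shell
#!/bin/sh
# Hypothetical direct check of the TrueNAS v2.0 API from a cluster node.
# A healthy reply is "HTTP/1.1 200 OK" plus a Content-Type header; if the
# response is a bare 502, the web proxy cannot reach middlewared.
TRUENAS_HOST="${TRUENAS_HOST:-192.168.1.13}"
headers=$(curl -si --max-time 5 \
  -H "Authorization: Bearer ${API_KEY:-REDACTED}" \
  "http://${TRUENAS_HOST}/api/v2.0/system/info" 2>/dev/null | head -n 20) || true
if printf '%s\n' "$headers" | grep -qi '^content-type:'; then
  echo "API reachable and returning a Content-Type header"
else
  echo "no/invalid API response - check middlewared and the web proxy"
fi
```

If this prints the failure branch while the web GUI works, the problem is between the proxy and middlewared, not in the CSI controller.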

I’ve tried running this with the following stack:

  • TrueNAS CORE 13.0 (up to date with all updates as of today)
  • 10-node cluster, hosts running Ubuntu 22.04 LTS
  • Helm (chart included below)

I intend to run with both NFS and iSCSI, but have hit an issue with NFS for the time being. I will attempt iSCSI if it helps debugging.

Server screenshots: (images attached in the original issue)

And here are my Helm values, with the fun parts redacted:

```yaml
csiDriver:
  name: "org.democratic-csi.nfs"

storageClasses:
- name: freenas-nfs-csi
  defaultClass: false
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
  allowVolumeExpansion: true
  parameters:
    fsType: nfs

  mountOptions:
  - noatime
  - nfsvers=4
  secrets:
    provisioner-secret:
    controller-publish-secret:
    node-stage-secret:
    node-publish-secret:
    controller-expand-secret:

driver:
  logLevel: debug
  config:
    driver: freenas-nfs
    instance_id:
    httpConnection:
      protocol: http
      host: 192.168.1.13
      port: 80
      apiKey: REDACTED
      allowInsecure: true
      apiVersion: 2
    sshConnection:
      host: 192.168.1.13
      port: 22
      username: root
      privateKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        REDACTED
        -----END OPENSSH PRIVATE KEY-----
    zfs:
      datasetParentName: pool1/k8s/nfs/vols
      detachedSnapshotsDatasetParentName: pool1/k8s/nfs/snaps
      datasetEnableQuotas: true
      datasetEnableReservation: false
      datasetPermissionsMode: "0777"
      datasetPermissionsUser: 0
      datasetPermissionsGroup: 0
    nfs:
      shareHost: 192.168.1.13
      shareAlldirs: false
      shareAllowedHosts: []
      shareAllowedNetworks: []
      shareMaprootUser: root
      shareMaprootGroup: wheel
      shareMapallUser: ""
      shareMapallGroup: ""
```
I’ve tried changing the following:

  • API access rather than username/password (I have 2FA and assume that wouldn't help, though while testing I disabled it altogether)
  • Migrated from a non-root user to the root account
    • I made sure the created user is in the sudoers file with NOPASSWD and tested that it works as expected. I also made sure the Helm chart had sudoEnabled set.
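For reference, the passwordless-sudo entry described above has this shape (the username csi is a placeholder for whatever account was created; on TrueNAS CORE the sudoers file typically lives at /usr/local/etc/sudoers and should be edited via visudo):

```
# "csi" is a placeholder username; adjust to the actual account
csi ALL=(ALL) NOPASSWD: ALL
```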

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Hey, I asked a friend to look over my setup, and he smacked me upside the head and asked me to check which kernel features were enabled on my nodes. It turns out I didn't have cgroups enabled; somehow this was the first k8s pod that requested resource allocation.

I updated my /boot/firmware/cmdline.txt to the following:

```
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1
```
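After rebooting, a change like this can be verified on the node itself; here is a sketch that handles both cgroup v1 and v2 layouts (paths are the standard Linux ones):

```shell
#!/bin/sh
# Confirm the memory cgroup controller is actually enabled on a node.
if [ -r /sys/fs/cgroup/cgroup.controllers ]; then
  # cgroup v2: enabled controllers are listed space-separated in one file
  if grep -qw memory /sys/fs/cgroup/cgroup.controllers; then
    echo "cgroup v2: memory controller enabled"
  else
    echo "cgroup v2: memory controller NOT enabled"
  fi
elif [ -r /proc/cgroups ]; then
  # cgroup v1: the 4th column is 1 when the subsystem is enabled
  awk '$1 == "memory" { print "cgroup v1: memory enabled =", $4 }' /proc/cgroups
else
  echo "no cgroup information available on this host"
fi
```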

After doing so, I found the following logs from the csi-driver:

```
$ kubectl logs zfs-nfs-democratic-csi-controller-cc4889f59-m7r9b -n democratic-csi -c csi-driver
grpc implementation: @grpc/grpc-js
{"host":"zfs-nfs-democratic-csi-controller-cc4889f59-m7r9b","level":"info","message":"initializing csi driver: freenas-nfs","service":"democratic-csi","timestamp":"2023-07-17T20:20:19.439Z"}
{"host":"zfs-nfs-democratic-csi-controller-cc4889f59-m7r9b","level":"info","message":"starting csi server - node version: v16.18.0, package version: 1.8.3, config file: /config/..2023_07_17_20_19_31.1475505710/driver-config-file.yaml, csi-name: org.democratic-csi.nfs, csi-driver: freenas-nfs, csi-mode: controller, csi-version: 1.5.0, address: , socket: unix:///csi-data/csi.sock.internal","service":"democratic-csi","timestamp":"2023-07-17T20:20:21.320Z"}
{"host":"zfs-nfs-democratic-csi-controller-cc4889f59-m7r9b","level":"info","message":"new request - driver: FreeNASSshDriver method: Probe call: {\"metadata\":{\"user-agent\":[\"grpc-go/1.40.0\"],\"x-forwarded-host\":[\"localhost\"]},\"request\":{},\"cancelled\":false}","service":"democratic-csi","timestamp":"2023-07-17T20:20:31.543Z"}
{"date":"Mon Jul 17 2023 20:20:31 GMT+0000 (Coordinated Universal Time)","error":{"level":"client-authentication"},"exception":true,"host":"zfs-nfs-democratic-csi-controller-cc4889f59-m7r9b","level":"error","message":"uncaughtException: All configured authentication methods failed\nError: All configured authentication methods failed\n    at doNextAuth (/home/csi/app/node_modules/ssh2/lib/client.js:803:21)\n    at tryNextAuth (/home/csi/app/node_modules/ssh2/lib/client.js:993:7)\n    at USERAUTH_FAILURE (/home/csi/app/node_modules/ssh2/lib/client.js:373:11)\n    at 51 (/home/csi/app/node_modules/ssh2/lib/protocol/handlers.misc.js:337:16)\n    at Protocol.onPayload (/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js:2025:10)\n    at AESGCMDecipherNative.decrypt (/home/csi/app/node_modules/ssh2/lib/protocol/crypto.js:987:26)\n    at Protocol.parsePacket [as _parse] (/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js:1994:25)\n    at Protocol.parse (/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js:293:16)\n    at Socket.<anonymous> (/home/csi/app/node_modules/ssh2/lib/client.js:713:21)\n    at Socket.emit (node:events:513:28)","os":{"loadavg":[0.73,0.45,0.39],"uptime":4075.4},"process":{"argv":["/usr/local/bin/node","/home/csi/app/bin/democratic-csi","--csi-version=1.5.0","--csi-name=org.democratic-csi.nfs","--driver-config-file=/config/driver-config-file.yaml","--log-level=info","--csi-mode=controller","--server-socket=/csi-data/csi.sock.internal"],"cwd":"/home/csi/app","execPath":"/usr/local/bin/node","gid":0,"memoryUsage":{"arrayBuffers":125044,"external":26554091,"heapTotal":36655104,"heapUsed":32441816,"rss":89939968},"pid":1,"uid":0,"version":"v16.18.0"},"service":"democratic-csi","stack":"Error: All configured authentication methods failed\n    at doNextAuth (/home/csi/app/node_modules/ssh2/lib/client.js:803:21)\n    at tryNextAuth (/home/csi/app/node_modules/ssh2/lib/client.js:993:7)\n    at USERAUTH_FAILURE (/home/csi/app/node_modules/ssh2/lib/client.js:373:11)\n    at 51 (/home/csi/app/node_modules/ssh2/lib/protocol/handlers.misc.js:337:16)\n    at Protocol.onPayload (/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js:2025:10)\n    at AESGCMDecipherNative.decrypt (/home/csi/app/node_modules/ssh2/lib/protocol/crypto.js:987:26)\n    at Protocol.parsePacket [as _parse] (/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js:1994:25)\n    at Protocol.parse (/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js:293:16)\n    at Socket.<anonymous> (/home/csi/app/node_modules/ssh2/lib/client.js:713:21)\n    at Socket.emit (node:events:513:28)","timestamp":"2023-07-17T20:20:31.680Z","trace":[{"column":21,"file":"/home/csi/app/node_modules/ssh2/lib/client.js","function":"doNextAuth","line":803,"method":null,"native":false},{"column":7,"file":"/home/csi/app/node_modules/ssh2/lib/client.js","function":"tryNextAuth","line":993,"method":null,"native":false},{"column":11,"file":"/home/csi/app/node_modules/ssh2/lib/client.js","function":"USERAUTH_FAILURE","line":373,"method":null,"native":false},{"column":16,"file":"/home/csi/app/node_modules/ssh2/lib/protocol/handlers.misc.js","function":"51","line":337,"method":null,"native":false},{"column":10,"file":"/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js","function":"Protocol.onPayload","line":2025,"method":"onPayload","native":false},{"column":26,"file":"/home/csi/app/node_modules/ssh2/lib/protocol/crypto.js","function":"AESGCMDecipherNative.decrypt","line":987,"method":"decrypt","native":false},{"column":25,"file":"/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js","function":"Protocol.parsePacket [as _parse]","line":1994,"method":"parsePacket [as _parse]","native":false},{"column":16,"file":"/home/csi/app/node_modules/ssh2/lib/protocol/Protocol.js","function":"Protocol.parse","line":293,"method":"parse","native":false},{"column":21,"file":"/home/csi/app/node_modules/ssh2/lib/client.js","function":null,"line":713,"method":null,"native":false},{"column":28,"file":"node:events","function":"Socket.emit","line":513,"method":"emit","native":false}]}
```

So there was also an issue with the SSH key on the root account. If I recall correctly, I just used the web GUI to make the keypair from the user account, but the error was much more obvious once I got to this point. Either way, I created a new key of type ecdsa-sha2-nistp256 (`ssh-keygen -t ecdsa-sha2-nistp256`) and everything works now.
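For anyone reproducing the key fix: the key type that worked here is what ssh-keygen produces with its short type name `ecdsa` at 256 bits. A sketch (paths are illustrative; the public key goes into root's authorized keys on the TrueNAS host, the private key into `sshConnection.privateKey` in the Helm values):

```shell
#!/bin/sh
# Generate an ECDSA P-256 keypair; the resulting public key type is
# "ecdsa-sha2-nistp256", matching what fixed the SSH auth failure above.
keyfile="$(mktemp -u)"
ssh-keygen -t ecdsa -b 256 -f "$keyfile" -N "" -q
cat "${keyfile}.pub"
```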

Turns out the out-of-memory behavior makes sense now; thanks for the help @travisghansen!