democratic-csi: TrueNAS CORE 13.0 NFS controller failing on http content-type header
I can never get the controller pod running without hitting the following HTTP error:
```
I0717 16:40:39.078506 1 feature_gate.go:245] feature gates: &{map[]}
I0717 16:40:39.078763 1 csi-provisioner.go:139] Version: v3.1.0
I0717 16:40:39.078789 1 csi-provisioner.go:162] Building kube configs for running in cluster...
I0717 16:40:39.081207 1 connection.go:154] Connecting to unix:///csi-data/csi.sock
I0717 16:40:39.088495 1 common.go:111] Probing CSI driver for readiness
I0717 16:40:39.088718 1 connection.go:183] GRPC call: /csi.v1.Identity/Probe
I0717 16:40:39.088748 1 connection.go:184] GRPC request: {}
I0717 16:40:42.404623 1 connection.go:186] GRPC response: {}
I0717 16:40:42.404978 1 connection.go:187] GRPC error: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); malformed header: missing HTTP content-type
E0717 16:40:42.405271 1 csi-provisioner.go:197] CSI driver probe failed: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); malformed header: missing HTTP content-type
```
I’ve tried running this with the following stack:

- TrueNAS CORE 13.0 (up to date with all updates as of today)
- 10-node cluster, hosts running Ubuntu 22.04 LTS
- Helm (chart included below)

I intend to run both NFS and iSCSI, but I have hit this issue with NFS for the time being. I will try iSCSI if it helps with debugging.
And here is my Helm chart, with the fun parts redacted:
```yaml
csiDriver:
  name: "org.democratic-csi.nfs"

storageClasses:
- name: freenas-nfs-csi
  defaultClass: false
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
  allowVolumeExpansion: true
  parameters:
    fsType: nfs
  mountOptions:
  - noatime
  - nfsvers=4
  secrets:
    provisioner-secret:
    controller-publish-secret:
    node-stage-secret:
    node-publish-secret:
    controller-expand-secret:

driver:
  logLevel: debug
  config:
    driver: freenas-nfs
    instance_id:
    httpConnection:
      protocol: http
      host: 192.168.1.13
      port: 80
      apiKey: REDACTED
      allowInsecure: true
      apiVersion: 2
    sshConnection:
      host: 192.168.1.13
      port: 22
      username: root
      privateKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        REDACTED
        -----END OPENSSH PRIVATE KEY-----
    zfs:
      datasetParentName: pool1/k8s/nfs/vols
      detachedSnapshotsDatasetParentName: pool1/k8s/nfs/snaps
      datasetEnableQuotas: true
      datasetEnableReservation: false
      datasetPermissionsMode: "0777"
      datasetPermissionsUser: 0
      datasetPermissionsGroup: 0
    nfs:
      shareHost: 192.168.1.13
      shareAlldirs: false
      shareAllowedHosts: []
      shareAllowedNetworks: []
      shareMaprootUser: root
      shareMaprootGroup: wheel
      shareMapallUser: ""
      shareMapallGroup: ""
```
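As an aside on the `zfs` section: the democratic-csi example configs carry a note that `datasetParentName` and `detachedSnapshotsDatasetParentName` must not overlap (they may be siblings, but neither should be nested in the other). A minimal Python sketch of that one check, assuming that rule is the only constraint being tested; `datasets_overlap` is a hypothetical helper, not part of democratic-csi.

```python
def _parts(dataset: str) -> list[str]:
    # Split a ZFS dataset path such as "pool1/k8s/nfs/vols" into components,
    # ignoring stray leading/trailing slashes.
    return [p for p in dataset.strip("/").split("/") if p]


def datasets_overlap(a: str, b: str) -> bool:
    """Return True if a and b are the same dataset or one is nested in the other.

    Hypothetical pre-flight check: it encodes only the "no overlap" rule from
    the democratic-csi example configs, nothing else.
    """
    pa, pb = _parts(a), _parts(b)
    shorter, longer = (pa, pb) if len(pa) <= len(pb) else (pb, pa)
    # Equal or nested exactly when the shorter path is a component-wise prefix.
    return longer[: len(shorter)] == shorter


if __name__ == "__main__":
    vols = "pool1/k8s/nfs/vols"
    snaps = "pool1/k8s/nfs/snaps"
    print(datasets_overlap(vols, snaps))  # siblings -> False
```

With the values from this config (`pool1/k8s/nfs/vols` and `pool1/k8s/nfs/snaps`) it returns `False`, so the dataset layout is not the problem here. Comparing path components rather than raw string prefixes avoids wrongly flagging a dataset such as `vols2` as nested under `vols`.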
I’ve tried changing the following:
- API key access rather than username/password (I have 2FA and assumed that wouldn’t help, though while testing I disabled it altogether)
- Migrated from a non-root user to the root account
- Made sure the created user is in the sudoers file with NOPASSWD and tested that it works as expected. I also made sure the Helm chart had `sudoEnabled` set.
About this issue
- State: closed
- Created a year ago
- Comments: 16 (8 by maintainers)
Hey, I asked a friend to look over my setup, and he smacked me upside the head and asked me to check which kernel features were enabled on my nodes. It turns out I didn’t have cgroups enabled; somehow this was the first k8s pod that did any resource allocation.
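Whether the memory controller is actually available can be checked on each node. A minimal Python sketch, assuming a Linux node: it looks for the boot flags used in this thread (`cgroup_enable=cpuset`, `cgroup_enable=memory`, `cgroup_memory=1`, `swapaccount=1`) in `/proc/cmdline`, and reads the `enabled` column of `/proc/cgroups`. The function names are hypothetical, and whether a particular kernel needs every one of those flags is not verified here; the `/proc/cgroups` result is the one that reflects what the running kernel really enabled.

```python
from pathlib import Path

# Boot flags used in this thread; treated as "required" only for this sketch.
REQUIRED_FLAGS = {
    ("cgroup_enable", "cpuset"),
    ("cgroup_enable", "memory"),
    ("cgroup_memory", "1"),
    ("swapaccount", "1"),
}


def missing_cgroup_flags(cmdline: str) -> set[tuple[str, str]]:
    """Return the REQUIRED_FLAGS pairs that do not appear in a kernel command line."""
    present = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:  # skip bare words such as "quiet" or "rootwait"
            present.add((key, value))
    return REQUIRED_FLAGS - present


def enabled_controllers(proc_cgroups: str) -> set[str]:
    """Return controllers whose 'enabled' column in /proc/cgroups is 1."""
    enabled = set()
    for line in proc_cgroups.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or the "#subsys_name hierarchy num_cgroups enabled" header
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "1":
            enabled.add(fields[0])
    return enabled


if __name__ == "__main__":
    cmdline_path, cgroups_path = Path("/proc/cmdline"), Path("/proc/cgroups")
    if cmdline_path.exists() and cgroups_path.exists():
        print("missing flags:", missing_cgroup_flags(cmdline_path.read_text()))
        print("memory enabled:", "memory" in enabled_controllers(cgroups_path.read_text()))
```

If `memory` is absent from the enabled set even though the flags are present, the new command line has probably not taken effect yet; kernel boot parameters only apply after a reboot.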
I updated my /boot/firmware/cmdline.txt to the following:

```
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc quiet splash cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1
```

After doing so, I found the following logs from the csi-driver:
So there was also an issue with the SSH key on the root account. If I recall correctly, I had just used the web GUI to make the keypair from the user account, but the error was much more obvious once I got to this point. Either way, I created a new key of type ecdsa-sha2-nistp256 and everything works now: `ssh-keygen -t ecdsa-sha2-nistp256`
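To confirm which key type actually ends up in the `privateKey` value, the algorithm name can be read from the key's unencrypted header. A minimal Python sketch, assuming the `openssh-key-v1` container layout described in OpenSSH's `PROTOCOL.key` (magic, cipher name, KDF name, KDF options, key count, then the public key blobs); `openssh_key_type` is a hypothetical helper, not something democratic-csi or TrueNAS provides.

```python
import base64
import struct

MAGIC = b"openssh-key-v1\x00"  # leading bytes of the decoded container


def _read_string(buf: bytes, offset: int) -> tuple[bytes, int]:
    """Read an SSH 'string' (uint32 big-endian length + bytes); return it and the next offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated key data")
    return buf[start:end], end


def openssh_key_type(pem: str) -> str:
    """Return the algorithm name of the first key in an OpenSSH private key."""
    lines = [ln.strip() for ln in pem.strip().splitlines()]
    # Drop the BEGIN/END armor lines; tolerate YAML-style indentation.
    body = "".join(ln for ln in lines if ln and not ln.startswith("-----"))
    blob = base64.b64decode(body)
    if not blob.startswith(MAGIC):
        raise ValueError("not an openssh-key-v1 private key")
    offset = len(MAGIC)
    for _ in range(3):  # skip ciphername, kdfname, kdfoptions
        _, offset = _read_string(blob, offset)
    (count,) = struct.unpack_from(">I", blob, offset)
    if count < 1:
        raise ValueError("container holds no keys")
    # The first public key blob is stored unencrypted right after the key count,
    # and it begins with its own algorithm name as an SSH string.
    pubkey, _ = _read_string(blob, offset + 4)
    key_type, _ = _read_string(pubkey, 0)
    return key_type.decode("ascii")
```

Because only the unencrypted header is parsed, this also works for passphrase-protected keys. Where OpenSSH is installed, `ssh-keygen -l -f <keyfile>` reports the same information, with the key type shown in parentheses after the fingerprint.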
It turns out the out-of-memory error makes sense. Thanks for the help, @travisghansen!
Welcome! Let’s start with this…
https://github.com/democratic-csi/democratic-csi/issues/306