gpu-operator: error code CUDA driver version is insufficient for CUDA runtime version in v22.9.0
The issue is still reproduced in gpu-operator v22.9.0.
kubectl --kubeconfig -n gpu logs cuda-vectoradd
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]
Environment information
OS Version: Red Hat Enterprise Linux release 8.4
kernel: 4.18.0-305.el8.x86_64
K3S Version: v1.24.3+k3s1
GPU Operator Version: v22.9.0
CUDA Version: 11.7.1-base-ubi8
Driver Pre-installed: No
Driver Version: 515.65.01-rhel8.4
Container-Toolkit Pre-installed: No
Container-Toolkit Version: v1.11.0-ubi8
GPU Type: Tesla P100
cuda-sample: cuda-sample:vectoradd-cuda11.7.1-ubi8
config.toml content
cat /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml
accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
[nvidia-container-cli]
environment = []
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
path = "/usr/local/nvidia/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
[nvidia-container-runtime]
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
On the host, the directory /etc/nvidia-container-runtime/host-files-for-container.d does not exist.
cuda-vectoradd pod yaml
cat << EOF | kubectl --kubeconfig /work/k3s.yaml create -n hsc-gpu -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia  # <<<<<<<<<
  containers:
  - name: cuda-vectoradd
    image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
When I add runtimeClassName: nvidia to the Pod spec, it works.
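That the pod only works with runtimeClassName: nvidia suggests the nvidia runtime is registered with containerd but is not the default runtime. A rough way to check this on the node is sketched below. It assumes k3s keeps its generated containerd config at /var/lib/rancher/k3s/agent/etc/containerd/config.toml (the usual location, but verify it on your install); the function name check_nvidia_runtime is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether a containerd config registers an "nvidia" runtime
# and whether that runtime is set as the default.
# ASSUMPTION: the default path below is the usual k3s location; pass another
# path as the first argument if your setup differs.
# check_nvidia_runtime is a hypothetical helper, not part of any NVIDIA tool.

check_nvidia_runtime() {
    config="${1:-/var/lib/rancher/k3s/agent/etc/containerd/config.toml}"

    if [ ! -r "$config" ]; then
        echo "cannot read $config"
        return 2
    fi

    # Look for a table header such as
    # [plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
    if grep -Eq '^[[:space:]]*\[plugins\..*\.runtimes\.("nvidia"|nvidia)\]' "$config"; then
        echo "nvidia runtime: registered"
    else
        echo "nvidia runtime: not registered"
    fi

    # Look for: default_runtime_name = "nvidia"
    if grep -Eq '^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"nvidia"' "$config"; then
        echo "default runtime: nvidia"
    else
        echo "default runtime: not nvidia (pods need runtimeClassName: nvidia)"
    fi
}
```

Running check_nvidia_runtime with no argument on the GPU node inspects the assumed k3s path. If it reports the runtime as registered but not default, that matches the behavior described above.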
issue: https://github.com/NVIDIA/gpu-operator/issues/408
Does gpu-operator support k3s cluster environments?
@shivamerla @cdesiniotis Could you please help me out? Thank you very much.
About this issue
- State: closed
- Created 2 years ago
- Comments: 19 (6 by maintainers)
I had the same error with the GPU Operator example, but if I try the following example, everything works fine: