gpu-operator: CentOS 7. nvidia-driver pod "Could not resolve Linux kernel version"
1. Quick Debug Checklist
- Are you running on an Ubuntu 18.04 node? No. CentOS 7.8
- Are you running Kubernetes v1.13+? v1.18
- Are you running Docker (>= 18.06) or CRIO (>= 1.13+)? Docker 20.10.3
- Do you have `i2c_core` and `ipmi_msghandler` loaded on the nodes?
- Did you apply the CRD (`kubectl describe clusterpolicies --all-namespaces`)? (Commands to verify both items are sketched after this list.)
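For completeness, the two items above can be verified with commands along these lines (a minimal sketch; it assumes shell access to the GPU node and a working `kubectl` context):

```sh
# On the GPU node: check that the required kernel modules are loaded
lsmod | grep -E 'i2c_core|ipmi_msghandler'

# Load them manually if either is missing
sudo modprobe i2c_core
sudo modprobe ipmi_msghandler

# From any machine with cluster access: confirm the ClusterPolicy CRD and resource exist
kubectl get crd clusterpolicies.nvidia.com
kubectl describe clusterpolicies --all-namespaces
```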
1. Issue or feature description
I get an error while the nvidia-driver pod tries to install the driver on CentOS 7. This is the log:
========== NVIDIA Software Installer ==========
Starting installation of NVIDIA driver version 450.80.02 for Linux kernel version 3.10.0-862.el7.x86_64
Stopping NVIDIA persistence daemon...
Unloading NVIDIA driver kernel modules...
Unmounting NVIDIA driver rootfs...
Checking NVIDIA driver packages...
Updating the package cache...
Resolving Linux kernel version...
Unable to open the file '/lib/modules/3.10.0-862.el7.x86_64/proc/version' (No such file or directory).Could not resolve Linux kernel version
Stopping NVIDIA persistence daemon...
Unloading NVIDIA driver kernel modules...
Unmounting NVIDIA driver rootfs...
I see the same error in #97, but disabling nouveau as suggested there did not resolve it. I am using gpu-operator v1.5.2. Please help me resolve this error. Thanks.
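In similar reports this failure comes down to the driver container not finding kernel headers/devel packages that match the node's running kernel (`3.10.0-862.el7` is an older CentOS 7.5-era kernel whose packages may no longer be in the default repos, even though the node itself reports CentOS 7.8). A quick way to check that theory on the node, assuming SSH access and stock CentOS package names:

```sh
# Which kernel is the node actually running?
uname -r        # here: 3.10.0-862.el7.x86_64

# Are matching headers/devel packages still resolvable via yum?
# The driver container needs these exact versions to build its kernel modules.
yum list "kernel-headers-$(uname -r)" "kernel-devel-$(uname -r)"

# If they are not available, one common workaround is to move the node to the
# current CentOS 7 kernel (whose headers the base repos do carry) and reboot.
sudo yum update -y kernel kernel-devel kernel-headers
sudo reboot
```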
About this issue
- State: open
- Created 3 years ago
- Reactions: 7
- Comments: 18 (4 by maintainers)
My cluster runs on CentOS 7.6 with the upgraded kernel `4.19.12-1.el7`. I replaced `kernel` with `kernel-ml` in `nvidia-docker`, re-built the image, and by using the modified image `nvcr.io/nvidia/mldriver:460.32.03-centos7` I could get `nvidia-driver-daemonset` working. After building the image you have to manually replace the tags in `values.yml` (see the sketch below).
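As a rough illustration of that last step, the driver image can also be overridden at install time instead of hand-editing `values.yml`; the `driver.repository`/`driver.version` keys follow the gpu-operator chart layout, and the registry below is a placeholder for wherever the rebuilt image was pushed (depending on the chart version, an OS suffix such as `-centos7` may be appended to the tag automatically):

```sh
# Push the rebuilt driver image to a registry the cluster can reach
# (registry.example.com and the local tag are placeholders)
docker tag my-rebuilt-driver:460.32.03-centos7 registry.example.com/nvidia/driver:460.32.03-centos7
docker push registry.example.com/nvidia/driver:460.32.03-centos7

# Point gpu-operator at the custom image; assumes the chart repo was added
# under the alias "nvidia". The same keys can be set directly in values.yml.
helm install gpu-operator nvidia/gpu-operator \
  --set driver.repository=registry.example.com/nvidia \
  --set driver.version=460.32.03
```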