kata-containers: NVIDIA GPU passthrough not working, QMP fails
Host
OS: Ubuntu 20.04
Kernel: 5.4.0-89-generic
Kata version (also tested with 2.2.x)
$ kata-runtime -v
kata-runtime : 2.3.0-alpha2
commit : <<unknown>>
OCI specs: 1.0.2-dev
containerd version
$ ctr -v
ctr containerd.io 1.4.11
Plain Kata containers (without hardware passthrough) run fine
$ image="docker.io/library/busybox:latest"
$ sudo ctr run --runtime "io.containerd.kata.v2" --rm -t "$image" test-kata uname -r
5.10.25-nvidia-gpu
IOMMU Group with PCIe bridges and a GPU
The GPU is bound to the VFIO driver via a kernel command-line parameter, since most distributions no longer build VFIO as a module.
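The exact parameter is not shown; with a built-in vfio-pci driver it is commonly `vfio-pci.ids=<vendor>:<device>`. One way to confirm the binding is to read the device's `driver` symlink in sysfs. This is a minimal sketch, assuming the GPU is in PCI domain `0000` (full address `0000:40:00.0`); the function name `bound_driver` and its optional sysfs-root argument are hypothetical, added only for illustration and testing.

```shell
#!/usr/bin/env bash
# Print the name of the kernel driver bound to a PCI device, or "none".
# Usage: bound_driver BDF [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys; it is a hypothetical knob so the function
# can be pointed at a fake tree.
bound_driver() {
  local bdf=$1 root=${2:-/sys}
  local link="$root/bus/pci/devices/$bdf/driver"
  if [ -L "$link" ]; then
    # The link points at .../bus/pci/drivers/<name>; print only <name>.
    basename "$(readlink "$link")"
  else
    echo none
  fi
}
```

On this host, `bound_driver 0000:40:00.0` would be expected to print `vfio-pci` if the command-line binding took effect.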
IOMMU Group 66:
3c:01.0 PCI bridge [0604]: PCIe Bridge [13ac:1178] (rev 01)
3e:00.0 PCI bridge [0604]: PCIe Bridge [13ac:1178] (rev 01)
3f:08.0 PCI bridge [0604]: PCIe Bridge [13ac:1178] (rev 01)
40:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:0fbb]
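A listing like this can be reproduced from sysfs. The sketch below also labels each group member by its PCI class code, which separates the three bridges (class `0x0604`) from the GPU endpoint (class `0x0302`). It only inspects the host and does not show how Kata itself treats the members of a group. `list_group_devices` and its optional sysfs-root argument are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# List the devices in an IOMMU group and label each one as a PCI bridge
# (class code 0x0604xx) or an endpoint (any other class).
# Usage: list_group_devices GROUP [SYSFS_ROOT]
list_group_devices() {
  local group=$1 root=${2:-/sys}
  local dev bdf class
  for dev in "$root/kernel/iommu_groups/$group/devices/"*; do
    [ -e "$dev" ] || continue          # group missing or empty: glob did not match
    bdf=${dev##*/}                     # e.g. 0000:40:00.0
    read -r class < "$dev/class"       # e.g. 0x060400 (bridge) or 0x030200 (3D controller)
    case $class in
      0x0604*) echo "$bdf bridge" ;;
      *)       echo "$bdf endpoint" ;;
    esac
  done
}
```

For group 66 on this host, `list_group_devices 66` should print three `bridge` lines and one `endpoint` line for `0000:40:00.0`.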
Kata with IOMMU Group 66
$ sudo ctr run --runtime "io.containerd.kata.v2" --rm -t --device /dev/vfio/66 "$image" test-kata uname -r
ctr: QMP command failed: Device 'vfio-bf35ce81cd1f15c20' not found: not found
I followed this guide https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md, built the kernel as described, and tried various settings in the configuration file (machine type, etc.). Nothing worked.
With Kata 2.2.0 the error message said that one of the PCI bridges could not be found, rather than the vfio-xxxxx device.
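When comparing attempts, it can help to print exactly which VFIO-related settings are active. A sketch, assuming the key names `machine_type`, `hotplug_vfio_on_root_bus` and `pcie_root_port` from the Kata 2.x QEMU configuration file; the file's location varies by installation (`kata-runtime kata-env` reports the one in use), so the path is passed as an argument. `show_vfio_settings` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the uncommented VFIO-related settings from a Kata configuration file.
# Usage: show_vfio_settings /path/to/configuration.toml
show_vfio_settings() {
  # Match lines such as 'machine_type = "q35"', allowing leading spaces;
  # commented-out lines start with '#' and therefore do not match.
  grep -E '^[[:space:]]*(machine_type|hotplug_vfio_on_root_bus|pcie_root_port)[[:space:]]*=' "$1"
}
```

The output can then be checked against the values the guide recommends for the machine type in use.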
Is Kata trying to use all of the devices in an IOMMU Group?
Does Kata distinguish between the PCI root, PCI bridges, and the actual devices in an IOMMU group?
About this issue
- State: closed
- Created 3 years ago
- Comments: 24 (24 by maintainers)
Commits related to this issue
- runtime: add delay to wait for the vfio device link up fixes: #2938 — committed to Bevisy/kata-containers by Bevisy 2 years ago
- runtime: add delay to wait for vm ready to hotplug fixes: #2938 Signed-off-by: Binbin Zhang <binbin36520@gmail.com> — committed to Bevisy/kata-containers by Bevisy 2 years ago
- runtime: Increase the latency and wait for the VM to be ready for hotplug. Increase the latency and wait for the VM to be ready for hotplug fixes: #2938 Signed-off-by: Binbin Zhang <binbin36520@gma... — committed to Bevisy/kata-containers by Bevisy 2 years ago
- runtime: Add heuristic to get the right values for mem-reserver and pref-64-reserver Fixes: #2938 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> — committed to zvonkok/kata-containers by zvonkok 2 years ago
- runtime: Add heuristic to get the right value(s) for mem-reserve Fixes: #2938 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> — committed to zvonkok/kata-containers by zvonkok 2 years ago
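Several of the commits above add a heuristic for how much MMIO space the PCIe root port reserves for a hot-plugged device (QEMU spells the properties `mem-reserve` and `pref64-reserve`). To see how much the GPU itself needs, its BAR sizes can be summed from sysfs. This is a sketch for inspecting the device, not the heuristic from those commits. It assumes the standard Linux `resource` file format (one hex `start end flags` line per region, BAR0 to BAR5 first) and the kernel flag bits `IORESOURCE_MEM` (0x200), `IORESOURCE_PREFETCH` (0x2000) and `IORESOURCE_MEM_64` (0x100000). `bar_sizes` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sum the sizes (bytes) of a PCI device's memory BARs from sysfs, split into
# 64-bit prefetchable regions and all other memory regions.
# Usage: bar_sizes BDF [SYSFS_ROOT]
bar_sizes() {
  local bdf=$1 root=${2:-/sys}
  local start end flags size pref64=0 other=0 i=0
  while read -r start end flags; do
    i=$((i + 1))
    [ "$i" -le 6 ] || break                        # lines 1-6 are BAR0..BAR5; ROM etc. follow
    [ $((flags & 0x200)) -ne 0 ] || continue       # IORESOURCE_MEM unset: I/O or unused BAR
    size=$((end - start + 1))
    if [ $((flags & 0x2000)) -ne 0 ] && [ $((flags & 0x100000)) -ne 0 ]; then
      pref64=$((pref64 + size))                    # prefetchable and 64-bit
    else
      other=$((other + size))
    fi
  done < "$root/bus/pci/devices/$bdf/resource"
  echo "pref64=$pref64 other=$other"
}
```

On the host, `bar_sizes 0000:40:00.0` gives a lower bound on the 64-bit prefetchable window the device needs, which can be compared against whatever reservation the runtime configures for the root port.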
@fidencio Yes, on it. Will report.