kata-containers: NVIDIA GPU passthrough not working, QMP fails

Host

OS: Ubuntu 20.04
Kernel: 5.4.0-89-generic

Kata version (also tested with 2.2.x)

$ kata-runtime -v
kata-runtime  : 2.3.0-alpha2
   commit   : <<unknown>>
   OCI specs: 1.0.2-dev

containerd version

$ ctr -v
ctr containerd.io 1.4.11

Simple Kata containers run without hardware passthrough

$ image="docker.io/library/busybox:latest"
$ sudo ctr run --runtime "io.containerd.kata.v2" --rm -t "$image" test-kata uname -r
5.10.25-nvidia-gpu

IOMMU Group with PCIe bridges and a GPU

The GPU is bound to the VFIO driver via a kernel command-line flag, since most distributions no longer build VFIO as a module.

IOMMU Group 66:
	3c:01.0 PCI bridge [0604]:  PCIe Bridge  [13ac:1178] (rev 01)
	3e:00.0 PCI bridge [0604]:  PCIe Bridge [13ac:1178] (rev 01)
	3f:08.0 PCI bridge [0604]:  PCIe Bridge [13ac:1178] (rev 01)
	40:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:0fbb]
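
To double-check which driver each device in the group is actually bound to, something like the following could be run on the host. This is only a sketch: it assumes the usual sysfs layout under /sys/kernel/iommu_groups, and both the function name list_iommu_group and the SYSFS_ROOT override are made up for illustration (the override exists only so the function can be pointed at a test tree).

```shell
#!/usr/bin/env bash
# Sketch: print every device in an IOMMU group with the driver it is bound to.
# list_iommu_group and SYSFS_ROOT are hypothetical names, not part of Kata.
list_iommu_group() {
    local group="$1"
    local root="${SYSFS_ROOT:-/sys}"
    local dev driver
    for dev in "$root/kernel/iommu_groups/$group/devices/"*; do
        # Skip the unexpanded glob when the group does not exist.
        [ -e "$dev" ] || continue
        if [ -L "$dev/driver" ]; then
            # The driver entry is a symlink whose target name is the driver.
            driver="$(basename "$(readlink "$dev/driver")")"
        else
            driver="(none)"
        fi
        printf '%s driver=%s\n' "$(basename "$dev")" "$driver"
    done
}

# Example: list_iommu_group 66
```

On a setup like the one above, the GPU at 40:00.0 would be expected to show vfio-pci, while the bridges would normally show a different driver or none.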

Kata with IOMMU Group 66

$ sudo ctr run --runtime "io.containerd.kata.v2" --rm -t --device /dev/vfio/66 "$image" test-kata uname -r
ctr: QMP command failed: Device 'vfio-bf35ce81cd1f15c20' not found: not found

I followed the guide at https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/Nvidia-GPU-passthrough-and-Kata.md and built the kernel. I also tried various settings in the configuration file (machine type, etc.), but nothing worked.

In Kata 2.2.0, the error message said that one of the PCI bridges could not be found, rather than the vfio-xxxxx device.

Is Kata trying to use all of the devices in an IOMMU Group?

Does Kata differentiate between the PCI root, PCI bridges, and the actual devices in an IOMMU group?
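
For reference, the distinction I mean can be read from sysfs on the host: PCI-to-PCI bridges report class 0x0604xx, while the GPU reports 0x0302xx. The sketch below is not how Kata does it (I don't know that); it only shows one way to tell the two apart. The names classify_iommu_group and SYSFS_ROOT are hypothetical, and the override is there only for testing against a fake tree.

```shell
#!/usr/bin/env bash
# Sketch: label each device in an IOMMU group as a PCI-to-PCI bridge or not,
# based on the PCI class code exposed in sysfs.
# classify_iommu_group and SYSFS_ROOT are hypothetical names, not part of Kata.
classify_iommu_group() {
    local group="$1"
    local root="${SYSFS_ROOT:-/sys}"
    local dev class kind
    for dev in "$root/kernel/iommu_groups/$group/devices/"*; do
        # Skip entries without a readable class file (or a non-matching glob).
        [ -r "$dev/class" ] || continue
        class="$(cat "$dev/class")"
        case "$class" in
            0x0604*) kind="pci-bridge" ;;  # class 06, subclass 04
            *)       kind="other" ;;       # e.g. 0x0302xx for a 3D controller
        esac
        printf '%s class=%s %s\n' "$(basename "$dev")" "$class" "$kind"
    done
}

# Example: classify_iommu_group 66
```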

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 24 (24 by maintainers)

Most upvoted comments

@fidencio Yes, on it. Will report.