moderngl: Can't get it working in Kubernetes environment with NVIDIA GPU
I developed a service that serves rendered graphics with moderngl. However, I can’t get it working on my Kubernetes cluster.
I'm using the official NVIDIA GPU Operator (not GKE) with moderngl 5.10.
Current situation: I am trying the official NVIDIA cudagl image:
nvidia/cudagl:11.3.0-devel-ubuntu20.04
with these packages:
libnvidia-gl-525
libglvnd-dev
I also pointed EGL directly at the NVIDIA vendor file:
export __EGL_VENDOR_LIBRARY_FILENAMES="/usr/share/glvnd/egl_vendor.d/10_nvidia.json"
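If the file named in __EGL_VENDOR_LIBRARY_FILENAMES is missing inside the container, or names a library that is not installed, EGL has no NVIDIA vendor to load. The following is a minimal sketch of a sanity check, assuming the standard glvnd vendor JSON layout ({"ICD": {"library_path": ...}}) and a colon-separated list in the variable; the helper name egl_vendor_libraries is hypothetical.

```python
import json
import os


def egl_vendor_libraries(filenames):
    """Map each glvnd EGL vendor JSON file to the library it points at.

    `filenames` is the colon-separated value of __EGL_VENDOR_LIBRARY_FILENAMES
    (hypothetical helper, not part of moderngl or glvnd).
    """
    libs = {}
    for path in filenames.split(":"):
        if not path:
            continue  # skip empty entries, e.g. from a trailing colon
        try:
            with open(path) as fh:
                data = json.load(fh)
        except (OSError, ValueError) as exc:
            # File absent or not valid JSON: glvnd cannot use this vendor.
            libs[path] = f"unreadable: {exc}"
            continue
        libs[path] = data.get("ICD", {}).get("library_path", "missing library_path")
    return libs


if __name__ == "__main__":
    print(egl_vendor_libraries(os.environ.get("__EGL_VENDOR_LIBRARY_FILENAMES", "")))
```

With the setting above, a healthy container would be expected to map the 10_nvidia.json path to a libEGL_nvidia library name; an "unreadable" entry points at the container image rather than at moderngl.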
I can see the GPU with nvidia-smi, and I can use PyTorch and other DL libraries.
I create the context like this:
import moderngl
ctx = moderngl.create_context(standalone=True, backend="egl")
and get this EGL error:
requested device index 0, but found 0 devices
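When the context does get created, it is still worth checking which renderer it landed on. Below is a minimal sketch, assuming moderngl's Context.info dictionary exposes GL_VENDOR and GL_RENDERER; the function name describe_egl_context is hypothetical.

```python
def describe_egl_context():
    """Try moderngl's EGL backend and report the renderer, or the failure."""
    try:
        import moderngl
    except ImportError as exc:
        return f"moderngl not installed: {exc}"
    try:
        ctx = moderngl.create_context(standalone=True, backend="egl")
    except Exception as exc:  # glcontext raises here, e.g. "found 0 devices"
        return f"EGL context failed: {exc}"
    try:
        # A software renderer (e.g. llvmpipe) here would mean the NVIDIA
        # vendor library was not the one that served the context.
        return f"{ctx.info['GL_VENDOR']} / {ctx.info['GL_RENDERER']}"
    finally:
        ctx.release()


if __name__ == "__main__":
    print(describe_egl_context())
```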
About this issue
- State: closed
- Created 5 months ago
- Comments: 18 (5 by maintainers)
I managed to get it running with
ghcr.io/selkies-project/nvidia-egl-desktop:latest
using the deployment YAML provided in its repo. I also tried to probe EGL with this small C program:
It also reported a device count of 0, so I think this is an EGL issue rather than a glcontext one?
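The C program itself is not reproduced in the thread. A comparable check can be sketched in Python with ctypes, assuming the EGL_EXT_device_enumeration entry point eglQueryDevicesEXT is reachable through eglGetProcAddress and that the library is loadable as libEGL.so.1; egl_device_count is a hypothetical name. It queries the same device count glcontext complains about, without involving moderngl.

```python
import ctypes


def egl_device_count(lib_name="libEGL.so.1"):
    """Return the number of EGL devices, or None if EGL/the extension is unavailable."""
    try:
        egl = ctypes.CDLL(lib_name)
    except OSError:
        return None  # libEGL is not installed or not on the loader path

    # Extension functions must be looked up via eglGetProcAddress.
    egl.eglGetProcAddress.restype = ctypes.c_void_p
    egl.eglGetProcAddress.argtypes = [ctypes.c_char_p]
    addr = egl.eglGetProcAddress(b"eglQueryDevicesEXT")
    if not addr:
        return None  # EGL_EXT_device_enumeration is not exposed

    # EGLBoolean eglQueryDevicesEXT(EGLint max_devices,
    #                               EGLDeviceEXT *devices, EGLint *num_devices)
    proto = ctypes.CFUNCTYPE(
        ctypes.c_uint, ctypes.c_int32, ctypes.c_void_p, ctypes.POINTER(ctypes.c_int32)
    )
    query_devices = proto(addr)

    count = ctypes.c_int32(0)
    # With devices == NULL the call only reports how many devices exist.
    if not query_devices(0, None, ctypes.byref(count)):
        return None
    return count.value


if __name__ == "__main__":
    print("EGL devices:", egl_device_count())
```

A result of 0 here, while nvidia-smi still sees the GPU, would match the behaviour described above: the problem sits below moderngl, in how EGL finds the NVIDIA driver.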
@keepdying I think that EGL cannot find devices precisely because the driver versions differ from the package versions.
Ubuntu 20.04 was released in 2020, the libnvidia-gl-525 drivers came out around 2022-2023, and libglvnd-dev is practically up to date. Version gaps like these can break the link between the device and the programs, which then simply stop seeing it.
I’m afraid the error may remain unless you update. Other libraries, such as PyTorch, keep working because they use CUDA, which makes backward compatibility much easier to maintain: it is largely independent of the system, and even more so of display servers such as X.org and Wayland.
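One way to test the "versions differ" theory is to compare the kernel module version reported by the driver with the version baked into the userspace libEGL_nvidia file names. This is a minimal sketch, assuming the usual /proc/driver/nvidia/version format and the Ubuntu x86_64 multiarch library directory; both paths and all function names are assumptions, not taken from the thread.

```python
import glob
import os
import re

# Matches e.g. "Kernel Module  525.85.12  ..." and "Open Kernel Module for x86_64  535.104.05  ..."
KERNEL_VERSION_RE = re.compile(r"Kernel Module.*?\s(\d+\.\d+(?:\.\d+)?)\s")
# Matches libEGL_nvidia.so.<version> but not the libEGL_nvidia.so.0 symlink.
LIB_VERSION_RE = re.compile(r"libEGL_nvidia\.so\.(\d+\.\d+(?:\.\d+)?)$")


def kernel_driver_version(proc_text):
    """Extract the kernel-module version from /proc/driver/nvidia/version text."""
    match = KERNEL_VERSION_RE.search(proc_text)
    return match.group(1) if match else None


def userspace_egl_versions(filenames):
    """Collect the versions encoded in libEGL_nvidia.so.<version> file names."""
    versions = set()
    for name in filenames:
        match = LIB_VERSION_RE.search(os.path.basename(name))
        if match:
            versions.add(match.group(1))
    return versions


def check_driver_match(proc_path="/proc/driver/nvidia/version",
                       lib_glob="/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.*"):
    """Return (kernel_version, userspace_versions, versions_match)."""
    try:
        with open(proc_path) as fh:
            kernel = kernel_driver_version(fh.read())
    except OSError:
        kernel = None
    user = userspace_egl_versions(glob.glob(lib_glob))
    return kernel, user, kernel is not None and user == {kernel}


if __name__ == "__main__":
    print(check_driver_match())
```

If the kernel side reports one version and the container's libEGL_nvidia carries another (or none), that would support the mismatch explanation.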
Are you really that limited by outdated software versions?
https://github.com/szabolcsdombi/zengl/commit/e65191b86f5f39acd9966217c8bc75abc97afe6d
Please see the code removed in that commit; it uses headless.cpp.