moderngl: Can't get it working in Kubernetes environment with NVIDIA GPU

I developed a service that serves graphics rendered with moderngl. However, I can't get it working on my Kubernetes cluster.

I am using the official NVIDIA GPU Operator (not GKE) with moderngl 5.10.

Current situation: I am trying this official nvidia cudagl image:

nvidia/cudagl:11.3.0-devel-ubuntu20.04

with these packages:

libnvidia-gl-525
libglvnd-dev

I also pointed EGL directly at the NVIDIA vendor library:

export __EGL_VENDOR_LIBRARY_FILENAMES="/usr/share/glvnd/egl_vendor.d/10_nvidia.json"
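A quick way to sanity-check that variable from inside the container is sketched below. It assumes the usual glvnd vendor file layout (a JSON object with an "ICD" entry holding "library_path") and treats the variable as a colon-separated list; check_egl_vendor_files is a hypothetical helper name, not part of any library.

```python
import ctypes
import json
import os

ENV_VAR = "__EGL_VENDOR_LIBRARY_FILENAMES"


def check_egl_vendor_files(value=None):
    """For each vendor JSON file, report whether it exists and whether its library loads."""
    if value is None:
        value = os.environ.get(ENV_VAR, "")
    results = []
    for path in filter(None, value.split(":")):
        entry = {"json": path, "exists": os.path.isfile(path), "library": None, "loadable": False}
        if entry["exists"]:
            try:
                with open(path) as fh:
                    # Assumed layout: {"ICD": {"library_path": "libEGL_nvidia.so.0"}}
                    entry["library"] = json.load(fh)["ICD"]["library_path"]
                ctypes.CDLL(entry["library"])  # raises OSError if the library cannot be loaded
                entry["loadable"] = True
            except (OSError, ValueError, KeyError, TypeError):
                pass  # unreadable or malformed file, or library not loadable
        results.append(entry)
    return results


if __name__ == "__main__":
    for entry in check_egl_vendor_files():
        print(entry)
```

If "exists" is true but "loadable" is false, the JSON file is present but the NVIDIA EGL library it names is missing from the image or cannot be loaded.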

I can see the GPU with nvidia-smi, and I can use PyTorch and other DL libraries.

I create the context like this:

ctx = moderngl.create_context(standalone=True, backend="egl")

and get this EGL error:

requested device index 0, but found 0 devices
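For reference, a minimal reproduction might look like the sketch below. It assumes moderngl is installed; try_egl_context is a hypothetical helper name. On success it returns the GL vendor and renderer strings, which also show whether the context really landed on the NVIDIA driver rather than a software renderer.

```python
def try_egl_context():
    """Return (info, error): GL strings on success, or an error message on failure."""
    try:
        import moderngl  # imported lazily so a missing package is reported like any other failure

        ctx = moderngl.create_context(standalone=True, backend="egl")
    except Exception as exc:  # EGL failures surface as plain exceptions
        return None, f"{type(exc).__name__}: {exc}"
    try:
        info = {key: ctx.info.get(key) for key in ("GL_VENDOR", "GL_RENDERER", "GL_VERSION")}
    finally:
        ctx.release()  # free the context once the strings have been read
    return info, None


if __name__ == "__main__":
    info, error = try_egl_context()
    print(info if error is None else f"context creation failed: {error}")
```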

About this issue

  • State: closed
  • Created 5 months ago
  • Comments: 18 (5 by maintainers)

Most upvoted comments

I managed to get it running with the ghcr.io/selkies-project/nvidia-egl-desktop:latest image, deployed with the YAML provided in its repo.

I tried to debug this with a small C program:

#include <EGL/egl.h>
#include <EGL/eglext.h>

#include <stdio.h>
#include <stdlib.h>

/* Signature of eglQueryDevicesEXT from the EGL_EXT_device_enumeration extension. */
typedef EGLBoolean (*eglQueryDevicesEXTProc)(EGLint, EGLDeviceEXT *, EGLint *);

int main(void)
{
  /* Extension entry points have to be looked up at run time. */
  eglQueryDevicesEXTProc eglQueryDevicesEXT = (eglQueryDevicesEXTProc)eglGetProcAddress("eglQueryDevicesEXT");
  if (!eglQueryDevicesEXT)
  {
    fprintf(stderr, "Failed to get proc address for eglQueryDevicesEXT\n");
    return EXIT_FAILURE;
  }

  EGLint num_devices = 0;

  /* With devices = NULL, max_devices is ignored and only the count is written. */
  if (eglQueryDevicesEXT(0, NULL, &num_devices) == EGL_FALSE)
  {
    fprintf(stderr, "eglQueryDevicesEXT failed\n");
    return EXIT_FAILURE;
  }

  printf("Device count: %d\n", num_devices);

  return EXIT_SUCCESS;
}

That also reports a device count of 0, so I think it is an EGL issue rather than a glcontext one?
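The same query can be run from Python with ctypes, which is handy when the container has no compiler. This is only a sketch: egl_device_count is a hypothetical helper, and the "libEGL.so.1" fallback name is an assumption about the image.

```python
import ctypes
import ctypes.util


def egl_device_count(libname=None):
    """Return the number of EGL devices, or None if EGL or the extension is unavailable."""
    # Assumption: fall back to "libEGL.so.1" when find_library cannot locate libEGL.
    libname = libname or ctypes.util.find_library("EGL") or "libEGL.so.1"
    try:
        egl = ctypes.CDLL(libname)
        get_proc = egl.eglGetProcAddress
    except (OSError, AttributeError):
        return None

    get_proc.restype = ctypes.c_void_p
    get_proc.argtypes = [ctypes.c_char_p]
    address = get_proc(b"eglQueryDevicesEXT")
    if not address:
        return None

    # EGLBoolean eglQueryDevicesEXT(EGLint max_devices, EGLDeviceEXT *devices, EGLint *num_devices)
    prototype = ctypes.CFUNCTYPE(
        ctypes.c_uint, ctypes.c_int, ctypes.c_void_p, ctypes.POINTER(ctypes.c_int)
    )
    query_devices = prototype(address)

    count = ctypes.c_int(0)
    # Passing NULL for the device array asks only for the device count.
    if not query_devices(0, None, ctypes.byref(count)):
        return None
    return count.value


if __name__ == "__main__":
    print("EGL device count:", egl_device_count())
```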

@keepdying I think EGL cannot find any devices precisely because the NVIDIA driver version differs from the version of the packages installed in the image.

> Current situation: I am trying this official nvidia cudagl image:
>
> nvidia/cudagl:11.3.0-devel-ubuntu20.04
>
> with these packages:
>
> libnvidia-gl-525
> libglvnd-dev

Ubuntu 20.04 was released in 2020, the libnvidia-gl-525 driver libraries date from around 2022-2023, and libglvnd-dev is practically up to date. Version gaps like these can break the link between the device and the programs that use it, so they simply stop seeing it:

> getting this egl error
>
> requested device index 0, but found 0 devices

I'm afraid the error may remain unless you update. Other libraries, such as PyTorch, keep working because they use CUDA, where backward compatibility is much easier to maintain, since it is almost independent of the rest of the system, let alone display servers such as X.org, Wayland and the like.
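One way to test the version-mismatch theory is to compare the kernel driver version with the version suffix of the NVIDIA EGL library inside the image. The sketch below makes several assumptions: that /proc/driver/nvidia/version is visible in the container, that the libraries live somewhere under /usr/lib, and that they follow the libEGL_nvidia.so.<version> naming. Both function names are hypothetical.

```python
import glob
import re

VERSION_RE = re.compile(r"\b\d+\.\d+(?:\.\d+)?\b")


def kernel_driver_version(text):
    """Extract the driver version from the contents of /proc/driver/nvidia/version."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = VERSION_RE.search(line)
            return match.group(0) if match else None
    return None


def userspace_gl_versions(paths):
    """Collect version suffixes from file names such as libEGL_nvidia.so.525.105.17."""
    versions = set()
    for path in paths:
        match = re.search(r"libEGL_nvidia\.so\.(\d+\.\d+(?:\.\d+)?)$", path)
        if match:
            versions.add(match.group(1))
    return versions


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version") as fh:
            kernel = kernel_driver_version(fh.read())
    except OSError:
        kernel = None
    # Assumption: the NVIDIA GL libraries are installed under /usr/lib.
    user = userspace_gl_versions(glob.glob("/usr/lib/**/libEGL_nvidia.so.*", recursive=True))
    print("kernel module:", kernel)
    print("user-space EGL libraries:", sorted(user) or "none found")
    if kernel and user and kernel not in user:
        print("Mismatch: the image's NVIDIA GL libraries do not match the host driver.")
```

If the two versions differ, that would be consistent with EGL reporting 0 devices while CUDA workloads still run.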

Are you really that limited by outdated software versions?