nvidia-container-toolkit: v1.13.5 - cdi generate fails to find matching libraries for driver version

Hi,

With version v1.12.0, this command worked fine and generated the CDI config:

# nvidia-ctk cdi generate --nvidia-ctk-path "/snap/${SNAP_NAME}/current/usr/bin/nvidia-ctk"

After switching to v1.13.5, it fails with:

# nvidia-ctk cdi generate --nvidia-ctk-path "/snap/${SNAP_NAME}/current/usr/bin/nvidia-ctk"
INFO[0000] Auto-detected mode as "nvml"                 
INFO[0000] Selecting /dev/nvidia0 as /dev/nvidia0       
INFO[0000] Selecting /dev/dri/card1 as /dev/dri/card1   
WARN[0000] Could not locate /dev/dri/controlD65: pattern /dev/dri/controlD65 not found 
INFO[0000] Selecting /dev/dri/renderD128 as /dev/dri/renderD128 
INFO[0000] Using driver version 515.105.01              
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to create discoverer for common entities: failed to create discoverer for driver files: failed to create discoverer for driver libraries: failed to get libraries for driver version: failed to locate libcuda.so.515.105.01: pattern libcuda.so.515.105.01 not found 

With both versions, the same LD_LIBRARY_PATH is used, and it contains the correct path to a folder containing libcuda.so.515.105.01.
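For anyone wanting to double-check that premise, here is a minimal sketch that walks a colon-separated search path and prints every directory that actually holds the file. The helper name `find_lib_on_path` is made up for illustration, and this only confirms that the file is reachable through LD_LIBRARY_PATH; it makes no claim about how nvidia-ctk itself looks for libraries.

```shell
# Hypothetical helper (not part of nvidia-ctk): print every location on a
# colon-separated search path that contains the given file name.
# Returns 0 if at least one match was found, 1 otherwise.
find_lib_on_path() {
    lib_name=$1
    search_path=$2
    status=1
    old_ifs=$IFS
    IFS=:      # split the search path on colons
    set -f     # disable globbing while the path is being split
    for dir in $search_path; do
        [ -n "$dir" ] || continue          # skip empty path entries
        if [ -e "$dir/$lib_name" ]; then
            printf '%s\n' "$dir/$lib_name"
            status=0
        fi
    done
    set +f
    IFS=$old_ifs
    return $status
}

# Check the library named in the error message against the current environment.
find_lib_on_path "libcuda.so.515.105.01" "${LD_LIBRARY_PATH:-}" \
    || echo "libcuda.so.515.105.01 not found on LD_LIBRARY_PATH"
```

Running this in the same environment as the failing `nvidia-ctk cdi generate` call should show whether the library is visible there at all.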

Full disclosure: this is running within a snap on Ubuntu Core 22, which I’m sure is not officially supported, but it did work well with v1.12.0.

Did anything significant change in v1.13 around library discovery?

Any guidance on troubleshooting, or other help, would be appreciated.

About this issue

  • State: open
  • Created a year ago
  • Comments: 42 (16 by maintainers)

Most upvoted comments

@jocado there are no concrete dates yet, but we will most likely have to release by the end of January due to some other features that are required.