frigate: [Support]: Docker (0.12.0-beta2-tensorrt) exception trying to load libnvrtc.so (not found)?

Describe the problem you are having

I’m at a loss and hoping for any suggestions. Basically I’m trying to get a TensorRT detector working with blakeblackshear/frigate:0.12.0-beta2-tensorrt (Docker compose config).

I feel like my general NVIDIA configuration is OK, given:

  • I was able to generate the trt-models using the tensorrt_models.sh script inside an nvcr.io/nvidia/tensorrt:22.07-py3 container
  • nvidia-smi works in the Frigate container, on the host, and in my other NVIDIA runtime containers.
  • ffmpeg hardware acceleration is working fine with the Frigate container using preset-nvidia-h264 and -c:v h264_cuvid
  • I’m running other containers which use CUDA, etc.

However, when trying to start up a TensorRT detector, I get the following:

Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
Fatal Python error: Aborted

I see libnvrtc.so on both my host and inside the nvcr.io/nvidia/tensorrt:22.07-py3 and other containers, but not inside my Frigate container. So I’m perplexed as to how I can make libnvrtc.so (from CUDA?) available in the container, short of bind mounting /usr/local/cuda-11.7/targets/x86_64-linux/lib/ from the host (I’ve tried a variety of compose options with the same result).
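For reference, the kind of host bind mount I’d rather avoid would look something like this in compose (just a sketch — the container-side path and the matching LD_LIBRARY_PATH are my guesses, not anything official):

```yaml
    volumes:
      # Sketch of the host bind mount I'd like to avoid: exposing the host's
      # entire CUDA 11.7 library directory. /usr/local/cuda/lib64 as the
      # target is a guess at a path the loader could be pointed at.
      - /usr/local/cuda-11.7/targets/x86_64-linux/lib:/usr/local/cuda/lib64:ro
    environment:
      # Would also need the loader to search that directory.
      LD_LIBRARY_PATH: /usr/local/cuda/lib64
```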

Version

blakeblackshear/frigate:0.12.0-beta2-tensorrt

Frigate config file

# I'm using this simplified config to test, which runs fine when moved to CPU detector

mqtt:
  host: mqtt.mydomain.com
  port: 8883
  client_id: frigate
  topic_prefix: frigate
  user: myuser
  password: mypass
  tls_ca_certs: /etc/ssl/certs/ca-certificates.crt
  tls_insecure: false

cameras:
  Front-Door:
    ffmpeg:
      hwaccel_args: preset-nvidia-h264
      input_args:
        - -c:v
        - h264_cuvid
      inputs:
        - path: rtsp://myuser:mypass@10.10.70.1:10554/Streaming/Channels/202
          roles:
            - detect
            - restream
        - path: rtsp://myuser:mypass@10.10.70.1:10554/Streaming/Channels/201
          roles:
            - record
    snapshots:
      enabled: true
    motion:
      mask:
        - 142,28,241,33,241,0,142,0
    detect:
      width: 640
      height: 360

detectors:
  tensorrt:
    type: tensorrt

model:
  path: /trt-models/yolov7-tiny-416.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 416
  height: 416

Relevant log output

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/prepare-logs.sh
cont-init: info: /etc/cont-init.d/prepare-logs.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun frigate (no readiness notification)
services-up: info: copying legacy longrun go2rtc (no readiness notification)
services-up: info: copying legacy longrun nginx (no readiness notification)
s6-rc: info: service legacy-services successfully started
2023-01-11 00:46:53.496196078  07:46:53.496 INF go2rtc version 0.1-rc.6 linux/amd64
2023-01-11 00:46:53.496959381  07:46:53.496 INF [api] listen addr=:1984
2023-01-11 00:46:53.497028236  07:46:53.497 INF [rtsp] listen addr=:8554
2023-01-11 00:46:53.497228724  07:46:53.497 INF [webrtc] listen addr=:8555
2023-01-11 00:46:53.497280472  07:46:53.497 INF [srtp] listen addr=:8443
2023-01-11 00:46:54.639356794  [2023-01-11 00:46:54] frigate.app                    INFO    : Starting Frigate (0.12.0-0dbf909)
2023-01-11 00:46:54.661348602  [2023-01-11 00:46:54] peewee_migrate                 INFO    : Starting migrations
2023-01-11 00:46:54.666553629  [2023-01-11 00:46:54] peewee_migrate                 INFO    : There is nothing to migrate
2023-01-11 00:46:54.674083840  [2023-01-11 00:46:54] ws4py                          INFO    : Using epoll
2023-01-11 00:46:54.690982397  [2023-01-11 00:46:54] detector.tensorrt              INFO    : Starting detection process: 970
2023-01-11 00:46:54.691723240  [2023-01-11 00:46:54] frigate.app                    INFO    : Output process started: 972
2023-01-11 00:46:54.694029800  [2023-01-11 00:46:54] ws4py                          INFO    : Using epoll
2023-01-11 00:46:54.695904656  [2023-01-11 00:46:54] frigate.app                    INFO    : Camera processor started for Front-Door: 976
2023-01-11 00:46:54.699253070  [2023-01-11 00:46:54] frigate.app                    INFO    : Capture process started for Front-Door: 978
2023-01-11 00:46:55.148182652  [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init CUDA: CPU +188, GPU +0, now: CPU 241, GPU 127 (MiB)
2023-01-11 00:46:55.166258368  [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO    : Loaded engine size: 35 MiB
2023-01-11 00:46:55.512402191  [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +192, GPU +74, now: CPU 496, GPU 241 (MiB)
2023-01-11 00:46:55.690972712  [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +110, GPU +44, now: CPU 606, GPU 285 (MiB)
2023-01-11 00:46:55.705521956  Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
2023-01-11 00:46:55.705531168  Fatal Python error: Aborted
2023-01-11 00:46:55.705543019
2023-01-11 00:46:55.705547155  Thread 0x00007f6348f9a6c0 (most recent call first):
2023-01-11 00:46:55.705553100    File "/usr/lib/python3.9/threading.py", line 312 in wait
2023-01-11 00:46:55.705558934    File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2023-01-11 00:46:55.705603275    File "/usr/lib/python3.9/threading.py", line 892 in run
2023-01-11 00:46:55.705639906    File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2023-01-11 00:46:55.705644013    File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2023-01-11 00:46:55.705647504
2023-01-11 00:46:55.705651546  Current thread 0x00007f634d256740 (most recent call first):
2023-01-11 00:46:55.705655880    File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 229 in __init__
2023-01-11 00:46:55.705660139    File "/opt/frigate/frigate/detectors/__init__.py", line 24 in create_detector
2023-01-11 00:46:55.705664586    File "/opt/frigate/frigate/object_detection.py", line 52 in __init__
2023-01-11 00:46:55.705668786    File "/opt/frigate/frigate/object_detection.py", line 97 in run_detector
2023-01-11 00:46:55.705686380    File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2023-01-11 00:46:55.705690779    File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2023-01-11 00:46:55.705695155    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2023-01-11 00:46:55.705709406    File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2023-01-11 00:46:55.705730545    File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2023-01-11 00:46:55.705754864    File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2023-01-11 00:46:55.705792265    File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2023-01-11 00:46:55.705818600    File "/opt/frigate/frigate/object_detection.py", line 172 in start_or_restart
2023-01-11 00:46:55.705843911    File "/opt/frigate/frigate/object_detection.py", line 144 in __init__
2023-01-11 00:46:55.705868075    File "/opt/frigate/frigate/app.py", line 214 in start_detectors
2023-01-11 00:46:55.705889471    File "/opt/frigate/frigate/app.py", line 364 in start
2023-01-11 00:46:55.705908039    File "/opt/frigate/frigate/__main__.py", line 16 in <module>
2023-01-11 00:46:55.705937887    File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
2023-01-11 00:46:55.705984158    File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
2023-01-11 00:47:15.027433642  [2023-01-11 00:47:15] frigate.watchdog               INFO    : Detection appears to have stopped. Exiting frigate...
s6-rc: info: service legacy-services: stopping
2023-01-11 00:47:15.034035211  exit OK
2023-01-11 00:47:15.034394785  [2023-01-11 00:47:15] frigate.app                    INFO    : Stopping...
2023-01-11 00:47:15.035051550  [2023-01-11 00:47:15] ws4py                          INFO    : Closing all websockets with [1001] 'Server is shutting down'
2023-01-11 00:47:15.035056307  [2023-01-11 00:47:15] frigate.storage                INFO    : Exiting storage maintainer...
2023-01-11 00:47:15.037505849  [2023-01-11 00:47:15] frigate.events                 INFO    : Exiting event cleanup...
2023-01-11 00:47:15.038340104  [2023-01-11 00:47:15] frigate.record                 INFO    : Exiting recording cleanup...
2023-01-11 00:47:15.038345550  [2023-01-11 00:47:15] frigate.stats                  INFO    : Exiting watchdog...
2023-01-11 00:47:15.038360928  [2023-01-11 00:47:15] frigate.record                 INFO    : Exiting recording maintenance...
2023-01-11 00:47:15.038635641  [2023-01-11 00:47:15] frigate.watchdog               INFO    : Exiting watchdog...
2023-01-11 00:47:15.038826899  [2023-01-11 00:47:15] frigate.events                 INFO    : Exiting event processor...
s6-svwait: fatal: supervisor died
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped

FFprobe output from your camera

N/A

Frigate stats

N/A

Operating system

Debian

Install method

Docker Compose

Coral version

Other

Network connection

Wired

Camera make and model

N/A

Any other information that may be helpful

nvidia-smi inside the container (the ffmpeg process doesn’t show up here, but it does in nvidia-smi and nvtop on the host):

root@frigate:/opt/frigate# nvidia-smi
Wed Jan 11 00:53:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:51:00.0 Off |                  N/A |
| 52%   45C    P0    16W /  75W |     74MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Looking for libs in Frigate container:

root@frigate:/opt/frigate# ldconfig -p |grep libcudnn_cnn_infer
<null>

root@frigate:/opt/frigate# ldconfig -p |grep libnvrtc
<null>

root@frigate:/opt/frigate# find / -name libcudnn_cnn_infer* -print
/usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libcudnn_cnn_infer.so.8

root@frigate:/opt/frigate# find / -name libnvrtc* -print
<null>
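An equivalent check from Python (a quick sketch I’m using for diagnosis, not Frigate code — on glibc Linux, `ctypes.util.find_library` should consult the same cache as `ldconfig -p`):

```python
import ctypes.util

# Ask the dynamic linker which NVIDIA libraries it can resolve. Inside the
# Frigate container I'd expect "nvrtc" to come back empty, matching the
# empty `ldconfig -p | grep libnvrtc` output above.
for name in ("nvrtc", "cudnn_cnn_infer"):
    path = ctypes.util.find_library(name)
    print(f"lib{name}: {path or 'not found'}")
```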

Looking for libs inside nvcr.io/nvidia/tensorrt:22.07-py3 used to generate /trt-models:

root@docker:/ # docker run -it --rm nvcr.io/nvidia/tensorrt:22.07-py3 sh -c 'ldconfig -p |grep libcudnn_cnn_infer'

	libcudnn_cnn_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
	libcudnn_cnn_infer.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so

root@docker:/ # docker run -it --rm nvcr.io/nvidia/tensorrt:22.07-py3 sh -c 'ldconfig -p |grep libnvrtc'

	libnvrtc.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.2
	libnvrtc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so
	libnvrtc-builtins.so.11.7 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.7
	libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so

Docker compose file (several other variations tried with same result):

version: "3.7"
services:
  frigate:
    container_name: frigate
    hostname: frigate
    image: blakeblackshear/frigate:0.12.0-beta2-tensorrt
    privileged: true
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    shm_size: "256mb"    
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /storage/docker/frigate/config.yml:/config/config.yml:ro
      - /storage/docker/frigate/storage:/media/frigate
      - /storage/docker/frigate/trt-models:/trt-models
      - type: tmpfs
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    ports:
      - "127.0.0.1:9049:5000"
    environment:
      FRIGATE_RTSP_PASSWORD: "somepassword"
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility,video
    restart: unless-stopped

Thanks in advance for ANY ideas! 👍

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 25 (14 by maintainers)

Most upvoted comments

Awesome! Thanks so much for helping troubleshoot this.

Aha! I’ve recreated this issue by regenerating the models.

Running the yolov4-tiny-416 model instead of yolov7 does not complain.

My GPU is a GTX 1050, Driver Version: 525.60.13, CUDA Version: 12.0. I am using the yolov7-tiny-416 model.

I am seeing the issue with beta3. I think @Codelica is right on.

FWIW, I bind mounted /usr/local/cuda-11.7/targets/x86_64-linux/lib/libnvrtc.so.11.7.99 from the host side to /usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libnvrtc.so in the container and everything came to life with detections working, etc. Just not sure if that if that should be magically getting passed in via some more official mechanism. 😃