frigate: [Support]: Docker (0.12.0-beta2-tensorrt) exception trying to load libnvrtc.so (not found)?
Describe the problem you are having
I’m at a loss and hoping for any suggestions. Basically I’m trying to get a TensorRT detector working with blakeblackshear/frigate:0.12.0-beta2-tensorrt (Docker Compose config).
I feel like my general NVIDIA configuration is OK, given:
- I was able to generate the trt-models using the tensorrt_models.sh script inside an nvcr.io/nvidia/tensorrt:22.07-py3 container
- nvidia-smi works in the Frigate container, on the host, and in my other NVIDIA runtime containers.
- ffmpeg hardware acceleration is working fine in the Frigate container using preset-nvidia-h264 and -c:v h264_cuvid
- I’m running other containers which use CUDA, etc.
However, when trying to start up a TensorRT detector, I get the following:
Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
Fatal Python error: Aborted
I see libnvrtc.so on both my host and inside the nvcr.io/nvidia/tensorrt:22.07-py3 and other containers, but not inside my Frigate container. So I’m perplexed as to how I can make libnvrtc.so (from CUDA?) available in the container, short of bind mounting /usr/local/cuda-11.7/targets/x86_64-linux/lib/ from the host (having tried a variety of compose options).
Version
blakeblackshear/frigate:0.12.0-beta2-tensorrt
Frigate config file
# I'm using this simplified config to test, which runs fine when moved to CPU detector
mqtt:
  host: mqtt.mydomain.com
  port: 8883
  client_id: frigate
  topic_prefix: frigate
  user: myuser
  password: mypass
  tls_ca_certs: /etc/ssl/certs/ca-certificates.crt
  tls_insecure: false
cameras:
  Front-Door:
    ffmpeg:
      hwaccel_args: preset-nvidia-h264
      input_args:
        - -c:v
        - h264_cuvid
      inputs:
        - path: rtsp://myuser:mypass@10.10.70.1:10554/Streaming/Channels/202
          roles:
            - detect
            - restream
        - path: rtsp://myuser:mypass@10.10.70.1:10554/Streaming/Channels/201
          roles:
            - record
    snapshots:
      enabled: true
    motion:
      mask:
        - 142,28,241,33,241,0,142,0
    detect:
      width: 640
      height: 360
detectors:
  tensorrt:
    type: tensorrt
model:
  path: /trt-models/yolov7-tiny-416.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 416
  height: 416
Relevant log output
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/prepare-logs.sh
cont-init: info: /etc/cont-init.d/prepare-logs.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun frigate (no readiness notification)
services-up: info: copying legacy longrun go2rtc (no readiness notification)
services-up: info: copying legacy longrun nginx (no readiness notification)
s6-rc: info: service legacy-services successfully started
2023-01-11 00:46:53.496196078 07:46:53.496 INF go2rtc version 0.1-rc.6 linux/amd64
2023-01-11 00:46:53.496959381 07:46:53.496 INF [api] listen addr=:1984
2023-01-11 00:46:53.497028236 07:46:53.497 INF [rtsp] listen addr=:8554
2023-01-11 00:46:53.497228724 07:46:53.497 INF [webrtc] listen addr=:8555
2023-01-11 00:46:53.497280472 07:46:53.497 INF [srtp] listen addr=:8443
2023-01-11 00:46:54.639356794 [2023-01-11 00:46:54] frigate.app INFO : Starting Frigate (0.12.0-0dbf909)
2023-01-11 00:46:54.661348602 [2023-01-11 00:46:54] peewee_migrate INFO : Starting migrations
2023-01-11 00:46:54.666553629 [2023-01-11 00:46:54] peewee_migrate INFO : There is nothing to migrate
2023-01-11 00:46:54.674083840 [2023-01-11 00:46:54] ws4py INFO : Using epoll
2023-01-11 00:46:54.690982397 [2023-01-11 00:46:54] detector.tensorrt INFO : Starting detection process: 970
2023-01-11 00:46:54.691723240 [2023-01-11 00:46:54] frigate.app INFO : Output process started: 972
2023-01-11 00:46:54.694029800 [2023-01-11 00:46:54] ws4py INFO : Using epoll
2023-01-11 00:46:54.695904656 [2023-01-11 00:46:54] frigate.app INFO : Camera processor started for Front-Door: 976
2023-01-11 00:46:54.699253070 [2023-01-11 00:46:54] frigate.app INFO : Capture process started for Front-Door: 978
2023-01-11 00:46:55.148182652 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init CUDA: CPU +188, GPU +0, now: CPU 241, GPU 127 (MiB)
2023-01-11 00:46:55.166258368 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : Loaded engine size: 35 MiB
2023-01-11 00:46:55.512402191 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +192, GPU +74, now: CPU 496, GPU 241 (MiB)
2023-01-11 00:46:55.690972712 [2023-01-11 00:46:55] frigate.detectors.plugins.tensorrt INFO : [MemUsageChange] Init cuDNN: CPU +110, GPU +44, now: CPU 606, GPU 285 (MiB)
2023-01-11 00:46:55.705521956 Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory
2023-01-11 00:46:55.705531168 Fatal Python error: Aborted
2023-01-11 00:46:55.705543019
2023-01-11 00:46:55.705547155 Thread 0x00007f6348f9a6c0 (most recent call first):
2023-01-11 00:46:55.705553100 File "/usr/lib/python3.9/threading.py", line 312 in wait
2023-01-11 00:46:55.705558934 File "/usr/lib/python3.9/multiprocessing/queues.py", line 233 in _feed
2023-01-11 00:46:55.705603275 File "/usr/lib/python3.9/threading.py", line 892 in run
2023-01-11 00:46:55.705639906 File "/usr/lib/python3.9/threading.py", line 954 in _bootstrap_inner
2023-01-11 00:46:55.705644013 File "/usr/lib/python3.9/threading.py", line 912 in _bootstrap
2023-01-11 00:46:55.705647504
2023-01-11 00:46:55.705651546 Current thread 0x00007f634d256740 (most recent call first):
2023-01-11 00:46:55.705655880 File "/opt/frigate/frigate/detectors/plugins/tensorrt.py", line 229 in __init__
2023-01-11 00:46:55.705660139 File "/opt/frigate/frigate/detectors/__init__.py", line 24 in create_detector
2023-01-11 00:46:55.705664586 File "/opt/frigate/frigate/object_detection.py", line 52 in __init__
2023-01-11 00:46:55.705668786 File "/opt/frigate/frigate/object_detection.py", line 97 in run_detector
2023-01-11 00:46:55.705686380 File "/usr/lib/python3.9/multiprocessing/process.py", line 108 in run
2023-01-11 00:46:55.705690779 File "/usr/lib/python3.9/multiprocessing/process.py", line 315 in _bootstrap
2023-01-11 00:46:55.705695155 File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71 in _launch
2023-01-11 00:46:55.705709406 File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19 in __init__
2023-01-11 00:46:55.705730545 File "/usr/lib/python3.9/multiprocessing/context.py", line 277 in _Popen
2023-01-11 00:46:55.705754864 File "/usr/lib/python3.9/multiprocessing/context.py", line 224 in _Popen
2023-01-11 00:46:55.705792265 File "/usr/lib/python3.9/multiprocessing/process.py", line 121 in start
2023-01-11 00:46:55.705818600 File "/opt/frigate/frigate/object_detection.py", line 172 in start_or_restart
2023-01-11 00:46:55.705843911 File "/opt/frigate/frigate/object_detection.py", line 144 in __init__
2023-01-11 00:46:55.705868075 File "/opt/frigate/frigate/app.py", line 214 in start_detectors
2023-01-11 00:46:55.705889471 File "/opt/frigate/frigate/app.py", line 364 in start
2023-01-11 00:46:55.705908039 File "/opt/frigate/frigate/__main__.py", line 16 in <module>
2023-01-11 00:46:55.705937887 File "/usr/lib/python3.9/runpy.py", line 87 in _run_code
2023-01-11 00:46:55.705984158 File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main
2023-01-11 00:47:15.027433642 [2023-01-11 00:47:15] frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
s6-rc: info: service legacy-services: stopping
2023-01-11 00:47:15.034035211 exit OK
2023-01-11 00:47:15.034394785 [2023-01-11 00:47:15] frigate.app INFO : Stopping...
2023-01-11 00:47:15.035051550 [2023-01-11 00:47:15] ws4py INFO : Closing all websockets with [1001] 'Server is shutting down'
2023-01-11 00:47:15.035056307 [2023-01-11 00:47:15] frigate.storage INFO : Exiting storage maintainer...
2023-01-11 00:47:15.037505849 [2023-01-11 00:47:15] frigate.events INFO : Exiting event cleanup...
2023-01-11 00:47:15.038340104 [2023-01-11 00:47:15] frigate.record INFO : Exiting recording cleanup...
2023-01-11 00:47:15.038345550 [2023-01-11 00:47:15] frigate.stats INFO : Exiting watchdog...
2023-01-11 00:47:15.038360928 [2023-01-11 00:47:15] frigate.record INFO : Exiting recording maintenance...
2023-01-11 00:47:15.038635641 [2023-01-11 00:47:15] frigate.watchdog INFO : Exiting watchdog...
2023-01-11 00:47:15.038826899 [2023-01-11 00:47:15] frigate.events INFO : Exiting event processor...
s6-svwait: fatal: supervisor died
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
FFprobe output from your camera
N/A
Frigate stats
N/A
Operating system
Debian
Install method
Docker Compose
Coral version
Other
Network connection
Wired
Camera make and model
N/A
Any other information that may be helpful
nvidia-smi inside the container (the ffmpeg process doesn't show here, but does in nvidia-smi and nvtop on the host):
root@frigate:/opt/frigate# nvidia-smi
Wed Jan 11 00:53:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P2000 Off | 00000000:51:00.0 Off | N/A |
| 52% 45C P0 16W / 75W | 74MiB / 5120MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Looking for libs in Frigate container:
root@frigate:/opt/frigate# ldconfig -p |grep libcudnn_cnn_infer
<null>
root@frigate:/opt/frigate# ldconfig -p |grep libnvrtc
<null>
root@frigate:/opt/frigate# find / -name libcudnn_cnn_infer* -print
/usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libcudnn_cnn_infer.so.8
root@frigate:/opt/frigate# find / -name libnvrtc* -print
<null>
Looking for libs inside the nvcr.io/nvidia/tensorrt:22.07-py3 container used to generate /trt-models:
root@docker:/ # docker run -it --rm nvcr.io/nvidia/tensorrt:22.07-py3 sh -c 'ldconfig -p |grep libcudnn_cnn_infer'
libcudnn_cnn_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
libcudnn_cnn_infer.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so
root@docker:/ # docker run -it --rm nvcr.io/nvidia/tensorrt:22.07-py3 sh -c 'ldconfig -p |grep libnvrtc'
libnvrtc.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.2
libnvrtc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so
libnvrtc-builtins.so.11.7 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.7
libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so
Docker compose file (several other variations tried with same result):
version: "3.7"
services:
  frigate:
    container_name: frigate
    hostname: frigate
    image: blakeblackshear/frigate:0.12.0-beta2-tensorrt
    privileged: true
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    shm_size: "256mb"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /storage/docker/frigate/config.yml:/config/config.yml:ro
      - /storage/docker/frigate/storage:/media/frigate
      - /storage/docker/frigate/trt-models:/trt-models
      - type: tmpfs
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    ports:
      - "127.0.0.1:9049:5000"
    environment:
      FRIGATE_RTSP_PASSWORD: "somepassword"
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility,video
    restart: unless-stopped
Thanks in advance for ANY ideas! 👍
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 25 (14 by maintainers)
Awesome! Thanks so much for helping troubleshoot this.
Aha! I’ve recreated this issue by regenerating the models.
Running the yolov4-tiny-416 model instead of yolov7 does not complain.
My GPU is a GTX 1050, Driver Version: 525.60.13, CUDA Version: 12.0. I am using the yolov7-tiny-416 model.
I am seeing the issue with beta3. I think @Codelica is right on.
FWIW, I bind mounted /usr/local/cuda-11.7/targets/x86_64-linux/lib/libnvrtc.so.11.7.99 from the host side to /usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libnvrtc.so in the container, and everything came to life with detections working, etc. Just not sure if that should be magically getting passed in via some more official mechanism. 😃
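For anyone wanting to try the same workaround, a compose fragment might look like the sketch below. This is my illustration, not an official fix; the host-side path matches a CUDA 11.7 install and will differ on other systems:

```yaml
# Sketch of the bind-mount workaround (hypothetical paths; adjust to
# your CUDA install and the container's Python dist-packages layout).
services:
  frigate:
    volumes:
      - /usr/local/cuda-11.7/targets/x86_64-linux/lib/libnvrtc.so.11.7.99:/usr/local/lib/python3.9/dist-packages/nvidia/cudnn/lib/libnvrtc.so:ro
```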