server: CPU-only mode unable to load models (CUDA error)

Problem Description

I was trying to follow the official example, starting the server on a CPU-only device with the command:

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/Users/tamannaverma/triton-inference-server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.01-py3 tritonserver --model-repository=/models

Here are the logs:

> =============================
> == Triton Inference Server ==
> =============================
> 
> NVIDIA Release 22.01 (build 31237564)
> 
> Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
> 
> Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
> 
> This container image and its contents are governed by the NVIDIA Deep Learning Container License.
> By pulling and using the container, you accept the terms and conditions of this license:
> https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib'.
> find: File system loop detected; '/usr/local/cuda/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib'.
> 
> WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
>    Use Docker with NVIDIA Container Toolkit to start this container; see
>    https://github.com/NVIDIA/nvidia-docker.
> 
> WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 999
> I0224 09:20:10.194531 1 libtorch.cc:1227] TRITONBACKEND_Initialize: pytorch
> I0224 09:20:10.194635 1 libtorch.cc:1237] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.194639 1 libtorch.cc:1243] 'pytorch' TRITONBACKEND API version: 1.7
> 2022-02-24 09:20:10.482327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> 2022-02-24 09:20:10.533967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> I0224 09:20:10.534722 1 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
> I0224 09:20:10.534746 1 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.534749 1 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.7
> I0224 09:20:10.534752 1 tensorflow.cc:2216] backend configuration:
> {}
> I0224 09:20:10.546856 1 onnxruntime.cc:2232] TRITONBACKEND_Initialize: onnxruntime
> I0224 09:20:10.546921 1 onnxruntime.cc:2242] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.546924 1 onnxruntime.cc:2248] 'onnxruntime' TRITONBACKEND API version: 1.7
> I0224 09:20:10.546927 1 onnxruntime.cc:2278] backend configuration:
> {}
> W0224 09:20:10.563170 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: unknown error
> E0224 09:20:10.563244 1 server.cc:198] Failed to initialize CUDA memory manager: unable to get number of CUDA devices: unknown error
> W0224 09:20:10.563249 1 server.cc:205] failed to enable peer access for some device pairs
> E0224 09:20:10.584340 1 model_repository_manager.cc:1844] Poll failed for model directory 'densenet_onnx': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.596656 1 model_repository_manager.cc:1844] Poll failed for model directory 'inception_graphdef': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.607955 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.619405 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_dyna_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.632553 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_identity': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.640729 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_int8': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.649843 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.661630 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_string': unable to get number of CUDA devices: unknown error
> I0224 09:20:10.661776 1 server.cc:519] 
> +------------------+------+
> | Repository Agent | Path |
> +------------------+------+
> +------------------+------+
> 
> I0224 09:20:10.661800 1 server.cc:546] 
> +-------------+-----------------------------------------------------------------+--------+
> | Backend     | Path                                                            | Config |
> +-------------+-----------------------------------------------------------------+--------+
> | pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
> | tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
> | onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
> +-------------+-----------------------------------------------------------------+--------+
> 
> I0224 09:20:10.661807 1 server.cc:589] 
> +-------+---------+--------+
> | Model | Version | Status |
> +-------+---------+--------+
> +-------+---------+--------+
> 
> I0224 09:20:10.661952 1 tritonserver.cc:1865] 
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Option                           | Value                                                                                                                                                                                  |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | server_id                        | triton                                                                                                                                                                                 |
> | server_version                   | 2.18.0                                                                                                                                                                                 |
> | server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
> | model_repository_path[0]         | /models                                                                                                                                                                                |
> | model_control_mode               | MODE_NONE                                                                                                                                                                              |
> | strict_model_config              | 1                                                                                                                                                                                      |
> | rate_limit                       | OFF                                                                                                                                                                                    |
> | pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
> | response_cache_byte_size         | 0                                                                                                                                                                                      |
> | min_supported_compute_capability | 6.0                                                                                                                                                                                    |
> | strict_readiness                 | 1                                                                                                                                                                                      |
> | exit_timeout                     | 30                                                                                                                                                                                     |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> 
> I0224 09:20:10.662202 1 server.cc:249] Waiting for in-flight requests to complete.
> I0224 09:20:10.662208 1 server.cc:264] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
> error: creating server: Internal - failed to load all models

Triton Information

Version: 22.01. I am using a Mac M1 Pro for the local setup.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 17 (5 by maintainers)

Most upvoted comments

Yes, sorry for the delay. Here is the release I’ve built: https://hub.docker.com/repository/docker/prometeiads/tritonserver. I’ll update it with the latest version.

I can’t access the repository… I get a 404 😃

I have the same issue with an M1. How should I deal with it?

Hi @jbkyang-nvi, I’ll share the whole repository so that you can take a look. Anyway, I’ve solved it by building a Docker image with compose.py on my M1 Mac and specifying --platform=linux/amd64 in the docker build command.
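A rough sketch of that workaround (the compose.py flags and image tag here are assumptions based on the Triton server repository; check `python3 compose.py -h` for your version):

```shell
# Hypothetical sketch: generate a Dockerfile with compose.py, then build it
# explicitly for linux/amd64 so the image runs under Docker Desktop's
# emulation on Apple Silicon. The commands are built into variables and
# echoed rather than executed, so nothing here requires Docker to be present.
GEN_CMD="python3 compose.py --backend onnxruntime --dry-run"
BUILD_CMD="docker build --platform=linux/amd64 -f Dockerfile.compose -t tritonserver_local ."
echo "$GEN_CMD"
echo "$BUILD_CMD"
```

Building for linux/amd64 means the container runs under emulation on Apple Silicon, which is slow but avoids the CUDA-device probing failures shown in the logs above.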

Same issue. Try the command below if you just want to run the demo; it works for me:

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $(pwd)/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver  --model-control-mode=explicit --load-model simple --model-repository=/models

You can pass --load-model multiple times to load additional models.
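For example, the demo command extended with a second --load-model flag (model names come from the example model repository used in this issue; the command is echoed here rather than run):

```shell
# Sketch: explicit model control loads only the named models, so model
# directories whose configs fail to poll on CPU-only hosts are skipped.
# The command is stored in a variable and echoed instead of invoking Docker.
RUN_CMD="docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v \$(pwd)/docs/examples/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver \
  --model-control-mode=explicit \
  --load-model simple --load-model simple_identity \
  --model-repository=/models"
echo "$RUN_CMD"
```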