server: CPU-only mode unable to load models, got CUDA error
Problem Description

I was trying to follow the official example and start the server on a CPU-only device with this command:
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/Users/tamannaverma/triton-inference-server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.01-py3 tritonserver --model-repository=/models
Here are the logs:
> =============================
> == Triton Inference Server ==
> =============================
>
> NVIDIA Release 22.01 (build 31237564)
>
> Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>
> Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
>
> This container image and its contents are governed by the NVIDIA Deep Learning Container License.
> By pulling and using the container, you accept the terms and conditions of this license:
> https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib'.
> find: File system loop detected; '/usr/local/cuda/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib'.
>
> WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
> Use Docker with NVIDIA Container Toolkit to start this container; see
> https://github.com/NVIDIA/nvidia-docker.
>
> WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 999
> I0224 09:20:10.194531 1 libtorch.cc:1227] TRITONBACKEND_Initialize: pytorch
> I0224 09:20:10.194635 1 libtorch.cc:1237] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.194639 1 libtorch.cc:1243] 'pytorch' TRITONBACKEND API version: 1.7
> 2022-02-24 09:20:10.482327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> 2022-02-24 09:20:10.533967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> I0224 09:20:10.534722 1 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
> I0224 09:20:10.534746 1 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.534749 1 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.7
> I0224 09:20:10.534752 1 tensorflow.cc:2216] backend configuration:
> {}
> I0224 09:20:10.546856 1 onnxruntime.cc:2232] TRITONBACKEND_Initialize: onnxruntime
> I0224 09:20:10.546921 1 onnxruntime.cc:2242] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.546924 1 onnxruntime.cc:2248] 'onnxruntime' TRITONBACKEND API version: 1.7
> I0224 09:20:10.546927 1 onnxruntime.cc:2278] backend configuration:
> {}
> W0224 09:20:10.563170 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: unknown error
> E0224 09:20:10.563244 1 server.cc:198] Failed to initialize CUDA memory manager: unable to get number of CUDA devices: unknown error
> W0224 09:20:10.563249 1 server.cc:205] failed to enable peer access for some device pairs
> E0224 09:20:10.584340 1 model_repository_manager.cc:1844] Poll failed for model directory 'densenet_onnx': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.596656 1 model_repository_manager.cc:1844] Poll failed for model directory 'inception_graphdef': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.607955 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.619405 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_dyna_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.632553 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_identity': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.640729 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_int8': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.649843 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.661630 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_string': unable to get number of CUDA devices: unknown error
> I0224 09:20:10.661776 1 server.cc:519]
> +------------------+------+
> | Repository Agent | Path |
> +------------------+------+
> +------------------+------+
>
> I0224 09:20:10.661800 1 server.cc:546]
> +-------------+-----------------------------------------------------------------+--------+
> | Backend | Path | Config |
> +-------------+-----------------------------------------------------------------+--------+
> | pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
> | tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
> | onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
> +-------------+-----------------------------------------------------------------+--------+
>
> I0224 09:20:10.661807 1 server.cc:589]
> +-------+---------+--------+
> | Model | Version | Status |
> +-------+---------+--------+
> +-------+---------+--------+
>
> I0224 09:20:10.661952 1 tritonserver.cc:1865]
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Option | Value |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | server_id | triton |
> | server_version | 2.18.0 |
> | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
> | model_repository_path[0] | /models |
> | model_control_mode | MODE_NONE |
> | strict_model_config | 1 |
> | rate_limit | OFF |
> | pinned_memory_pool_byte_size | 268435456 |
> | response_cache_byte_size | 0 |
> | min_supported_compute_capability | 6.0 |
> | strict_readiness | 1 |
> | exit_timeout | 30 |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>
> I0224 09:20:10.662202 1 server.cc:249] Waiting for in-flight requests to complete.
> I0224 09:20:10.662208 1 server.cc:264] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
> error: creating server: Internal - failed to load all models
Triton Information

Version: 22.01. I am using a Mac M1 Pro for the local setup.
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (5 by maintainers)
I can’t access the repository… I get a 404 😃
Yes, sorry for the delay. Here is the release I've built: https://hub.docker.com/repository/docker/prometeiads/tritonserver (I'll update it with the latest version).
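For anyone who wants to try that image, usage would presumably look like the sketch below; the exact tag isn't given in the thread, so `<tag>` is a placeholder and you should check the Docker Hub page above for the real tags:

```
# <tag> is a placeholder: pick an actual tag from the Docker Hub page above.
docker pull prometeiads/tritonserver:<tag>
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v/path/to/model_repository:/models \
  prometeiads/tritonserver:<tag> tritonserver --model-repository=/models
```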
I have the same issue with an M1. How do I deal with it?
Hi @jbkyang-nvi, I'll share the whole repository so that you can take a look. Anyway, I've solved it by building a Docker image using compose.py on my Mac with M1 and specifying --platform=linux/amd64 in the docker build command.
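As a rough sketch of that approach (the exact compose.py arguments aren't shown in the thread, so the flags here are illustrative, not the commenter's actual invocation):

```
# Illustrative sketch: have the server repo's compose.py generate a
# Dockerfile.compose for the backends you need, then build it while
# forcing the amd64 platform on Apple Silicon.
python3 compose.py --backend onnxruntime --container-version 22.01 --dry-run
docker build --platform=linux/amd64 -t tritonserver:22.01-cpu -f Dockerfile.compose .
```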
Same issue. Try the command below if you just want to run the demo; it works for me.
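Something along these lines, reconstructed from the --load-model hint below and a model name from the log above, not the commenter's exact invocation:

```
# Reconstruction: run Triton in explicit model-control mode so it loads only
# the named model instead of polling every model directory at startup.
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v/path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.01-py3 \
  tritonserver --model-repository=/models \
    --model-control-mode=explicit \
    --load-model=densenet_onnx
```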
You can pass --load-model multiple times to load more specific models.