server: Allow tritonserver to stay up on model load failure?
Hi, I’m running Triton with tritonserver --model-repository=/models --model-control-mode=none
Is there a way, in this mode, to allow some models to fail to load during startup without the whole server shutting down?
My use case: I have TensorRT models targeting different NVIDIA GPU families, and only one of them is expected to load successfully on any given machine.
Workaround
My current workaround is to enable explicit model control and then manually load every model, allowing some of them to fail. This way, Triton does not terminate itself on startup.
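Roughly, the workaround looks like the sketch below (the model names and the default HTTP port 8000 are assumptions for illustration):

# Start Triton with explicit model control; nothing is loaded automatically.
tritonserver --model-repository=/models --model-control-mode=explicit &

# Ask Triton to load each model through the model-repository API.
# A model whose TensorRT engine cannot be built fails to load here,
# but the server itself stays up.
for model in model_gpu_family_a model_gpu_family_b; do
  curl -s -X POST "localhost:8000/v2/repository/models/${model}/load" || true
done

# List the repository to see which models actually reached READY.
curl -s -X POST "localhost:8000/v2/repository/index"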
Logs
Here are the truncated startup logs:
I0110 21:30:54.920681 60 server.cc:592]
+---------------+---------+---------------------------------------------------------+
| Model | Version | Status |
+---------------+---------+---------------------------------------------------------+
| <redacted> | 1 | READY |
| <redacted> | 1 | UNAVAILABLE: Internal: unable to create TensorRT engine |
+---------------+---------+---------------------------------------------------------+
I0110 21:30:54.920786 60 tritonserver.cc:1920]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.16.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memo |
| | ry binary_tensor_data statistics |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
I0110 21:30:54.920814 60 server.cc:252] Waiting for in-flight requests to complete.
I0110 21:30:54.920823 60 model_repository_manager.cc:1055] unloading: yolox-m-p1000:1
I0110 21:30:54.920868 60 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0110 21:30:54.920947 60 tensorrt.cc:5272] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0110 21:30:54.937010 60 logging.cc:49] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 607, GPU 2303 (MiB)
I0110 21:30:54.960855 60 tensorrt.cc:5211] TRITONBACKEND_ModelFinalize: delete model state
I0110 21:30:54.961569 60 model_repository_manager.cc:1166] successfully unloaded 'yolox-m-p1000' version 1
W0110 21:30:55.771926 60 metrics.cc:406] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0110 21:30:55.771991 60 metrics.cc:424] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0110 21:30:55.772005 60 metrics.cc:448] Unable to get energy consumption for GPU 0. Status:Success, value:0
I0110 21:30:55.920967 60 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Instead of the above behavior, I would like Triton to continue serving whatever models have loaded successfully.
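(As a sketch of how clients could cope with a partially loaded repository, each model's readiness can be checked individually over Triton's HTTP/REST API; MODEL_NAME and the port below are placeholders.)

# Per-model readiness check (KServe v2 protocol):
MODEL_NAME=my_model   # placeholder
curl -s -o /dev/null -w "%{http_code}\n" "localhost:8000/v2/models/${MODEL_NAME}/ready"
# 200 means the model is loaded and ready; anything else means it is not being served.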
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (1 by maintainers)
Hi!
How can the “0 in-flight non-inference requests” problem be fixed? On my server all models load successfully, but then I get the following:
I0511 04:23:45.550649 1 server.cc:264] Waiting for in-flight requests to complete.
I0511 04:23:45.550655 1 server.cc:280] Timeout 30: Found 0 model versions that have in-flight inferences
I0511 04:23:45.551024 1 server.cc:295] All models are stopped, unloading models
Which then leads to
error: creating server: Internal - failed to load all models
I am a bit confused because the server worked fine before and then started behaving this way without any obvious changes on my side …
That means: keep only the files the model actually needs, with no additional files, no additional directories, and no additional directory levels.
If I have a model path like this in the SeldonDeployment:
Then in the ts-seldon-volume root directory I should have a triton directory with only a multi directory inside. In the multi directory I should have only the per-model directories, each containing the nested version directories along with the config.pbtxt configuration and the model binary itself.
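For illustration, the layout described above would look roughly like this (the model names are hypothetical; the version directory and config.pbtxt follow Triton's standard model repository layout):

ts-seldon-volume/            # PVC root mounted into the pod
  triton/
    multi/                   # the model repository Triton is pointed at
      model_a/               # hypothetical model name
        config.pbtxt
        1/
          model.plan         # or model.onnx, model.savedmodel, ... depending on the backend
      model_b/
        config.pbtxt
        1/
          model.plan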
I am attaching the zipped models (multi.zip) from the Seldon example docs, and below is a working SeldonDeployment YAML using a PVC and modelUri.