server: python backend with custom packages reports error "Internal: Failed to initialize stub, stub process exited unexpectedly"

Description: the server reports "Internal: Failed to initialize stub, stub process exited unexpectedly" when a custom stub and a conda-pack environment are supplied.

Triton Information: Triton version 21.06.1, running from the NGC container.

To Reproduce: I followed this guide to create the model.

Commands to reproduce

# install packages to use in python_backend
$ conda create -n gpt2 python=3.8
$ conda activate gpt2
$ conda install numpy
$ pip install transformers tokenizers torch conda-pack
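
(A quick sanity check before packing: confirm the packages import cleanly inside the activated environment, since an import failure here will also crash the stub at startup. A throwaway script along these lines works; the file name and module list are mine.)

# check_env.py - hypothetical helper; run inside the activated gpt2 env
import importlib

for mod in ("numpy", "torch", "transformers", "tokenizers"):
    try:
        importlib.import_module(mod)
        print(f"OK   {mod}")
    except Exception as exc:
        print(f"FAIL {mod}: {exc}")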

# build stub
$ git clone https://github.com/triton-inference-server/python_backend -b r21.06
$ cd python_backend
$ mkdir build && cd build
$ cmake -DTRITON_ENABLE_GPU=ON -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make triton-python-backend-stub

# confirm libpython from the conda env is linked into the stub, then copy it to the model repository
$ ldd triton_python_backend_stub | grep python
        libpython3.8.so.1.0 => /home/ubuntu/miniconda3/envs/gpt2/lib/libpython3.8.so.1.0 (0x00007f97bdbb0000)
$ cp triton_python_backend_stub /path/to/model_repository/gpt2

# copy conda pack to model repository
$ conda-pack
$ cp gpt2.tar.gz /path/to/model_repository/gpt2

Directory structure of model repository

model_repository
└── gpt2
    ├── 1
    │   └── model.py
    ├── config.pbtxt
    ├── triton_python_backend_stub
    └── gpt2.tar.gz

model_repository/gpt2/1/model.py

import numpy as np

import triton_python_backend_utils as pb_utils

from transformers import GPT2LMHeadModel, GPT2Tokenizer


class TritonPythonModel:

    def initialize(self, args):
        # Load the pretrained GPT-2 model and tokenizer once per instance.
        self.model = GPT2LMHeadModel.from_pretrained(
            'gpt2',
            max_length = 128,
            repetition_penalty = 2.0
        )
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    def execute(self, requests):

        responses = []

        for request in requests:
            # Decode the TYPE_STRING input tensor to a Python string.
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            in_0 = in_0.as_numpy()[0].decode("utf-8")
            input_ids = self.tokenizer(in_0, return_tensors='pt').input_ids

            outputs = self.model.generate(
                input_ids,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                bos_token_id=self.tokenizer.bos_token_id,
                use_cache=True)
            out_0 = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

            out_0 = pb_utils.Tensor("OUTPUT0",
                                    np.array([out_0], dtype=object))

            inference_response = pb_utils.InferenceResponse(
                output_tensors=[out_tensor_0])
            responses.append(inference_response)
        return responses

    def finalize(self):

        print('Cleaning up...')
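
For reference, a minimal client sketch that exercises this model over HTTP (illustrative only; it assumes tritonclient is installed and the server is reachable on localhost:8000):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# TYPE_STRING tensors travel as BYTES; the leading axis is the batch
# dimension because max_batch_size > 0 in config.pbtxt.
text = np.array([["Hello, my name is"]], dtype=object)
inp = httpclient.InferInput("INPUT0", list(text.shape), "BYTES")
inp.set_data_from_numpy(text)

result = client.infer(model_name="gpt2", inputs=[inp])
print(result.as_numpy("OUTPUT0").flatten()[0].decode("utf-8"))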

model_repository/gpt2/config.pbtxt

backend: "python"

max_batch_size: 64


input [
  {
    name: "INPUT0"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]

output [
  {
    name: "OUTPUT0"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]

dynamic_batching {
  preferred_batch_size: [ 1,2,4,8,16,32,64 ]
  max_queue_delay_microseconds: 30000
}

instance_group [{ count: 1, kind: KIND_GPU }]

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "/models/gpt2/gpt2.tar.gz"}
}

Then run the Docker container:

$ docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /path/to/model_repository:/models nvcr.io/nvidia/tritonserver:21.06.1-py3 tritonserver --model-repository=/models --log-verbose 10


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.06 (build 24449615)

Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.

I0708 02:45:43.242384 1 metrics.cc:291] Collecting metrics for GPU 0: Tesla V100-PCIE-16GB
I0708 02:45:43.242771 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0708 02:45:43.578167 1 libtorch.cc:987] TRITONBACKEND_Initialize: pytorch
I0708 02:45:43.578216 1 libtorch.cc:997] Triton TRITONBACKEND API version: 1.4
I0708 02:45:43.578222 1 libtorch.cc:1003] 'pytorch' TRITONBACKEND API version: 1.4
I0708 02:45:43.578278 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so
2021-07-08 02:45:43.771931: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0708 02:45:43.815189 1 tensorflow.cc:2165] TRITONBACKEND_Initialize: tensorflow
I0708 02:45:43.815223 1 tensorflow.cc:2175] Triton TRITONBACKEND API version: 1.4
I0708 02:45:43.815229 1 tensorflow.cc:2181] 'tensorflow' TRITONBACKEND API version: 1.4
I0708 02:45:43.815234 1 tensorflow.cc:2205] backend configuration:
{}
I0708 02:45:43.815296 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
I0708 02:45:43.816660 1 onnxruntime.cc:1969] TRITONBACKEND_Initialize: onnxruntime
I0708 02:45:43.816687 1 onnxruntime.cc:1979] Triton TRITONBACKEND API version: 1.4
I0708 02:45:43.816692 1 onnxruntime.cc:1985] 'onnxruntime' TRITONBACKEND API version: 1.4
I0708 02:45:43.825961 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/openvino/libtriton_openvino.so
I0708 02:45:43.834638 1 openvino.cc:1188] TRITONBACKEND_Initialize: openvino
I0708 02:45:43.834660 1 openvino.cc:1198] Triton TRITONBACKEND API version: 1.4
I0708 02:45:43.834666 1 openvino.cc:1204] 'openvino' TRITONBACKEND API version: 1.4
I0708 02:45:44.262809 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f8694000000' with size 268435456
I0708 02:45:44.263337 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0708 02:45:44.264080 1 backend_factory.h:45] Create TritonBackendFactory
I0708 02:45:44.264106 1 plan_backend_factory.cc:49] Create PlanBackendFactory
I0708 02:45:44.264111 1 plan_backend_factory.cc:56] Registering TensorRT Plugins
I0708 02:45:44.264149 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0708 02:45:44.264168 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0708 02:45:44.264186 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0708 02:45:44.264194 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0708 02:45:44.264202 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0708 02:45:44.264212 1 logging.cc:52] Registered plugin creator - ::CropAndResizeDynamic version 1
I0708 02:45:44.264220 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0708 02:45:44.264237 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0708 02:45:44.264244 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0708 02:45:44.264254 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0708 02:45:44.264273 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0708 02:45:44.264282 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0708 02:45:44.264290 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0708 02:45:44.264303 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0708 02:45:44.264319 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0708 02:45:44.264333 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0708 02:45:44.264341 1 logging.cc:52] Registered plugin creator - ::NMSDynamic_TRT version 1
I0708 02:45:44.264348 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0708 02:45:44.264357 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0708 02:45:44.264369 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0708 02:45:44.264378 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0708 02:45:44.264388 1 logging.cc:52] Registered plugin creator - ::ProposalDynamic version 1
I0708 02:45:44.264396 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0708 02:45:44.264407 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0708 02:45:44.264415 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0708 02:45:44.264422 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0708 02:45:44.264434 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0708 02:45:44.264441 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0708 02:45:44.264447 1 logging.cc:52] Registered plugin creator - ::Split version 1
I0708 02:45:44.264460 1 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I0708 02:45:44.265837 1 model_repository_manager.cc:749] AsyncLoad() 'gpt2'
I0708 02:45:44.265903 1 model_repository_manager.cc:988] TriggerNextAction() 'gpt2' version 1: 1
I0708 02:45:44.265917 1 model_repository_manager.cc:1026] Load() 'gpt2' version 1
I0708 02:45:44.265921 1 model_repository_manager.cc:1045] loading: gpt2:1
I0708 02:45:44.366670 1 model_repository_manager.cc:1105] CreateInferenceBackend() 'gpt2' version 1
I0708 02:45:44.366793 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so
I0708 02:45:44.369495 1 python.cc:1298] 'python' TRITONBACKEND API version: 1.4
I0708 02:45:44.369516 1 python.cc:1320] backend configuration:
{}
I0708 02:45:44.369527 1 python.cc:1397] shm-default-byte-size=67108864,shm-growth-byte-size=67108864,stub-timeout-seconds=30
I0708 02:45:44.369966 1 python.cc:1445] TRITONBACKEND_ModelInitialize: gpt2 (version 1)
I0708 02:45:44.370963 1 model_config_utils.cc:1521] ModelConfig 64-bit fields:
I0708 02:45:44.370981 1 model_config_utils.cc:1523]     ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0708 02:45:44.370985 1 model_config_utils.cc:1523]     ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0708 02:45:44.370989 1 model_config_utils.cc:1523]     ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0708 02:45:44.370993 1 model_config_utils.cc:1523]     ModelConfig::ensemble_scheduling::step::model_version
I0708 02:45:44.370997 1 model_config_utils.cc:1523]     ModelConfig::input::dims
I0708 02:45:44.371001 1 model_config_utils.cc:1523]     ModelConfig::input::reshape::shape
I0708 02:45:44.371005 1 model_config_utils.cc:1523]     ModelConfig::instance_group::secondary_devices::device_id
I0708 02:45:44.371009 1 model_config_utils.cc:1523]     ModelConfig::model_warmup::inputs::value::dims
I0708 02:45:44.371013 1 model_config_utils.cc:1523]     ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0708 02:45:44.371017 1 model_config_utils.cc:1523]     ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0708 02:45:44.371021 1 model_config_utils.cc:1523]     ModelConfig::output::dims
I0708 02:45:44.371026 1 model_config_utils.cc:1523]     ModelConfig::output::reshape::shape
I0708 02:45:44.371030 1 model_config_utils.cc:1523]     ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0708 02:45:44.371034 1 model_config_utils.cc:1523]     ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0708 02:45:44.371038 1 model_config_utils.cc:1523]     ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0708 02:45:44.371042 1 model_config_utils.cc:1523]     ModelConfig::version_policy::specific::versions
I0708 02:45:44.371145 1 python.cc:1267] Using Python execution env /models/gpt2/gpt2.tar.gz
I0708 02:45:44.372722 1 python.cc:1489] TRITONBACKEND_ModelInstanceInitialize: gpt2_0 (GPU device 0)
I0708 02:45:44.374147 1 backend_model_instance.cc:105] Creating instance gpt2_0 on GPU 0 (7.0) using artifact ''
I0708 02:46:02.783664 56 python.cc:918] Starting Python backend stub: export LD_LIBRARY_PATH=/tmp/python_env_nED7ai/0/lib:$LD_LIBRARY_PATH; source /tmp/python_env_nED7ai/0/bin/activate && exec /models/gpt2/triton_python_backend_stub /models/gpt2/1/model.py /gpt2_0_GPU_0 67108864 67108864 1 /opt/tritonserver/backends/python
I0708 02:46:03.845931 1 python.cc:1549] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0708 02:46:03.851969 1 python.cc:1468] TRITONBACKEND_ModelFinalize: delete model state
I0708 02:46:03.852005 1 triton_backend_manager.cc:101] unloading backend 'python'
I0708 02:46:03.852011 1 python.cc:1425] TRITONBACKEND_Finalize: Start
I0708 02:46:04.577935 1 python.cc:1430] TRITONBACKEND_Finalize: End
E0708 02:46:04.579122 1 model_repository_manager.cc:1215] failed to load 'gpt2' version 1: Internal: Failed to initialize stub, stub process exited unexpectedly: gpt2_0
I0708 02:46:04.579144 1 model_repository_manager.cc:988] TriggerNextAction() 'gpt2' version 1: 0
I0708 02:46:04.579153 1 model_repository_manager.cc:1003] no next action, trigger OnComplete()
I0708 02:46:04.579229 1 model_repository_manager.cc:594] VersionStates() 'gpt2'
I0708 02:46:04.579281 1 model_repository_manager.cc:594] VersionStates() 'gpt2'
I0708 02:46:04.579335 1 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0708 02:46:04.579402 1 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0708 02:46:04.579414 1 model_repository_manager.cc:570] BackendStates()
I0708 02:46:04.579444 1 server.cc:586]
+-------+---------+--------------------------------------------------------------------------------------------+
| Model | Version | Status                                                                                     |
+-------+---------+--------------------------------------------------------------------------------------------+
| gpt2  | 1       | UNAVAILABLE: Internal: Failed to initialize stub, stub process exited unexpectedly: gpt2_0 |
+-------+---------+--------------------------------------------------------------------------------------------+

I0708 02:46:04.579550 1 tritonserver.cc:1718]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.11.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /models                                                                                                                                                                                |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                               |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0708 02:46:04.579569 1 server.cc:234] Waiting for in-flight requests to complete.
I0708 02:46:04.579574 1 model_repository_manager.cc:694] AsyncUnload() 'gpt2'
I0708 02:46:04.579579 1 model_repository_manager.cc:988] TriggerNextAction() 'gpt2' version 1: 2
I0708 02:46:04.579584 1 model_repository_manager.cc:1071] Unload() 'gpt2' version 1
I0708 02:46:04.579591 1 model_repository_manager.cc:534] LiveBackendStates()
I0708 02:46:04.579595 1 server.cc:249] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
I0708 02:46:04.579602 1 triton_backend_manager.cc:101] unloading backend 'pytorch'
I0708 02:46:04.579613 1 triton_backend_manager.cc:101] unloading backend 'tensorflow'
I0708 02:46:04.579631 1 triton_backend_manager.cc:101] unloading backend 'onnxruntime'
I0708 02:46:04.579660 1 triton_backend_manager.cc:101] unloading backend 'openvino'
error: creating server: Internal - failed to load all models
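
In case it helps with triage: the stub prints nothing before exiting, so the only artifact left to inspect is the archive itself. A small helper like this (hypothetical; the file name is mine) confirms the pack at least contains the Python 3.8 artifacts the stub was built against, since the stub and the execution environment must use the same Python version:

# inspect_pack.py - hypothetical helper to peek inside the conda-pack archive
import sys
import tarfile

archive = sys.argv[1] if len(sys.argv) > 1 else "gpt2.tar.gz"
with tarfile.open(archive, "r:gz") as tar:
    hits = [m.name for m in tar
            if "libpython" in m.name or m.name.endswith("bin/python3.8")]
print("\n".join(hits) if hits else "no python artifacts found")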

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Thanks for the detailed info. I have filed a bug against myself to investigate why this doesn’t happen in the cases that you have shared.