server: python backend error: c_python_backend_utils.TritonModelException: Tensor is stored in GPU and cannot be converted to NumPy
Description
I am currently using the Python backend's BLS feature: I call another TensorRT model through the pb_utils.InferenceRequest interface and the call succeeds, but the result tensor is stored on the GPU, and I can't find an interface to copy it back from the GPU.
Triton Information 22.01
Are you using the Triton container or did you build it yourself? no
Expected behavior
Can the Python backend copy InferenceRequest results directly to the CPU?
Here is my debugging information:
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(60)face_detect()
-> inputs=[images],
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(61)face_detect()
-> requested_output_names=self.outputs_0)
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(58)face_detect()
-> infer_request = pb_utils.InferenceRequest(
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(62)face_detect()
-> infer_response = infer_request.exec()
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(65)face_detect()
-> confs = pb_utils.get_output_tensor_by_name(infer_response, 'class')
(Pdb)
> /fas_repo/bls_model/1/fas_pipe.py(66)face_detect()
-> locs = pb_utils.get_output_tensor_by_name(infer_response, 'bbox')
(Pdb) p confs
<c_python_backend_utils.Tensor object at 0x7f08f1716130>
(Pdb) dir(confs)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'as_numpy', 'from_dlpack', 'is_cpu', 'name', 'to_dlpack', 'triton_dtype']
(Pdb) p confs.is_cpu()
False
(Pdb) p confs.as_numpy()
*** c_python_backend_utils.TritonModelException: Tensor is stored in GPU and cannot be converted to NumPy.
(Pdb)
This is the code that sends the request:
......
import pdb
pdb.set_trace()
images = pb_utils.Tensor("images", preprocessed_imgs)
infer_request = pb_utils.InferenceRequest(
    model_name=self.model_name0,
    inputs=[images],
    requested_output_names=self.outputs_0)
infer_response = infer_request.exec()
# if infer_response.has_error():
#     return False
confs = pb_utils.get_output_tensor_by_name(infer_response, 'class')
locs = pb_utils.get_output_tensor_by_name(infer_response, 'bbox')
......
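One possible workaround, sketched below under the assumption that a CUDA-aware library such as PyTorch is available in the Python backend environment (CuPy would work the same way): use the to_dlpack() method visible in the pdb session above to hand the GPU tensor to that library, and let it perform the device-to-host copy. This is a sketch, not an official Triton API beyond the is_cpu/as_numpy/to_dlpack methods shown in the dir() output.

```python
def tensor_to_numpy(tensor):
    """Copy a Triton output tensor to host memory as a NumPy array.

    `tensor` is assumed to expose the pb_utils.Tensor interface shown
    in the pdb session above: is_cpu(), as_numpy(), to_dlpack().
    """
    if tensor.is_cpu():
        # Already on the host: as_numpy() works directly.
        return tensor.as_numpy()
    # GPU tensor: export via DLPack and let PyTorch (an assumption;
    # any DLPack-aware CUDA library would do) copy it to the host.
    import torch.utils.dlpack as torch_dlpack
    return torch_dlpack.from_dlpack(tensor.to_dlpack()).cpu().numpy()
```

With this helper, the snippet above would become `confs_np = tensor_to_numpy(confs)` instead of calling as_numpy() directly.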
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (5 by maintainers)
Any update on this? I am currently blocked by it.
Looks like the DLPack protocol has changed a bit since we designed this interface in Python backend and Numpy is using a newer version. I’ll file a ticket for improving the DLPack support in Python backend.
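For CPU-resident tensors, newer NumPy (1.22+, which is what the maintainer's comment about NumPy using a newer DLPack version refers to) can consume DLPack producers directly via np.from_dlpack. A minimal illustration of the protocol, using a NumPy array as both producer and consumer:

```python
import numpy as np

# NumPy arrays implement __dlpack__, so from_dlpack returns a
# zero-copy view over the same buffer rather than a new allocation.
src = np.arange(6, dtype=np.float32).reshape(2, 3)
dst = np.from_dlpack(src)

assert np.shares_memory(src, dst)  # same underlying buffer, no copy
assert dst.shape == (2, 3)
```

GPU tensors are exactly the case this zero-copy path cannot cover: NumPy has no device memory, so a DLPack-aware CUDA library has to do the host copy first.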
Hi, any update now? Blocked by this issue too.
numpy version: 1.23.3
tritonserver version: 22.09
we can use
Sorry for the delay. We are working on this ticket and hopefully it should be available soon.
There are many circumstances that can lead to a BLS output being on the GPU. For example, the backend itself decides which device to place its output tensors on. The Python backend deliberately provides outputs on the same device it received them, to avoid unnecessary data movement.
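For readers hitting this on newer releases: to the best of my knowledge, later versions of the Python backend (not 22.01; an upgrade is required) added a preferred_memory argument to pb_utils.InferenceRequest that asks Triton to place BLS outputs in CPU memory, so as_numpy() works directly. A hypothetical sketch, written as a function that takes the pb_utils module explicitly so it can be read outside a model.py:

```python
def build_cpu_request(pb_utils, model_name, inputs, output_names):
    """Build a BLS request asking Triton to place outputs on the CPU.

    Assumes a python_backend recent enough to expose PreferredMemory
    and TRITONSERVER_MEMORY_CPU; device id 0 is a placeholder.
    """
    return pb_utils.InferenceRequest(
        model_name=model_name,
        inputs=inputs,
        requested_output_names=output_names,
        preferred_memory=pb_utils.PreferredMemory(
            pb_utils.TRITONSERVER_MEMORY_CPU, 0),
    )
```

Note that preferred_memory is a preference, not a guarantee, so checking is_cpu() on each output tensor before calling as_numpy() remains good practice.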