TensorRT-LLM: pip install -e . does not work

System Info

x86, H100, Ubuntu

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

On an x86 system with an H100 GPU running CUDA 12.3, I installed the dependencies by running the scripts referenced by Dockerfile.multi (https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/docker/Dockerfile.multi#L8-L51) and by translating its ENV directives into exports in ~/.bashrc.
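For concreteness, roughly like this (a sketch: the script names mirror what Dockerfile.multi invokes at that commit, and the exported path is an assumption, so check your checkout):

$ cd TensorRT-LLM/docker/common
$ bash install_base.sh && bash install_cmake.sh && bash install_tensorrt.sh
# ...plus the remaining install_*.sh scripts the Dockerfile lists.
# Each ENV line in the Dockerfile becomes an export in ~/.bashrc, e.g.:
$ echo 'export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc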

This allowed me to run the following command to build TensorRT-LLM from source code:

pip install -e . --extra-index-url https://pypi.nvidia.com

The build finishes almost immediately, which does not look right: build_wheel.py usually takes about 40 minutes to compile everything.

After the build, pip list shows that tensorrt-llm is installed:

$ pip list | grep tensorrt
tensorrt                 9.2.0.post12.dev5
tensorrt-llm             0.9.0.dev2024020600 /root/TensorRT-LLM

However, importing it fails:

$ python3 -c 'import tensorrt_llm'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/TensorRT-LLM/tensorrt_llm/__init__.py", line 44, in <module>
    from .hlapi.llm import LLM, ModelConfig
  File "/root/TensorRT-LLM/tensorrt_llm/hlapi/__init__.py", line 1, in <module>
    from .llm import LLM, ModelConfig
  File "/root/TensorRT-LLM/tensorrt_llm/hlapi/llm.py", line 17, in <module>
    from ..executor import (GenerationExecutor, GenerationResult,
  File "/root/TensorRT-LLM/tensorrt_llm/executor.py", line 11, in <module>
    import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
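A quick way to confirm what is missing (the path comes from the editable checkout above; the exact extension filename is an assumption based on how compiled CPython modules are named):

$ ls /root/TensorRT-LLM/tensorrt_llm/bindings*.so
ls: cannot access '/root/TensorRT-LLM/tensorrt_llm/bindings*.so': No such file or directory

In other words, the editable install never produced the compiled bindings extension inside the source tree.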

Expected behavior

My project requires me to build the main branch of TensorRT-LLM. It would be great if pip install could work, so I could declare TensorRT-LLM as a dependency in my project’s pyproject.toml file.
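For illustration, this is the kind of declaration I would like to be able to use (hypothetical: the PEP 508 direct reference below resolves the package, but it does not currently yield a working install because the C++ build step is skipped):

# Equivalent to a pyproject.toml dependency entry
# "tensorrt_llm @ git+https://github.com/NVIDIA/TensorRT-LLM.git@main":
$ pip install "tensorrt_llm @ git+https://github.com/NVIDIA/TensorRT-LLM.git@main"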

Actual behavior

I had to build TensorRT-LLM by invoking build_wheel.py instead, as described in https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/build_from_source.md#build-tensorrt-llm
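For reference, the working manual path looks like this (commands follow the linked doc at the time of writing; flags and output paths may differ across versions):

$ python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
$ pip install ./build/tensorrt_llm*.whl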

Additional notes

I was able to build vLLM, including its CUDA kernels, with pip install -e .; perhaps its build setup could serve as a reference.

Most upvoted comments

I am working on fixing this issue now. Thanks for your support!

@wangkuiyi, for your information, @Shixiaowei02 is based in China, which means he won’t be able to work on this issue until the Chinese New Year break is over.

It looks like pip install -e . does not automatically trigger the building of the Python bindings for the C++ runtime.
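If that is the case, a possible interim workaround (assuming build_wheel.py copies the compiled bindings into the tensorrt_llm/ source tree before packaging, which the resulting wheel layout suggests) would be:

$ python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
$ pip install -e . --extra-index-url https://pypi.nvidia.com
$ python3 -c 'import tensorrt_llm.bindings'   # should now succeed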