TensorRT-LLM: pip install -e . does not work

System Info

x86, H100, Ubuntu

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

On an x86 system with an H100 GPU running CUDA 12.3, I installed the dependencies by running the scripts referenced by Dockerfile.multi (https://github.com/NVIDIA/TensorRT-LLM/blob/0ab9d17a59c284d2de36889832fe9fc7c8697604/docker/Dockerfile.multi#L8-L51) and by translating its ENV directives into exports in ~/.bashrc.
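For concreteness, roughly like this (a sketch: the script names mirror what Dockerfile.multi invokes at that commit, and the exported path is an assumption, so check your checkout):

$ cd TensorRT-LLM/docker/common
$ bash install_base.sh && bash install_cmake.sh && bash install_tensorrt.sh
# ...plus the remaining install_*.sh scripts the Dockerfile lists.
# Each ENV line in the Dockerfile becomes an export in ~/.bashrc, e.g.:
$ echo 'export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc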

This allowed me to run the following command to build TensorRT-LLM from source code:

pip install -e . --extra-index-url https://pypi.nvidia.com

The build finishes almost immediately, which does not look right: build_wheel.py usually takes about 40 minutes to compile everything.

After the build, pip list shows that tensorrt-llm is installed:

$ pip list | grep tensorrt
tensorrt                 9.2.0.post12.dev5
tensorrt-llm             0.9.0.dev2024020600 /root/TensorRT-LLM

However, importing it fails:

$ python3 -c 'import tensorrt_llm'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/TensorRT-LLM/tensorrt_llm/__init__.py", line 44, in <module>
    from .hlapi.llm import LLM, ModelConfig
  File "/root/TensorRT-LLM/tensorrt_llm/hlapi/__init__.py", line 1, in <module>
    from .llm import LLM, ModelConfig
  File "/root/TensorRT-LLM/tensorrt_llm/hlapi/llm.py", line 17, in <module>
    from ..executor import (GenerationExecutor, GenerationResult,
  File "/root/TensorRT-LLM/tensorrt_llm/executor.py", line 11, in <module>
    import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
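A quick way to confirm what is missing (the path comes from the editable checkout above; the exact extension filename is an assumption based on how compiled CPython modules are named):

$ ls /root/TensorRT-LLM/tensorrt_llm/bindings*.so
ls: cannot access '/root/TensorRT-LLM/tensorrt_llm/bindings*.so': No such file or directory

In other words, the editable install never produced the compiled bindings extension inside the source tree.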

Expected behavior

My project requires me to build the main branch of TensorRT-LLM. It would be great if pip install could work, so I could declare TensorRT-LLM as a dependency in my project’s pyproject.toml file.
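For illustration, this is the kind of declaration I would like to be able to use (hypothetical: the PEP 508 direct reference below resolves the package, but it does not currently yield a working install because the C++ build step is skipped):

# Equivalent to a pyproject.toml dependency entry
# "tensorrt_llm @ git+https://github.com/NVIDIA/TensorRT-LLM.git@main":
$ pip install "tensorrt_llm @ git+https://github.com/NVIDIA/TensorRT-LLM.git@main"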

Actual behavior

I had to build TensorRT-LLM by invoking build_wheel.py instead, as described in https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/build_from_source.md#build-tensorrt-llm
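For reference, the working manual path looks like this (commands follow the linked doc at the time of writing; flags and output paths may differ across versions):

$ python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
$ pip install ./build/tensorrt_llm*.whl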

Additional notes

I was able to build vLLM, including its CUDA kernels, with pip install -e .; perhaps its build setup could serve as a reference.

Most upvoted comments

I am working on fixing this issue now. Thanks for your support!

@wangkuiyi, for your information, @Shixiaowei02 is based in China, which means he won’t be able to work on this issue until the Chinese New Year break is over.

It looks like pip install -e . does not automatically trigger the building of the Python bindings for the C++ runtime.
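If that is the case, a possible interim workaround (assuming build_wheel.py copies the compiled bindings into the tensorrt_llm/ source tree before packaging, which the resulting wheel layout suggests) would be:

$ python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
$ pip install -e . --extra-index-url https://pypi.nvidia.com
$ python3 -c 'import tensorrt_llm.bindings'   # should now succeed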