TensorRT-LLM: LLaMA build fails
# pip list | grep torch
pytorch-quantization     2.1.2
torch                    1.12.1+cu113
torch-tensorrt           2.0.0.dev0
torchdata                0.7.0a0
torchtext                0.16.0a0
torchvision              0.16.0a0
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:00:07.0 Off |                    0 |
|  0%   24C    P8    15W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
# make -C docker release_build CUDA_ARCHS="89-real;90-real"
# make -C docker release_run
# python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
# pip install ./build/tensorrt_llm*.whl
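As a quick sanity check (assuming the wheel installed into the container's default Python 3.10 environment), the install can be verified before building the engine:

# python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"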
# python build.py --model_dir /code/model/llama/llama-2-7b-hf \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --output_dir /code/model/llama_tensor \
    --world_size 8 \
    --tp_size 8
Building LLaMA fails with the following traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1099, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
from ...modeling_utils import PreTrainedModel
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 86, in <module>
from accelerate import dispatch_model, infer_auto_device_map, init_empty_weights
File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 3, in <module>
from .accelerator import Accelerator
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 34, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 112, in <module>
from .launch import (
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 27, in <module>
from ..utils.other import merge_dicts
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 24, in <module>
from .transformer_engine import convert_model
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
import transformer_engine.pytorch as te
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
from .module import LayerNormLinear
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
from .layernorm_linear import LayerNormLinear
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 15, in <module>
from .. import cpp_extensions as tex
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
from transformer_engine_extensions import *
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/code/tensorrt_llm/examples/llama/build.py", line 24, in <module>
from transformers import LlamaConfig, LlamaForCausalLM
File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1090, in __getattr__
value = getattr(module, name)
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1089, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1101, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
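The undefined symbol (_ZN2at4_ops5zeros4callE...) is a libtorch/ATen operator, which usually means transformer_engine_extensions was compiled against a different (newer) PyTorch than the torch 1.12.1+cu113 shown in the pip list above. A minimal way to check for such a mismatch, assuming both packages are visible to pip inside the container:

# python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
# pip show transformer_engine | grep -i version
# python3 -c "import transformer_engine.pytorch"   # reproduces the undefined-symbol ImportError if the mismatch is real

If the torch version differs from what the container ships by default, reinstalling the matching torch build (or rebuilding transformer_engine against the installed torch) should resolve the symbol lookup.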
@ixp9891 I recommend increasing your swap. I was able to build Llama-7b with only 16 GB of RAM (on a G5.xlarge) once I increased the swap.
You can monitor your RAM usage during the build phase with tools like nmon or htop.
Additionally, I believe you need to build TensorRT-LLM with both SM80 and SM86 together.
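A minimal sketch of both suggestions, assuming an Ubuntu-style host where a 16 GB swap file at /swapfile is acceptable (size and path are illustrative) and reusing the CUDA_ARCHS syntax from the build command above:

# fallocate -l 16G /swapfile && chmod 600 /swapfile   # swap size is an example, adjust to available disk
# mkswap /swapfile && swapon /swapfile
# make -C docker release_build CUDA_ARCHS="80-real;86-real"

The arch list targets SM80 (A100) and SM86 (A10), matching the GPU shown in the nvidia-smi output, rather than the 89/90 arches used in the original build command.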