TensorRT: BART Error: 'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'
Description
Followed https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb to build the TensorRT example for the BART model.
Got a warning at `bart_trt_decoder = BARTTRTDecoder(bart_trt_decoder_engine, metadata, tfm_config)`:
Cannot find binding of given name: past_key_values.0.decoder.key
and an error at `outputs = bart_trt_decoder(input_ids, encoder_last_hidden_state)`:
'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'
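For reference, a minimal sketch of the two failing calls as they appear in the notebook flow. The objects `bart_trt_decoder_engine`, `metadata`, `tfm_config`, `input_ids`, and `encoder_last_hidden_state` are assumed to come from earlier notebook cells (following the T5 notebook pattern), and the import path is an assumption based on the traceback below:

```python
# Assumed import path: demo/HuggingFace is on sys.path, class lives in BART/trt.py
from BART.trt import BARTTRTDecoder

# Warning is printed here:
#   Cannot find binding of given name: past_key_values.0.decoder.key
bart_trt_decoder = BARTTRTDecoder(bart_trt_decoder_engine, metadata, tfm_config)

# AttributeError is raised here:
#   'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'
outputs = bart_trt_decoder(input_ids, encoder_last_hidden_state)
```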
Environment
- TensorRT Version: 8.4.1.5
- NVIDIA GPU: A100-SXM4-40GB
- NVIDIA Driver Version: 460.73.01
- CUDA Version: 11.2
- CUDNN Version: 8.0.5
- Operating System: Debian GNU/Linux 10 (buster)
- Python Version (if applicable): 3.7.12
- Tensorflow Version (if applicable):
- PyTorch Version (if applicable): 1.11.0
- Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce
```
----> 6 outputs = bart_trt_decoder(input_ids, encoder_last_hidden_state)

~/projects/bart/code/TensorRT/demo/HuggingFace/NNDF/tensorrt_utils.py in __call__(self, *args, **kwargs)
    166     def __call__(self, *args, **kwargs):
    167         self.trt_context.active_optimization_profile = self.profile_idx
--> 168         return self.forward(*args, **kwargs)
    169
    170 class PolygraphyOnnxRunner:

~/qinqing/projects/bart/code/TensorRT/demo/HuggingFace/BART/trt.py in forward(self, input_ids, encoder_hidden_states, *args, **kwargs)
    401
    402         # denote as variable to allow switch between non-kv and kv engines in kv cache mode
--> 403         trt_context = self.trt_context_non_kv if non_kv_flag else self.trt_context
    404         bindings = self.bindings_non_kv if non_kv_flag else self.bindings
    405         inputs = self.inputs_non_kv if non_kv_flag else self.inputs

AttributeError: 'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'
```
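For context, this AttributeError is the generic failure mode when an attribute is only assigned under a condition (here, presumably only when the kv-cache path sets up the separate non-kv engine) but is later read unconditionally. A minimal, self-contained illustration of that failure class, not the demo's actual code:

```python
class Decoder:
    def __init__(self, use_kv_cache: bool):
        self.trt_context = object()              # always created
        if use_kv_cache:
            self.trt_context_non_kv = object()   # only created on the kv-cache path

    def forward(self, non_kv_flag: bool):
        # unconditional read fails when the attribute was never assigned
        return self.trt_context_non_kv if non_kv_flag else self.trt_context


try:
    Decoder(use_kv_cache=False).forward(non_kv_flag=True)
except AttributeError as err:
    print(err)  # 'Decoder' object has no attribute 'trt_context_non_kv'
```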
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 16
The easiest way I can think of without making structural changes is to go into `frameworks.py`:`generate_and_download_framework()` and simply replace `.from_pretrained(metadata.variant)` with your local checkpoint, `.from_pretrained(checkpoint_file)`, assuming you fine-tuned one of the bart-base, bart-large, or bart-large-cnn models. If not, you may need to modify `BARTModelConfig.py` first by adding your customized config and then do the local checkpoint loading trick.
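A hedged sketch of that local-checkpoint swap (the surrounding function and the exact model class used inside `frameworks.py` may differ; `checkpoint_file` is a path you supply):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Original demo behaviour (inside generate_and_download_framework()):
#     model = BartForConditionalGeneration.from_pretrained(metadata.variant)
# Point it at a local fine-tuned checkpoint directory instead:
checkpoint_file = "/path/to/your/finetuned-bart-checkpoint"  # assumed local path
model = BartForConditionalGeneration.from_pretrained(checkpoint_file)
tokenizer = BartTokenizer.from_pretrained(checkpoint_file)
```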
Yes, the `as_trt_engine` lines are where the engines actually get built. Did you see a log in the notebook indicating that TRT was building the engine (engine building usually takes a while), or did it just pick up an existing `*.engine` file and go through the building step quickly? For the cleanest check, go to your save path, delete those `*.engine` files, and re-run the notebook steps.
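A small sketch for clearing cached engines so the next run is forced to rebuild them; the directory shown is an assumption, use whatever save path the notebook was configured with:

```python
from pathlib import Path

# Assumed engine save directory; adjust to the fpath your notebook uses.
engine_dir = Path("./models/BART/tensorrt")

# Remove previously built engines so the next run rebuilds from ONNX.
for engine_file in engine_dir.glob("*.engine"):
    print(f"removing {engine_file}")
    engine_file.unlink()
```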
You can set `use_cache=False` for now. The kv cache feature is not fully supported in the notebooks yet. We'll add updated notebooks supporting this feature in one of our next releases.
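A minimal sketch of the `use_cache=False` workaround on the HuggingFace side, assuming `tfm_config` is the transformers `BartConfig` passed to `BARTTRTDecoder` (the demo may also carry its own kv-cache flag in its metadata object, which is not shown here; the variant name is just an example):

```python
from transformers import BartConfig

# Disable the kv-cache path in the HuggingFace config used by the notebook.
tfm_config = BartConfig.from_pretrained("facebook/bart-base", use_cache=False)

# Or, if the config object already exists:
# tfm_config.use_cache = False
```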