TensorRT: BART Error: 'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'

Description

I followed https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb to build the TensorRT example for the BART model. Constructing the decoder with bart_trt_decoder = BARTTRTDecoder(bart_trt_decoder_engine, metadata, tfm_config) produces the warning "Cannot find binding of given name: past_key_values.0.decoder.key", and calling outputs = bart_trt_decoder(input_ids, encoder_last_hidden_state) raises AttributeError: 'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'.
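The two notebook calls in question, with the failure points marked (setup of the engine, metadata, config, and tokenized inputs follows the notebook and is omitted here):

```python
# Emits the warning: Cannot find binding of given name: past_key_values.0.decoder.key
bart_trt_decoder = BARTTRTDecoder(bart_trt_decoder_engine, metadata, tfm_config)

# Raises: AttributeError: 'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'
outputs = bart_trt_decoder(input_ids, encoder_last_hidden_state)
```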

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: A100-SXM4-40GB
NVIDIA Driver Version: 460.73.01
CUDA Version: 11.2
CUDNN Version: 8.0.5
Operating System: Debian GNU/Linux 10 (buster)
Python Version (if applicable): 3.7.12
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.11.0
Baremetal or Container (if so, version):

Relevant Files

'facebook/bart-base'

Steps To Reproduce

```
----> 6 outputs = bart_trt_decoder(input_ids, encoder_last_hidden_state)

~/projects/bart/code/TensorRT/demo/HuggingFace/NNDF/tensorrt_utils.py in __call__(self, *args, **kwargs)
    166     def __call__(self, *args, **kwargs):
    167         self.trt_context.active_optimization_profile = self.profile_idx
--> 168         return self.forward(*args, **kwargs)
    169
    170 class PolygraphyOnnxRunner:

~/qinqing/projects/bart/code/TensorRT/demo/HuggingFace/BART/trt.py in forward(self, input_ids, encoder_hidden_states, *args, **kwargs)
    401
    402         # denote as variable to allow switch between non-kv and kv engines in kv cache mode
--> 403         trt_context = self.trt_context_non_kv if non_kv_flag else self.trt_context
    404         bindings = self.bindings_non_kv if non_kv_flag else self.bindings
    405         inputs = self.inputs_non_kv if non_kv_flag else self.inputs

AttributeError: 'BARTTRTDecoder' object has no attribute 'trt_context_non_kv'
```

Most upvoted comments

Is there a command or convenient way to set up the engine for a local checkpoint of a fine-tuned BART model, or for a customized BART model?

The easiest way I can think of, without making structural changes, is to go into frameworks.py: generate_and_download_framework() and simply replace .from_pretrained(metadata.variant) with your local checkpoint, .from_pretrained(checkpoint_file), assuming you fine-tuned one of the bart-base, bart-large, or bart-large-cnn models (see the sketch below). If not, modify BARTModelConfig.py first by adding your customized config, and then apply the same local-checkpoint loading trick.
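A minimal sketch of that edit, assuming the loading code in frameworks.py resembles the upstream demo; checkpoint_file here is a hypothetical local directory path:

```python
# frameworks.py: generate_and_download_framework() -- sketch, not verbatim upstream code
from transformers import BartForConditionalGeneration, BartTokenizer

checkpoint_file = "/path/to/finetuned-bart-base"  # hypothetical: your local checkpoint dir

# The demo loads the hub variant recorded in the metadata:
#   model = BartForConditionalGeneration.from_pretrained(metadata.variant)
# Point it at the local fine-tuned checkpoint instead:
model = BartForConditionalGeneration.from_pretrained(checkpoint_file)
tokenizer = BartTokenizer.from_pretrained(checkpoint_file)
```

This only works if the checkpoint keeps the architecture of one of the supported variants; a customized architecture needs its own entry in BARTModelConfig.py first.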

Yes, the as_trt_engine lines are where the engines actually get built. Did you see a log in the notebook indicating that TRT was building the engine (engine building usually takes a while), or did it just pick up an existing *.engine file and skip the building step quickly? For the cleanest check, look at your save fpath, delete those *.engine files, and re-run the notebook steps.
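A small sketch of that clean check; the engine directory is an assumption and should be whatever fpath the notebook saves to:

```python
import glob
import os

engine_dir = "./models"  # hypothetical: the fpath where the notebook saved engines

# Delete cached engines so the next as_trt_engine(...) call rebuilds from scratch
for engine_file in glob.glob(os.path.join(engine_dir, "**", "*.engine"), recursive=True):
    print(f"Removing cached engine: {engine_file}")
    os.remove(engine_file)
```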

You can set use_cache=False for now. The kv cache feature is not fully supported in notebooks yet. We’ll add updated notebooks supporting this feature in one of our next releases.
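A sketch of the workaround, reusing the names from this issue and assuming the notebook's use_cache flag feeds the Hugging Face config passed into BARTTRTDecoder (use_cache is a real BartConfig field; the exact wiring may differ in your copy of the notebook):

```python
from transformers import BartConfig

# Build the decoder config with the kv cache disabled, so the decoder never
# tries to switch to the (unbuilt) non-kv engine path.
tfm_config = BartConfig.from_pretrained("facebook/bart-base")
tfm_config.use_cache = False  # workaround until the notebooks fully support kv cache

bart_trt_decoder = BARTTRTDecoder(bart_trt_decoder_engine, metadata, tfm_config)
outputs = bart_trt_decoder(input_ids, encoder_last_hidden_state)
```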