TensorRT-LLM: Mistral 7b and Mixtral 8x7b experience degraded performance (using official docs)
System Info
- TensorRT-LLM v0.8.0 (pinned to release commit)
- Nvidia A100
- Mistral-7B-Instruct-v0.2
- Using the CPP runner
- Installed with
pip install tensorrt_llm==0.8.0 --extra-index-url https://pypi.nvidia.com
- Cloned this repository at the v0.8.0 https://github.com/NVIDIA/TensorRT-LLM/commit/5955b8afbad2ddcc3156202b16c567e94c52248f commit for contained scripts
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
- Build and run Mistral 7b with the example instructions.
python convert_checkpoint.py --model_dir Mistral-7B-Instruct-v0.2 \
--output_dir trt_engines/tllm_checkpoint_1gpu_mistral \
--dtype float16
trtllm-build --checkpoint_dir trt_engines/tllm_checkpoint_1gpu_mistral \
--output_dir trt_engines/fp16/1-gpu/ \
--gemm_plugin float16 \
--max_input_len 32256
python3 ../run.py --max_output_len 2000 \
--input_text "<prompt here>" \
--tokenizer_dir Mistral-7B-Instruct-v0.2 \
--engine_dir trt_engines/fp16/1-gpu \
--max_attention_window_size 4096
- Experiment with various prompts conforming to the Mistral prompt template/format such as:
- “[INST] write me a long story [/INST]”
- “[INST] write me a long book [/INST]”
- “[INST] Please write an essay on the thermodynamics of pizza. [/INST]”
- or, if you disturb the formatting a bit (note the extra spaces before the punctuation in this one): “[INST] hi [/INST] Hello ! How can I help you today ? [INST] hi [/INST]”
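For comparison, the sketch below assembles prompts in the Mistral-Instruct layout `[INST] user [/INST] answer</s>[INST] user [/INST]`. `build_mistral_prompt` is a hypothetical helper, not part of TensorRT-LLM or the Mistral tokenizer, and it assumes the tokenizer adds the leading `<s>` (BOS) itself. Unlike the last example prompt above, it closes each completed assistant turn with `</s>`.

```python
from typing import List, Tuple


def build_mistral_prompt(history: List[Tuple[str, str]], next_user_msg: str) -> str:
    """Hypothetical helper: format a conversation in the Mistral-Instruct layout
    "[INST] user [/INST] answer</s>[INST] user [/INST]".

    The leading <s> (BOS) is omitted on the assumption that the tokenizer adds it.
    """
    parts = []
    for user_msg, assistant_msg in history:
        # Each completed turn is closed with </s> before the next [INST].
        parts.append(f"[INST] {user_msg.strip()} [/INST] {assistant_msg.strip()}</s>")
    # The final turn is left open so the model writes the answer.
    parts.append(f"[INST] {next_user_msg.strip()} [/INST]")
    return "".join(parts)


if __name__ == "__main__":
    print(build_mistral_prompt([], "Please write an essay on the thermodynamics of pizza."))
    # [INST] Please write an essay on the thermodynamics of pizza. [/INST]
    print(build_mistral_prompt([("hi", "Hello! How can I help you today?")], "hi"))
    # [INST] hi [/INST] Hello! How can I help you today?</s>[INST] hi [/INST]
```

Whether a literal `</s>` inside the input text is tokenized as the end-of-sequence token depends on the tokenizer configuration, so building token IDs directly may be more reliable than passing this string.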
Expected behavior
The LLM should produce a complete response that terminates, instead of degenerating into an infinite loop. As far as I can tell, this problem does not occur with Transformers: every one of the example prompts above terminates without looping.
For example, with Transformers:
prompt: [INST] Please write an essay on the thermodynamics of pizza. [/INST]
output:
<s> [INST] Please write an essay on the thermodynamics of pizza. [/INST] Title: The Thermodynamics of Pizza: A Delicious Exploration of Energy Transformations
Abstract:
Pizza, a beloved food item enjoyed by millions around the world, offers an intriguing platform to explore the fundamental principles of thermodynamics. This essay delves into the fascinating world of energy transformations in the context of pizza production, cooking, and consumption.
Introduction:
Thermodynamics, the branch of physics that deals with heat and temperature, provides a framework to understand the transformations of energy in various systems. In our daily lives, we encounter numerous examples of energy transformations, some as simple as a cup of hot coffee cooling down to room temperature or as complex as the combustion engine in a car. In this essay, we will explore the thermodynamics of pizza, from its production to consumption.
Production:
The production of pizza involves several energy transformations. The primary raw materials, such as flour, water, yeast, tomatoes, and cheese, undergo various processes to create the final product. The energy required to produce these raw materials comes from the sun, through the process of photosynthesis in plants, or from non-renewable sources like fossil fuels.
During the baking process, the dough is transformed into a golden-brown crust. This transformation occurs due to the application of heat, which causes the water in the dough to evaporate, producing steam. The heat also denatures the proteins in the dough, allowing it to set and form a solid structure. This process is an endothermic reaction, meaning it absorbs energy from its surroundings.
Cooking:
The cooking of pizza is another fascinating example of energy transformations. The pizza is typically cooked in a wood-fired or gas-fired oven, which provides the high temperatures necessary to cook the pizza evenly and quickly. The heat from the oven transfers energy to the pizza, causing the water in the dough and toppings to evaporate, producing steam. This steam helps to cook the pizza from the inside out, while the high temperatures also help to melt the cheese and brown the crust.
Consumption:
The consumption of pizza is the final stage in its life cycle, and it too involves energy transformations. Our bodies use the energy stored in the pizza to fuel various metabolic processes. The carbohydrates in the pizza are broken down into glucose, which is used as a source of energy. The proteins in the pizza are broken down into amino acids, which are used to build and repair body tissues. The fats in the pizza provide a source of energy and help to absorb the fat-soluble vitamins in the pizza.
Conclusion:
The thermodynamics of pizza provide a fascinating glimpse into the world of energy transformations. From the production of the raw materials to the cooking of the pizza and its consumption, each stage involves the absorption, transfer, and transformation of energy. Understanding these principles not only deepens our appreciation for the science behind our favorite food but also highlights the interconnectedness of various systems in the natural world. So the next time you enjoy a slice of pizza, take a moment to savor not only its delicious flavors but also the fascinating energy transformations that brought it to your table.</s>
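The Transformers output ends with `</s>`, the end-of-sequence token, which is what stops generation there. The following is a minimal sketch of that stopping rule, not TensorRT-LLM's actual decoder: a stand-in `step` function replaces the model, and the EOS ID of 2 is an assumption about the Mistral tokenizer (check `tokenizer.eos_token_id`). If the end token is never produced or never recognized, the loop runs for the full `max_output_len`; that is what the symptom reported below looks like from the outside, but this sketch does not identify the actual cause.

```python
from typing import Callable, List

# Assumption: </s> has ID 2 in the Mistral tokenizer; verify with tokenizer.eos_token_id.
MISTRAL_EOS_ID = 2


def generate(step: Callable[[List[int]], int], prompt_ids: List[int],
             end_id: int, max_output_len: int) -> List[int]:
    """Append one token per step; stop when `step` returns end_id or after
    max_output_len new tokens, whichever comes first. The end token itself
    is not included in the returned list."""
    ids = list(prompt_ids)
    output: List[int] = []
    for _ in range(max_output_len):
        token = step(ids)  # stand-in for a model forward pass plus token selection
        if token == end_id:
            break
        ids.append(token)
        output.append(token)
    return output


if __name__ == "__main__":
    # Toy "model" that emits EOS once the sequence holds 4 tokens.
    stops = generate(lambda ids: MISTRAL_EOS_ID if len(ids) >= 4 else 100, [1], MISTRAL_EOS_ID, 2000)
    # Toy "model" that never emits EOS: output is capped by max_output_len.
    loops = generate(lambda ids: 100, [1], MISTRAL_EOS_ID, 2000)
    print(len(stops), len(loops))  # 3 2000
```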
Actual behavior
Generation does not stop until max_output_len is reached, and the longer it runs, the worse the output gets. From what I’ve seen, it starts repeating itself and then emits random tokens that decode as random Unicode symbols.
For example,
prompt: [INST] Please write an essay on the thermodynamics of pizza. [/INST]
output:
It starts off well:
Input [Text 0]: "<s> [INST] Please write an essay on the thermodynamics of pizza. [/INST]"
Output [Text 0 Beam 0]: "Title: The Thermodynamics of Pizza: A Delicious Exploration
Abstract:
This essay delves into the fascinating world of thermodynamics as it applies to the beloved food item: pizza.
Introduction:
Pizza, a staple food in many households, is a delightful combination of dough, sauce, and various toppings. However, beyond its delicious taste and satisfying texture, lies an intriguing thermodynamic story.
First Law of Thermodynamics:
The first law of thermodynamics, also known as the law of conservation of energy, states that energy cannot be created or destroyed, only transferred or transformed from one form to another.
In the context of pizza, the first law of thermodynamics can be applied to the various stages of pizza preparation and consumption.
For instance, when the dough is being kneaded, the energy from the kneading action is being transferred to the dough, transforming the dough from a disorganized, sticky mass to a smooth, elastic, cohesive structure that is capable of holding the various toppings and sauces that will be added later.
Similarly, when the pizza is being cooked in a hot oven, the energy from the heat is being transferred to the pizza, transforming the pizza from a raw, doughy, unappetizing state to a delicious, golden-brown, mouth-watering masterpiece that is capable of satisfying even the most discerning of taste buds.
but after a while it falls into a continuous loop that only stops when max_output_len is reached:
In conclusion, the thermodynamics of pizza provide an intriguing and fascinating exploration into the various stages of pizza preparation, consumption, and disposal, and the ways in which the laws of thermodynamics apply to these processes.
From the first law of thermodynamics, which governs the conservation of energy and the transfer and transformation of energy from one form to another, we can see how the energy from the pizza, in the form of the heat and nutrients that it contains, is transferred and transformed from the pizza to the body of the person who is consuming the pizza, and how this process is governed by the first law of thermodynamics.
From the second law of thermodynamics, which governs the increase of entropy, or disorder, in a closed system over time, we can see how the pizza, which once was a delicious, satisfying, nourishing meal, has now been transformed into a mere waste product, which is capable of contributing to the total entropy, or disorder, of the universe.
In summary, the thermodynamics of pizza provide an intriguing and fascinating exploration into the various stages of pizza preparation, consumption, and disposal, and the ways in which the laws of thermodynamics apply to these processes.
From the first law of thermodynamics, which governs the conservation of energy and the transfer and transformation of energy from one form to another, we can see how the energy from the pizza, in the form of the heat and nutrients that it contains, is transferred and transformed from the pizza to the body of the person who is consuming the pizza, and how this process is governed by the first law of thermodynamics.
From the second law of thermodynamics, which governs the increase of entropy, or disorder, in a closed system over time, we can see how the pizza, which once was a delicious, satisfying, nourishing meal, has now been transformed into a mere waste product, which is capable of contributing to the total entropy, or disorder, of the universe.
In conclusion, the thermodynamics of pizza provide an intriguing and fascinating exploration into the various stages of pizza preparation, consumption, and disposal, and the ways in which the laws of thermodynamics apply to these processes.
From the first law of thermodynamics, which governs the conservation of energy and the transfer and
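To check whether a generation has fallen into a strict loop like the one above, one option is to look for a block that repeats at the end of the output. `repeating_tail_period` is a hypothetical helper that works on any sequence (words or token IDs). The looping text above varies slightly between cycles ("In conclusion" vs. "In summary"), so an exact-match check like this only catches strictly periodic tails; treat it as a rough diagnostic.

```python
from typing import Optional, Sequence


def repeating_tail_period(tokens: Sequence, min_repeats: int = 2,
                          max_period: Optional[int] = None) -> Optional[int]:
    """Return the smallest period p such that the last p * min_repeats items are
    one p-item block repeated min_repeats times, or None if there is no such p."""
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    n = len(tokens)
    limit = n // min_repeats
    if max_period is not None:
        limit = min(limit, max_period)
    for p in range(1, limit + 1):
        block = list(tokens[n - p:])
        # Compare the final block with each of the (min_repeats - 1) blocks before it.
        if all(list(tokens[n - k * p:n - (k - 1) * p]) == block
               for k in range(2, min_repeats + 1)):
            return p
    return None


if __name__ == "__main__":
    print(repeating_tail_period("the pizza is hot the pizza is hot".split()))  # 4
    print(repeating_tail_period("the pizza is hot".split()))  # None
```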
Additional notes
Increasing the repetition penalty has some effect, but it doesn’t always work and it degrades the output, whereas I’ve never seen this issue with Mistral 7B in Transformers.
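One reason a higher repetition penalty tends to hurt quality: in the common CTRL-style formulation (the one used by Hugging Face's `RepetitionPenaltyLogitsProcessor`), every token that has already appeared is made less likely, including words that legitimately recur, such as "pizza" in an essay about pizza. The sketch below shows that formulation on plain Python lists; TensorRT-LLM's own repetition penalty may be implemented differently, so this is an illustration of the general technique only.

```python
from typing import Iterable, List


def apply_repetition_penalty(logits: List[float], seen_ids: Iterable[int],
                             penalty: float) -> List[float]:
    """CTRL-style repetition penalty: for every distinct token already seen,
    divide a positive logit by `penalty` or multiply a negative one by it,
    so that token becomes less likely regardless of context."""
    out = list(logits)
    for tok in set(seen_ids):  # each distinct token is penalized once
        score = out[tok]
        out[tok] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    print(apply_repetition_penalty([2.0, -1.0, 0.5], seen_ids=[0, 1, 0], penalty=2.0))
    # [1.0, -2.0, 0.5]
```

Because the penalty ignores why a token was repeated, raising it enough to break a loop also pushes the model away from ordinary, necessary repetitions.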
About this issue
- State: closed
- Created 4 months ago
- Comments: 27
Hi, @djns99! I checked today on the v0.9.0 release, and the issue seems to be resolved. Thanks a lot for all your help! However, I stumbled upon another minor issue. In convert_checkpoint.py, when --load_model_on_cpu is set, the conversion fails. That’s because the preload_model function tries to load the model with device_map='auto'. I worked around it by adding load_model_on_cpu:
Just a heads up that you might want to fix it sometime 😉
Once again, thanks for all the help!
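As an illustration of the kind of fix described (not the commenter's actual change, and not the real code in convert_checkpoint.py), a loader can pick the `device_map` from the flag. `resolve_device_map` and `preload_model_sketch` are hypothetical names; the only real API used is Hugging Face Transformers' `AutoModelForCausalLM.from_pretrained`, whose `device_map` accepts `"auto"` (which needs the accelerate package) or a device string such as `"cpu"`.

```python
def resolve_device_map(load_model_on_cpu: bool) -> str:
    """Hypothetical helper: keep every weight on the CPU when requested,
    otherwise let accelerate place layers across the available devices."""
    return "cpu" if load_model_on_cpu else "auto"


def preload_model_sketch(model_dir: str, load_model_on_cpu: bool = False):
    """Hypothetical loader that honors a load_model_on_cpu flag."""
    # Imported lazily so resolve_device_map can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_dir,
        device_map=resolve_device_map(load_model_on_cpu),
        torch_dtype="auto",
    )


if __name__ == "__main__":
    print(resolve_device_map(True), resolve_device_map(False))  # cpu auto
```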