TensorRT-LLM: stop words do not work. How should they be set?
I set the stop words like this:
`stop_words_list = np.array([["</s>"]], dtype=object)`
When it is passed to the function `_to_word_list_format(self, word_dict: List[List[str]])`, the stop words become `[[[1 2] [2 -1]]]`.
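For context, here is a minimal sketch of what I understand that conversion does, modeled on the `to_word_list_format` helper in the TensorRT-LLM examples (details may differ between versions, and the tokenizer path below is only illustrative):

```python
import numpy as np
from transformers import AutoTokenizer

# The tokenizer path is illustrative; any Llama/CodeLlama tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

def to_word_list_format(stop_words):
    """Build the (batch, 2, max_len) stop-word tensor TensorRT-LLM expects:
    row 0 holds the concatenated token ids of all stop words, row 1 holds
    the cumulative end offset of each word, padded with -1."""
    flat_ids, offsets = [], []
    for word in stop_words:
        # With the default add_special_tokens=True the Llama tokenizer
        # prepends BOS (id 1), which is why "</s>" became [1, 2] above.
        ids = tokenizer.encode(word, add_special_tokens=False)
        flat_ids.extend(ids)
        offsets.append(len(flat_ids))
    pad_to = max(len(flat_ids), len(offsets))
    flat_ids += [0] * (pad_to - len(flat_ids))
    offsets += [-1] * (pad_to - len(offsets))
    return np.array([[flat_ids, offsets]], dtype=np.int32)

print(to_word_list_format(["</s>"]))  # [[[2] [1]]] for the Llama tokenizer
```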
But when I pass a prompt to CodeLlama and set `max_tokens` to 1024, the real output finishes well before 1024 tokens, like this:
```
<s> import java.util.*;
import java.lang.*;
class Solution {
    /**
     For a given list of input numbers, calculate Mean Absolute Deviation
     around the mean of this dataset.
     Mean Absolute Deviation is the average absolute difference between each
     element and a centerpoint (mean in this case):
     MAD = average | x - x_mean |
     >>> meanAbsoluteDeviation(Arrays.asList(1.0, 2.0, 3.0, 4.0))
     1.0
     */
    public double meanAbsoluteDeviation(List<Double> numbers) {
        double sum = 0.0;
        double mean = 0.0;
        double mad = 0.0;
        for (double number : numbers) {
            sum += number;
        }
        mean = sum / numbers.size();
        for (double number : numbers) {
            mad += Math.abs(number - mean);
        }
        return mad / numbers.size();
    }
}
public class MeanAbsoluteDeviation {
    public static void main(String[] args) {
        Solution sol = new Solution();
        List<Double> numbers = new ArrayList<Double>();
        numbers.add(1.0);
        numbers.add(2.0);
        numbers.add(3.0);
        numbers.add(4.0);
        System.out.println(sol.meanAbsoluteDeviation(numbers));
    }
}
</s><s> package com.github.yamamotoj.module4.package97
class Foo09787 {
    fun method0() {
        Foo09786().method5()
    }
    fun method1() {
        method0()
    }
    fun method2() {
        method1()
    }
    fun method3() {
        method2()
    }
    fun method4() {
        method3()
    }
    fun method5() {
        method4()
    }
}
</s><s> package com.github.yamamotoj.module4.package99
class Foo09999 {
    fun method0() {
        Foo09998().method5()
    }
    fun method1() {
        method0()
    }
    fun method2() {
        method1()
    }
    fun method3() {
        method2()
    }
    fun method4() {
        method3()
    }
    fun method5() {
        method4()
    }
}
</s><s> package com.github.yamamotoj.module4.package97
class Foo09766 {
    fun method0() {
        Foo0
```
The "</s> " means the output maybe finished but the engine still generating until the 1024 token is generated.
Thank you!
Thanks for your answer, but I use the Triton backend, so the `to_word_list_format` function in `generation.py` may not be used. I changed the preprocessing to `ids = tokenizer.encode(word, add_special_tokens=False)`, and the stop words became `[[[2] [1]]]`, but the engine still did not stop.
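For reference, here is a sketch of the raw request I am trying to send; the tensor names and dtypes follow the example `config.pbtxt` in `tensorrtllm_backend` and are assumptions to be checked against the deployed model:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")

def make_input(name, data, dtype="INT32"):
    t = httpclient.InferInput(name, list(data.shape), dtype)
    t.set_data_from_numpy(data)
    return t

input_ids = np.array([[1, 396, 1053]], dtype=np.int32)  # illustrative prompt ids
inputs = [
    make_input("input_ids", input_ids),
    make_input("input_lengths", np.array([[input_ids.shape[1]]], dtype=np.int32)),
    make_input("request_output_len", np.array([[1024]], dtype=np.int32)),
    # end_id is what actually stops generation at </s> (token id 2 for Llama);
    # if it is missing or wrong, the engine runs until request_output_len.
    make_input("end_id", np.array([[2]], dtype=np.int32)),
    make_input("pad_id", np.array([[2]], dtype=np.int32)),
    # stop_words_list, shape (batch, 2, len): row 0 token ids, row 1 offsets.
    make_input("stop_words_list", np.array([[[2], [1]]], dtype=np.int32)),
]
result = client.infer("tensorrt_llm", inputs)
```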