TensorRT-LLM: stop words do not work. How do I set them?

I set the stop words like this: stop_words_list = np.array([[""]], dtype=object). After it is passed to the function _to_word_list_format(self, word_dict: List[List[str]]), the stop words become [[[ 1 2] [ 2 -1]]].
But when I pass a prompt to CodeLlama and set "max_tokens" to 1024, the real output finishes before 1024 tokens, like this:

<s> import java.util.*;
import java.lang.*;

class Solution {
    /**
    For a given list of input numbers, calculate Mean Absolute Deviation
    around the mean of this dataset.
    Mean Absolute Deviation is the average absolute difference between each
    element and a centerpoint (mean in this case):
    MAD = average | x - x_mean |
    >>> meanAbsoluteDeviation(Arrays.asList(1.0, 2.0, 3.0, 4.0))
    1.0
     */
    public double meanAbsoluteDeviation(List<Double> numbers) {
        double sum = 0.0;
        double mean = 0.0;
        double mad = 0.0;
        for (double number : numbers) {
            sum += number;
        }
        mean = sum / numbers.size();
        for (double number : numbers) {
            mad += Math.abs(number - mean);
        }
        return mad / numbers.size();
    }
}

public class MeanAbsoluteDeviation {
    public static void main(String[] args) {
        Solution sol = new Solution();
        List<Double> numbers = new ArrayList<Double>();
        numbers.add(1.0);
        numbers.add(2.0);
        numbers.add(3.0);
        numbers.add(4.0);
        System.out.println(sol.meanAbsoluteDeviation(numbers));
    }
}
</s><s> package com.github.yamamotoj.module4.package97

class Foo09787 {
    fun method0() {
        Foo09786().method5()
    }

    fun method1() {
        method0()
    }

    fun method2() {
        method1()
    }

    fun method3() {
        method2()
    }

    fun method4() {
        method3()
    }

    fun method5() {
        method4()
    }
}
</s><s> package com.github.yamamotoj.module4.package99

class Foo09999 {
    fun method0() {
        Foo09998().method5()
    }

    fun method1() {
        method0()
    }

    fun method2() {
        method1()
    }

    fun method3() {
        method2()
    }

    fun method4() {
        method3()
    }

    fun method5() {
        method4()
    }
}
</s><s> package com.github.yamamotoj.module4.package97

class Foo09766 {
    fun method0() {
        Foo0

The "</s>" means the output should have finished at that point, but the engine keeps generating until the 1024th token has been produced.

Thank you !

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 25

Most upvoted comments

Please try changing the code here https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.5.0/tensorrt_llm/runtime/generation.py#L59 from

ids = tokenizer.encode(word)

to

ids = tokenizer.encode(word, add_special_tokens=False)
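
To see why this change matters, here is a minimal sketch of the flattened stop-word layout quoted in this thread: token ids in one row, cumulative end offsets padded with -1 in the other. It is not the library's code. ToyTokenizer and to_word_list are hypothetical names, the stop word is assumed to be "</s>", and the ids 1 (<s>) and 2 (</s>) are assumed from the arrays reported in this issue.

```python
from itertools import accumulate
from typing import List

BOS_ID, EOS_ID = 1, 2  # assumed Llama-style ids for <s> and </s>


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer; it only knows "</s>"."""

    vocab = {"</s>": EOS_ID}

    def encode(self, word: str, add_special_tokens: bool = True) -> List[int]:
        ids = [self.vocab[word]]
        # With special tokens enabled, a BOS id is prepended to every word.
        return [BOS_ID] + ids if add_special_tokens else ids


def to_word_list(words: List[str], tokenizer, add_special_tokens: bool) -> List[List[int]]:
    """Build [flat_ids, offsets] for one batch item (illustrative only)."""
    flat_ids: List[int] = []
    lengths: List[int] = []
    for word in words:
        ids = tokenizer.encode(word, add_special_tokens=add_special_tokens)
        if not ids:
            continue
        flat_ids += ids
        lengths.append(len(ids))
    # Offsets are cumulative end positions, padded with -1 to the ids' length.
    offsets = list(accumulate(lengths))
    offsets += [-1] * (len(flat_ids) - len(offsets))
    return [flat_ids, offsets]


with_special = to_word_list(["</s>"], ToyTokenizer(), add_special_tokens=True)
without_special = to_word_list(["</s>"], ToyTokenizer(), add_special_tokens=False)
print(with_special)     # [[1, 2], [2, -1]]
print(without_special)  # [[2], [1]]
```

Under these assumptions, the default encoding turns the stop word into the two-token sequence <s></s>, which the model is unlikely to emit in that order, while add_special_tokens=False yields the single </s> id.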

Thanks for your answer, but I use the Triton backend, so the function to_word_list_format in generation.py may not be used. I changed the preprocessing to ids = tokenizer.encode(word, add_special_tokens=False), and the stop words became [[[2] [1]]], but the engine still did not stop.
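
One way to check on the client side whether a given stop-word array could match the generated ids at all is sketched below. This is only a debugging aid under stated assumptions, not how TensorRT-LLM or the Triton backend evaluates stop criteria. first_stop_index is a hypothetical helper, and the sample ids assume that </s> is 2 and <s> is 1.

```python
from typing import List


def first_stop_index(generated: List[int], flat_ids: List[int], offsets: List[int]) -> int:
    """Return the position just past the first stop-sequence match, or -1."""
    # Recover the individual stop sequences from the flattened layout.
    sequences = []
    start = 0
    for end in offsets:
        if end < 0:  # -1 marks padding
            break
        sequences.append(flat_ids[start:end])
        start = end

    # Scan the output and test whether any stop sequence ends at each position.
    for pos in range(1, len(generated) + 1):
        for seq in sequences:
            if seq and len(seq) <= pos and generated[pos - len(seq):pos] == seq:
                return pos
    return -1


# Hypothetical output ids: two ordinary tokens, then </s> (2), <s> (1), and more text.
generated = [5, 6, 2, 1, 7]
print(first_stop_index(generated, [2], [1]))         # 3
print(first_stop_index(generated, [1, 2], [2, -1]))  # -1
```

If the array with the single </s> id matches the output here but the engine still runs on, the stop words are probably not reaching the engine in the expected form.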