vllm: RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Hello everyone, I always get this error with Baichuan and LLaMA models. I found that it is caused by the single_query_cached_kv_attention method in vllm\model_executor\layers\attention.py: after this method is called, some rows of the hidden output are NaN. How can I fix this? Thanks!

I still get the error even after installing xformers from source.

This is my code:

from vllm import LLM, SamplingParams
#from vllm.transformers_utils.configs.baichuan import BaiChuanConfig

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = SamplingParams(temperature=1, top_p=0.95)

llm = LLM(
    model="/.../Baichuan-7b",
    trust_remote_code=True,
    dtype='float16',
    gpu_memory_utilization=0.85,
    tokenizer_mode="slow",
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

and this is my python environment:

accelerate                0.21.0
aiofiles                  23.1.0
aiohttp                   3.8.5
aiosignal                 1.3.1
altair                    5.0.1
annotated-types           0.5.0
anyio                     3.7.1
appdirs                   1.4.4
argon2-cffi               21.3.0
argon2-cffi-bindings      21.2.0
arrow                     1.2.3
asttokens                 2.2.1
async-lru                 2.0.3
async-timeout             4.0.2
attrs                     23.1.0
Babel                     2.12.1
backcall                  0.2.0
beautifulsoup4            4.12.2
bleach                    6.0.0
blinker                   1.6.2
boltons                   23.0.0
brotlipy                  0.7.0
certifi                   2022.12.7
cffi                      1.15.1
charset-normalizer        2.0.4
click                     8.1.6
cmake                     3.27.0
comm                      0.1.3
conda                     23.3.1
conda-content-trust       0.1.3
conda-package-handling    2.0.2
conda_package_streaming   0.7.0
contourpy                 1.1.0
cryptography              39.0.1
cycler                    0.11.0
datasets                  2.14.0
debugpy                   1.6.7
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.7
distlib                   0.3.7
docker-pycreds            0.4.0
editables                 0.5
exceptiongroup            1.1.2
executing                 1.2.0
fastapi                   0.100.0
fastjsonschema            2.18.0
ffmpy                     0.3.1
filelock                  3.12.2
Flask                     2.3.2
fonttools                 4.41.1
fqdn                      1.5.1
frozenlist                1.4.0
fsspec                    2023.6.0
gitdb                     4.0.10
GitPython                 3.1.32
gradio                    3.35.2
gradio_client             0.2.10
grpcio                    1.56.2
h11                       0.14.0
hatchling                 1.18.0
httpcore                  0.17.3
httpx                     0.24.1
huggingface-hub           0.16.4
idna                      3.4
ipykernel                 6.24.0
ipython                   8.14.0
ipython-genutils          0.2.0
ipywidgets                8.0.7
isoduration               20.11.0
itsdangerous              2.1.2
jedi                      0.18.2
jieba                     0.42.1
Jinja2                    3.1.2
joblib                    1.3.1
json5                     0.9.14
jsonpatch                 1.32
jsonpointer               2.1
jsonschema                4.18.4
jsonschema-specifications 2023.7.1
jupyter                   1.0.0
jupyter_client            8.3.0
jupyter-console           6.6.3
jupyter_core              5.3.1
jupyter-events            0.6.3
jupyter-lsp               2.2.0
jupyter_server            2.7.0
jupyter_server_terminals  0.4.4
jupyterlab                4.0.3
jupyterlab-pygments       0.2.2
jupyterlab_server         2.24.0
jupyterlab-widgets        3.0.8
kiwisolver                1.4.4
linkify-it-py             2.0.2
lit                       16.0.6
markdown-it-py            2.2.0
markdown2                 2.4.10
MarkupSafe                2.1.3
matplotlib                3.7.2
matplotlib-inline         0.1.6
mdit-py-plugins           0.3.3
mdurl                     0.1.2
mistune                   3.0.1
mpmath                    1.3.0
msgpack                   1.0.5
multidict                 6.0.4
multiprocess              0.70.15
mypy-extensions           1.0.0
nbclient                  0.8.0
nbconvert                 7.7.2
nbformat                  5.9.1
nest-asyncio              1.5.6
networkx                  3.1
nh3                       0.2.14
ninja                     1.11.1
nltk                      3.8.1
notebook                  7.0.0
notebook_shim             0.2.3
numpy                     1.25.1
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.2.10.91
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusparse-cu11      11.7.4.91
nvidia-nccl-cu11          2.14.3
nvidia-nvtx-cu11          11.7.91
orjson                    3.9.2
overrides                 7.3.1
packaging                 23.0
pandas                    2.0.3
pandocfilters             1.5.0
parso                     0.8.3
pathspec                  0.11.1
pathtools                 0.1.2
peft                      0.4.0
pexpect                   4.8.0
pickleshare               0.7.5
Pillow                    10.0.0
pip                       23.0.1
platformdirs              3.9.1
pluggy                    1.0.0
prometheus-client         0.17.1
prompt-toolkit            3.0.39
protobuf                  4.23.4
psutil                    5.9.5
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   12.0.1
pycosat                   0.6.4
pycparser                 2.21
pydantic                  1.10.12
pydantic_core             2.3.0
pydub                     0.25.1
Pygments                  2.15.1
pyOpenSSL                 23.0.0
pyparsing                 3.0.9
pyre-extensions           0.0.29
PySocks                   1.7.1
python-dateutil           2.8.2
python-json-logger        2.0.7
python-multipart          0.0.6
pytz                      2023.3
PyYAML                    6.0.1
pyzmq                     25.1.0
qtconsole                 5.4.3
QtPy                      2.3.1
ray                       2.6.1
referencing               0.30.0
regex                     2023.6.3
requests                  2.28.1
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.4.2
rouge-chinese             1.0.3
rpds-py                   0.9.2
ruamel.yaml               0.17.21
ruamel.yaml.clib          0.2.6
safetensors               0.3.1
semantic-version          2.10.0
Send2Trash                1.8.2
sentencepiece             0.1.99
sentry-sdk                1.28.1
setproctitle              1.3.2
setuptools                65.6.3
shortuuid                 1.0.11
six                       1.16.0
smmap                     5.0.0
sniffio                   1.3.0
soupsieve                 2.4.1
stack-data                0.6.2
starlette                 0.27.0
svgwrite                  1.4.3
sympy                     1.12
terminado                 0.17.1
tinycss2                  1.2.1
tokenizers                0.13.3
tomli                     2.0.1
toolz                     0.12.0
torch                     2.0.1
tornado                   6.3.2
tqdm                      4.65.0
traitlets                 5.9.0
transformers              4.31.0
triton                    2.0.0
trl                       0.4.7
trove-classifiers         2023.7.6
typing_extensions         4.7.1
typing-inspect            0.9.0
tzdata                    2023.3
uc-micro-py               1.0.2
uri-template              1.3.0
urllib3                   1.26.15
uvicorn                   0.23.1
virtualenv                20.24.2
vllm                      0.1.2       /.../feng/OpenSource/vllm
wandb                     0.15.7
wavedrom                  2.0.3.post3
wcwidth                   0.2.6
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.6.1
websockets                11.0.3
Werkzeug                  2.3.6
wheel                     0.38.4
widgetsnbextension        4.0.8
xformers                  0.0.20
xxhash                    3.2.0
yarl                      1.9.2
zstandard                 0.19.0

and my GPU info:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100S-32Q      On   | 00000000:02:01.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17

Most upvoted comments

🌡 Have you tried increasing the temperature?

Try increasing the temperature value. I had a very low temperature, along with other parameters such as top_k and top_p, which made the next-token distribution too steep. Beam search needs several candidate tokens to be available, and at a low temperature there were not enough of them.

So I increased the temperature and it worked.

Try increasing the temperature and it should just work, provided no other complications are involved.
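
To illustrate the point about a steep distribution, here is a minimal pure-Python sketch (not vLLM code). The helper names softmax_with_temperature and top_p_candidates and the example logits are made up for illustration. It counts how many tokens survive a top_p cutoff at two temperatures.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_candidates(probs, top_p):
    # Keep the most probable tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

# Hypothetical logits for a four-token vocabulary.
logits = [8.0, 6.0, 5.0, 1.0]
for temperature in (0.1, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    survivors = top_p_candidates(probs, top_p=0.95)
    print(f"temperature={temperature}: {len(survivors)} candidate token(s)")
```

With these logits, only one token survives at temperature 0.1, while two survive at temperature 1.0. This only shows why a low temperature can leave too few candidates. It does not explain the NaN rows described in the original question.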

We mask out the values in logits where the token index is larger than the context length. This avoids corrupted logits caused by NaN from an uninitialized k_cache, which is good. https://github.com/vllm-project/vllm/blob/d1744376ae9fdbfa6a2dc763e1c67309e138fa3d/csrc/attention/attention_kernels.cu#L186-L189

However, we do not mask out the values in v_vec where the token index is larger than the context length. As a result, the following dot call is incorrect:

https://github.com/vllm-project/vllm/blob/d1744376ae9fdbfa6a2dc763e1c67309e138fa3d/csrc/attention/attention_kernels.cu#L264

0 (from logits_vec) * nan (from v_vec) is nan, unfortunately.
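
The arithmetic can be reproduced outside the kernel. The following is a pure-Python sketch, not the CUDA code. The function names, the example weights and the context_len value are all assumptions chosen for illustration. It shows that zeroing the weights alone is not enough when the values contain NaN, and that skipping out-of-context positions avoids the problem.

```python
import math

def weighted_sum(weights, values):
    # Plain dot product over every position, including out-of-context ones.
    return sum(w * v for w, v in zip(weights, values))

def masked_weighted_sum(weights, values, context_len):
    # Only positions below context_len contribute; the rest are skipped entirely.
    return sum(w * v for w, v in zip(weights[:context_len], values[:context_len]))

# Attention weights with positions >= context_len already masked to 0.
weights = [0.25, 0.75, 0.0, 0.0]
# Values where the out-of-context slots hold NaN, standing in for uninitialized cache memory.
values = [1.0, 2.0, math.nan, math.nan]
context_len = 2

print(weighted_sum(weights, values))                      # nan, because 0.0 * nan is nan
print(masked_weighted_sum(weights, values, context_len))  # 1.75
```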

I get a similar problem when using llama2-70B with the tensor parallel size set to 8 on 8xA100, and changing torch.empty to torch.zeros does not help either. When I run the same code and only change the model to gpt-neox/llama2-7B, it works. Can anyone offer any ideas for llama2-70B?