flash-attention: Flash Attention 2 Error -> undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE

I’m using nvcr.io/nvidia/pytorch:23.10-py3 with https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl and getting the error below:

ImportError: /app/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE

Issue #451 might be related, but the error seems different. Are there any solutions to this? I’ve already tried 5 different combinations.
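For reference, the fields in the wheel filename (torch version, CUDA version, Python tag, and C++11 ABI flag) all have to match the PyTorch inside the container, and the NGC containers typically ship a custom 2.x.0a0 build with the C++11 ABI enabled. A quick way to check what the container actually reports (a minimal sketch):

import sys
import torch

# These map onto the wheel name, e.g.
# flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
print("torch:", torch.__version__)                 # torch2.1 part of the tag
print("cuda:", torch.version.cuda)                 # cu122 part of the tag
print("python:", f"cp{sys.version_info.major}{sys.version_info.minor}")  # cp310 part
print("cxx11 abi:", torch.compiled_with_cxx11_abi())  # cxx11abiTRUE/FALSE part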

About this issue

  • State: open
  • Created 4 months ago
  • Reactions: 4
  • Comments: 25 (8 by maintainers)

Most upvoted comments

CUDA 12.3 and 12.2 should be compatible. I’ve just tried the nvcr PyTorch 23.12 container and it works fine:

docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
pip install flash-attn==2.5.1.post1
ipython
In [1]: import torch

In [2]: from flash_attn import flash_attn_func

In [3]: q, k, v = torch.randn(1, 128, 3, 16, 64, dtype=torch.float16, device='cuda').unbind(2)

In [4]: out = flash_attn_func(q, k, v)

I’ve tested on a fresh machine with an A100, and the following combination works for a minimal Docker install:

container: nvcr.io/nvidia/pytorch:24.01-py3
pip install: pip install flash-attn==2.5.1.post1 --no-build-isolation --upgrade

I’ve had issues with 23.12-py3, but 24.01-py3 works perfectly.
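If it helps, a quick sanity check after installing inside the container (a minimal sketch along the lines of the ipython session above) either imports cleanly or reproduces the undefined-symbol ImportError straight away:

import torch
import flash_attn
from flash_attn import flash_attn_func

print("flash_attn:", flash_attn.__version__)
print("torch:", torch.__version__, "cuda:", torch.version.cuda)

# Tiny fp16 attention call just to make sure the CUDA extension loads and runs
q, k, v = torch.randn(1, 128, 3, 16, 64, dtype=torch.float16, device="cuda").unbind(2)
out = flash_attn_func(q, k, v)
print("ok:", out.shape)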

Try following this?

docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
pip install flash-attn==2.5.1.post1