flash-attention: Not able to install 2.0

I tried both pip install and python setup.py install. Both fail with the following output:

/home/yellow/flash-attention/csrc/cutlass/include/cute/stride.hpp(112): warning: calling a __host__ function("__builtin_unreachable") from a __host__ __device__ function("cute::crd2idx< ::cute::tuple< ::cute::Underscore,  ::cute::Underscore, int > ,  ::cute::tuple< ::cute::tuple< ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)2>  > ,  ::cute::constant<int, (int)2> ,  ::cute::tuple< ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)2>  >  > ,  ::cute::tuple< ::cute::tuple< ::cute::constant<int, (int)1> ,  ::cute::constant<int, (int)2> ,  ::cute::constant<int, (int)4>  > ,  ::cute::constant<int, (int)8> ,  ::cute::tuple< ::cute::constant<int, (int)16> ,  ::cute::constant<int, (int)32>  >  > > ") is not allowed

Killed

ptxas info    : Used 255 registers, 576 bytes cmem[0]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yellow/flash-attention/setup.py", line 201, in <module>
    setup(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/__init__.py", line 107, in setup
    return distutils.core.setup(**attrs)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/install.py", line 80, in run
    self.do_egg_install()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/install.py", line 129, in do_egg_install
    self.run_command('bdist_egg')
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 164, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
    self.run_command(cmdname)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
    self.run_command('build_ext')
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
    super().run_command(command)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/yellow/anaconda3/envs/fla2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

About this issue

  • State: open
  • Created a year ago
  • Comments: 32 (8 by maintainers)

Most upvoted comments

I also reproduce the error. Setting MAX_JOBS=1 in the environment fixes it for me, so it seems that compilation has become resource-intensive enough for a parallel Ninja build to overwhelm many systems. It’s a long-term question, but I wonder if the current approach of statically-compiled CUDA kernels is sustainable. Perhaps there is value to considering JIT compilation, e.g. with Triton or NVRTC?
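A minimal sketch of applying the MAX_JOBS workaround from Python, assuming the build reads the MAX_JOBS environment variable to cap parallel Ninja jobs (PyTorch's cpp_extension does this). The helper names, the 8 GB-per-job estimate, and the pip arguments are assumptions for illustration and do not come from this thread.

```python
import os
import subprocess
import sys


def pick_max_jobs(available_gb, cpu_count, gb_per_job=8.0):
    """Return a job count that should fit in memory; never less than 1.

    gb_per_job is a guessed peak memory per compiler process, not a measurement.
    """
    by_memory = int(available_gb // gb_per_job)
    return max(1, min(cpu_count, by_memory))


def build_env(max_jobs, base_env=None):
    """Copy the environment and set MAX_JOBS for the child build process."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_JOBS"] = str(max_jobs)
    return env


def install(max_jobs=1):
    """Run pip with the capped job count and return its exit code.

    The package name and the --no-build-isolation flag are assumptions here.
    """
    cmd = [sys.executable, "-m", "pip", "install", "flash-attn", "--no-build-isolation"]
    return subprocess.run(cmd, env=build_env(max_jobs), check=False).returncode
```

Calling install(max_jobs=1) corresponds to the setting reported to work above. pick_max_jobs only suggests a less conservative value on machines with more RAM, and its estimate should be tuned by hand.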

Unfortunately for me, I still get the same error even if MAX_JOBS=1 is set. Also tried building from source with the same error. Any temporary solution available?
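When MAX_JOBS=1 does not help, it may be worth checking whether the compiler process was killed (a bare "Killed" line usually means the process received SIGKILL, often from the Linux out-of-memory killer) or whether the compiler reported a genuine error. The following is a rough heuristic sketch; the function name and the returned labels are hypothetical, and the patterns are guesses at common nvcc/gcc output rather than a complete parser.

```python
import re

# Matches compiler-style "error:" but not Python exception names such as
# "CalledProcessError:" (which have a letter right before "error:").
_COMPILE_ERROR = re.compile(r"(?:^|[\s)])error:")


def classify_build_log(log_text):
    """Guess why a build failed from its captured output (heuristic only)."""
    lines = [line.strip() for line in log_text.splitlines()]
    if "Killed" in lines:
        # The shell prints a bare "Killed" when a child process gets SIGKILL.
        return "killed"
    if any(_COMPILE_ERROR.search(line) for line in lines):
        return "compile-error"
    return "unknown"
```

A result of "killed" points toward memory pressure (fewer jobs, more RAM or swap), while "compile-error" suggests looking at the toolchain versions instead.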

You can reinstall Python from conda-forge (no need to change the Python version):

conda install python=3.x.xx --channel conda-forge

seeing the same problem here:

4, in run
    _build_ext.run(self)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/mila/a/aghajohm/scratch/.conda/envs/competenlp/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

I am setting MAX_JOBS=1 to see if that gets the build through.

Update: setting MAX_JOBS=1 solved it for me. It installed, albeit slowly.

Yeah I personally don’t like the fact that we’re templating so heavily (for dropout / no dropout, causal / not causal, different head dimensions, whether seqlen is divisible by 128 or not, different GPU types). The goal has been to get maximum performance, perhaps at the expense of compilation time.

  • Agree with you that JIT compilation is interesting. I don’t have any experience there however.
  • Another way is to have pre-built wheels that folks can just download. I’ll get to that once I’m done fixing some of the edge cases with the backward pass.
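To illustrate why the templating described above drives up compile time, here is a small sketch that counts how many kernel variants result when every combination of options is compiled. The axis names follow the comment, but the concrete values (head dimensions, GPU architectures) are made up for illustration and are not the project's actual kernel list.

```python
from itertools import product

# Hypothetical axis values for illustration only; the real list may differ.
AXES = {
    "dropout": [False, True],
    "causal": [False, True],
    "head_dim": [32, 64, 128],
    "seqlen_divisible_by_128": [False, True],
    "gpu_arch": ["sm80", "sm90"],
}


def count_variants(axes):
    """Number of kernel instantiations if every combination is compiled."""
    total = 1
    for values in axes.values():
        total *= len(values)
    return total


def list_variants(axes):
    """Yield one dict per combination, as a build script might enumerate them."""
    names = list(axes)
    for combo in product(*(axes[name] for name in names)):
        yield dict(zip(names, combo))
```

With these example values, count_variants(AXES) gives 2 * 2 * 3 * 2 * 2 = 48 variants. Each extra option multiplies the total, which is why a parallel build of all of them can exhaust memory.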
