DeepSpeed: [BUG] Can't compile DeepSpeed version 0.8.1+ with CUDA 11.7
Describe the bug When I try to compile DeepSpeed from the 0.8.1 tag using Docker and CUDA 11.7, compilation fails. The Docker image builds DeepSpeed with the following command, which has worked in the past:
DS_BUILD_OPS=1 pip install git+https://github.com/microsoft/DeepSpeed.git@v0.8.1
To Reproduce Steps to reproduce the behavior:
- Use a Linux machine
- Use the Dockerfile here
- Change line 30 of the file to use tag 0.8.1 rather than 0.8.0
- Try building the Docker image.
Expected behavior Compilation takes over 10 minutes, but it has always succeeded with previous tags. I expect compilation to finish successfully rather than fail.
ds_report output Because compilation fails, I cannot run ds_report.
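Since `ds_report` is unavailable when the build fails, roughly the same toolchain details can be gathered by hand. The following is a minimal sketch rather than a replacement for `ds_report`: the function name `collect_env` is hypothetical, and it assumes only that `nvcc`, `nvidia-smi`, and `python` may or may not be on the PATH inside the image. Note that `nvidia-smi` is often unavailable during `docker build` even when it works in a running container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the toolchain versions relevant to a DeepSpeed build.
# Each probe reports a note instead of failing when its tool is missing,
# so the function can run in a partially built image.
collect_env() {
    echo "== nvcc =="
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version
    else
        echo "nvcc not found on PATH"
    fi

    echo "== nvidia-smi =="
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
            || echo "nvidia-smi failed (no driver visible?)"
    else
        echo "nvidia-smi not found on PATH"
    fi

    echo "== torch =="
    if command -v python >/dev/null 2>&1; then
        # torch.version.cuda is the CUDA version PyTorch was built against
        python -c 'import torch; print(torch.__version__, torch.version.cuda)' 2>/dev/null \
            || echo "torch not importable"
    else
        echo "python not found on PATH"
    fi
}

collect_env
```

A mismatch between the CUDA version reported by `nvcc` and the one PyTorch was built against is worth including in the report, since extension builds depend on both.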
Screenshots
After this, I get terminal output that suggests it's still trying to build, but then I get another error and the image fails to build.
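Because a screenshot captures only part of the failure, a full text log is usually easier to share. The sketch below assumes the same install command used earlier in this report; `-v` is pip's standard verbose flag, while the `ds_install_cmd` helper, the `DS_TAG` variable, and the `build.log` file name are hypothetical. The script only prints the command so it can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the verbose install command for a given DeepSpeed tag.
ds_install_cmd() {
    local tag="$1"
    printf 'DS_BUILD_OPS=1 pip install -v git+https://github.com/microsoft/DeepSpeed.git@%s' "$tag"
}

# Default to the failing tag; override with e.g. DS_TAG=v0.8.0 to compare.
DS_TAG="${DS_TAG:-v0.8.1}"

# Print the full pipeline; tee keeps the output on screen and in build.log.
echo "$(ds_install_cmd "$DS_TAG") 2>&1 | tee build.log"
```

Running the printed command once for the working tag and once for the failing tag gives two logs that can be compared to find the first point where the builds diverge.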
System info (please complete the following information):
- Ubuntu 20.04
- Two RTX 3090s
- Using a Docker image linked elsewhere in the issue
Docker context The Dockerfile is shared elsewhere in this issue.
Additional context I have seen similar issues reported for Windows.
About this issue
- State: closed
- Created a year ago
- Comments: 17 (10 by maintainers)
@loadams That makes sense as a cause of the issues. I will try the PR with the fix when I get a chance and let you know what I find. Thanks!