DeepSpeed: [BUG] Can't compile DeepSpeed version 0.8.1+ with Cuda 11.7

Describe the bug

When I try to compile DeepSpeed from the v0.8.1 tag with CUDA 11.7 inside Docker, the compilation fails. The Docker image compiles DeepSpeed with the following command, which has worked in the past:

DS_BUILD_OPS=1 pip install git+https://github.com/microsoft/DeepSpeed.git@v0.8.1

To Reproduce

Steps to reproduce the behavior:

  1. Use a Linux machine
  2. Use the Dockerfile here
  3. Change line 30 of the Dockerfile to use tag v0.8.1 instead of v0.8.0
  4. Try building the docker image.

Expected behavior

Compilation takes over 10 minutes, but it has always succeeded with previous tags. I expect it to finish successfully rather than fail.

ds_report output

Because the compilation fails, I cannot run ds_report.
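Since `ds_report` is only available once DeepSpeed installs, similar toolchain details (nvcc release, PyTorch version and the CUDA version PyTorch was built for) can be collected by hand. The sketch below is a hypothetical helper, not part of DeepSpeed; it assumes `nvcc`, `nvidia-smi` and `python3` may or may not be on the `PATH` and reports any that are missing instead of aborting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepSpeed): print toolchain details similar
# to what ds_report summarizes, using only nvcc, nvidia-smi and PyTorch.
# A missing tool is reported instead of aborting the script.

section() { printf '== %s ==\n' "$1"; }

collect_env_report() {
  section "nvcc"
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep release    # e.g. "Cuda compilation tools, release 11.7, ..."
  else
    echo "nvcc not found"
  fi

  section "GPUs"
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  else
    echo "nvidia-smi not found"
  fi

  section "PyTorch"
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found"
    return 0
  fi
  # Import errors go to /dev/null; a failed import prints a short notice instead.
  python3 - 2>/dev/null <<'EOF' || echo "torch not importable"
import torch
print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
if torch.cuda.is_available():
    print("compute capability", torch.cuda.get_device_capability(0))
EOF
}

collect_env_report
```

A mismatch between the nvcc release and the CUDA version PyTorch was built for is one of the first things worth ruling out when DS_BUILD_OPS=1 compilation fails.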

Screenshots

[Screenshot of the first build error]

After this, the terminal output suggests the build is still continuing, but then a second error appears and the Docker image fails to build.

[Screenshot of the second build error]

System info (please complete the following information):

  • Ubuntu 20.04
  • Two RTX 3090s
  • Running inside the Docker image linked in the reproduction steps

Docker context

See the Dockerfile linked in the reproduction steps.

Additional context

I have seen similar issues reported on Windows.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (10 by maintainers)

Most upvoted comments

@loadams That makes sense as a possible cause. I will try the PR with the fix when I get a chance and let you know what I find. Thanks!