DeepSpeed: Error in building Transformer kernel
I am using the deepspeed/deepspeed:latest container (I also tried installing DeepSpeed with DS_BUILD_OPS=1 pip install deepspeed, but got the same error) and trying to use the Transformer kernel provided by DeepSpeed as follows:
from deepspeed import DeepSpeedTransformerLayer, DeepSpeedTransformerConfig

if __name__ == "__main__":
    transformer_config = DeepSpeedTransformerConfig(
        batch_size=40,
        hidden_size=768,
        heads=768 // 64,
        intermediate_size=768 * 4,
        attn_dropout_ratio=0.0,
        hidden_dropout_ratio=0.0,
        num_hidden_layers=4,
        initializer_range=0.02,
        fp16=True,
        pre_layer_norm=True,
        stochastic_mode=True,
    )
    layer = DeepSpeedTransformerLayer(config=transformer_config)
But I can’t initialize the layer; it fails with the following error:
DeepSpeed Transformer config is {'layer_id': 0, 'batch_size': 40, 'hidden_size': 768, 'intermediate_size': 3072, 'heads': 12, 'attn_dropout_ratio': 0.0, 'hidden_dropout_ratio': 0.0, 'num_hidden_layers': 4, 'initializer_range': 0.02, 'fp16': True, 'pre_layer_norm': True, 'local_rank': -1, 'seed': -1, 'normalize_invertible': False, 'gelu_checkpoint': False, 'adjust_init_range': True, 'test_gemm': False, 'training': True, 'is_grad_enabled': True, 'attn_dropout_checkpoint': False, 'stochastic_mode': True, 'huggingface': False}
Using /root/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/stochastic_transformer/build.ninja...
Building extension module stochastic_transformer...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/cublas_wrappers.cu -o cublas_wrappers.cuda.o
[2/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu -o dropout_kernels.cuda.o
FAILED: dropout_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu -o dropout_kernels.cuda.o
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(102): error: no operator "*" matches these operands
operand types are: __half2 * const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(103): error: no operator "*" matches these operands
operand types are: __half2 * const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(216): error: no operator "*" matches these operands
operand types are: __half2 * const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(217): error: no operator "*" matches these operands
operand types are: __half2 * const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(335): error: no operator "*" matches these operands
operand types are: __half2 * const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu(336): error: no operator "*" matches these operands
operand types are: __half2 * const __half2
6 errors detected in the compilation of "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/dropout_kernels.cu".
[3/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu -o normalize_kernels.cuda.o
FAILED: normalize_kernels.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu -o normalize_kernels.cuda.o
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(880): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(883): error: no operator "-" matches these operands
operand types are: const __half2 - const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(885): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(890): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(892): error: no operator "-" matches these operands
operand types are: const __half2 - const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(893): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(901): error: no operator "*" matches these operands
operand types are: __half2 * __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(901): error: identifier "h2sqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(905): error: identifier "h2rsqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(927): error: no operator "-" matches these operands
operand types are: - __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1189): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1194): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1205): error: no operator "-" matches these operands
operand types are: const __half2 - __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1206): error: no operator "*" matches these operands
operand types are: __half2 * __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1210): error: identifier "h2rsqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1232): error: no operator "-" matches these operands
operand types are: - __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1232): error: identifier "h2rsqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1621): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1624): error: no operator "-" matches these operands
operand types are: const __half2 - const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1626): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1631): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1633): error: no operator "-" matches these operands
operand types are: const __half2 - const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1634): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "const __half2", and vice versa
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1642): error: no operator "*" matches these operands
operand types are: __half2 * __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1642): error: identifier "h2sqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1646): error: identifier "h2rsqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1668): error: no operator "-" matches these operands
operand types are: - __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1703): error: no operator "+" matches these operands
operand types are: __half2 + const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1710): error: no operator "+" matches these operands
operand types are: __half2 + const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1940): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1946): error: no operator "*=" matches these operands
operand types are: __half2 *= __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1959): error: no operator "-" matches these operands
operand types are: __half2 - __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1960): error: no operator "*" matches these operands
operand types are: __half2 * __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1964): error: identifier "h2rsqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1986): error: no operator "-" matches these operands
operand types are: - __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(1986): error: identifier "h2rsqrt" is undefined
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(2021): error: no operator "+" matches these operands
operand types are: __half2 + const __half2
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu(2027): error: no operator "+" matches these operands
operand types are: __half2 + const __half2
38 errors detected in the compilation of "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/normalize_kernels.cu".
[4/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/general_kernels.cu -o general_kernels.cuda.o
[5/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/transform_kernels.cu -o transform_kernels.cuda.o
[6/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/gelu_kernels.cu -o gelu_kernels.cuda.o
[7/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=stochastic_transformer -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -D__STOCHASTIC_MODE__ -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/softmax_kernels.cu -o softmax_kernels.cuda.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1549, in _run_ninja_build
subprocess.run(
File "/opt/conda/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "experimentation.py", line 17, in <module>
layer = DeepSpeedTransformerLayer(config=transformer_config)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/transformer.py", line 543, in __init__
stochastic_transformer_cuda_module = StochasticTransformerBuilder().load()
File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 180, in load
return self.jit_load(verbose)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 208, in jit_load
op_module = load(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 999, in load
return _jit_compile(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1204, in _jit_compile
_write_ninja_file_and_build_library(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1308, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1565, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'stochastic_transformer'
About this issue
- State: closed
- Created 3 years ago
- Comments: 28 (13 by maintainers)
root@x8a100-0000:/workspace# env | grep -i arch
TORCH_CUDA_ARCH_LIST=5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX
export TORCH_CUDA_ARCH_LIST=7.0
DS_BUILD_OPS=1 pip3 install deepspeed
Worked, thank you.
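The workaround above pins the build to one architecture by hand. As a minimal sketch, the same value could be derived from the GPUs that are actually visible instead of hard-coding 7.0. `format_arch_list` and `visible_capabilities` are hypothetical helper names, not part of DeepSpeed; the only PyTorch calls used are `torch.cuda.device_count` and `torch.cuda.get_device_capability`, and they are reached only if PyTorch is installed.

```python
def format_arch_list(capabilities):
    """Turn (major, minor) tuples into a TORCH_CUDA_ARCH_LIST string.

    Duplicates are dropped and entries are sorted, so two A100s and one
    V100 would give "7.0 8.0". (Hypothetical helper for illustration.)
    """
    unique = sorted(set(capabilities))
    return " ".join(f"{major}.{minor}" for major, minor in unique)


def visible_capabilities():
    """Ask PyTorch for the compute capability of each visible GPU.

    Returns an empty list when PyTorch is missing or no GPU is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    return [
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    arch_list = format_arch_list(visible_capabilities())
    if arch_list:
        # Print a line that can be pasted into the shell before pip install.
        print(f'export TORCH_CUDA_ARCH_LIST="{arch_list}"')
    else:
        print("No CUDA devices visible; leaving TORCH_CUDA_ARCH_LIST unchanged")
```

Running the script before DS_BUILD_OPS=1 pip3 install deepspeed would print the export line to use on that machine; whether the resulting list avoids the build error still depends on the architectures involved.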
unset TORCH_CUDA_ARCH_LIST fixed the problem for me.
There we go, that’s the missing piece 😃 We have this fully reproduced now with these two steps:
- nvcr.io/nvidia/pytorch:20.12-py3
- DS_BUILD_OPS=1 pip install deepspeed
I believe we also know why this exact build error is happening, but not why it is being triggered. For some reason the build command is adding a gencode for compute capability (cc) 5.2, but there are clearly no cc 5.2 GPUs on the box. We’ll dig into this further and report back once we have a fix.
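A rough way to spot this situation before building is to scan TORCH_CUDA_ARCH_LIST for entries that predate half-precision arithmetic. The sketch below assumes the threshold is compute capability 5.3, which is where CUDA documents its half-precision arithmetic intrinsics as becoming available; that reading fits the __half2 errors in the log above but has not been confirmed in this thread. `parse_arch_list` and `archs_below` are hypothetical helper names.

```python
import os

# Assumption: half-precision arithmetic needs compute capability 5.3 or newer.
MIN_HALF_ARITHMETIC_CC = (5, 3)


def parse_arch_list(value):
    """Split a TORCH_CUDA_ARCH_LIST string into (major, minor) tuples.

    Entries may be separated by spaces or semicolons and may carry a
    "+PTX" suffix. Tokens that are not plain "major.minor" numbers
    (for example architecture names) are skipped.
    """
    archs = []
    for token in value.replace(";", " ").split():
        major, _, minor = token.replace("+PTX", "").partition(".")
        if major.isdigit() and minor.isdigit():
            archs.append((int(major), int(minor)))
    return archs


def archs_below(value, minimum=MIN_HALF_ARITHMETIC_CC):
    """Return the listed architectures older than `minimum`."""
    return [arch for arch in parse_arch_list(value) if arch < minimum]


if __name__ == "__main__":
    current = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
    for major, minor in archs_below(current):
        print(f"TORCH_CUDA_ARCH_LIST contains {major}.{minor}, "
              "which may not support the fp16 kernels")
```

With the value reported earlier in the thread (5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX), the only entry the check flags is 5.2.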