DeepSpeed: [RESOLVED] An error occurs when running DeepSpeed CPU inference
Describe the bug
I am using version 0.9.4 with the two PRs below applied, but CPU inference is not working.
- CUDA optional deepspeed ops: https://github.com/microsoft/DeepSpeed/pull/2507
- Enable page-locked tensors without CUDA: https://github.com/microsoft/DeepSpeed/pull/2775
To Reproduce
- code
import torch
import deepspeed

# model, world_size, and args are assumed to be defined earlier in the script.
torch_dtype = torch.bfloat16
ds_engine = deepspeed.init_inference(
    model,
    tensor_parallel={
        "tp_size": world_size,
    },
    dtype=torch_dtype,
    replace_with_kernel_inject=True,
    save_mp_checkpoint_path=args.save_mp_checkpoint_path,
    # injection_policy=injection_policy,
)
model = ds_engine.module
- error
/opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py:545 in compute_capability_args

  542                 if cc not in ccs:
  543                     ccs.append(cc)
  544             ccs = sorted(ccs)
❱ 545             ccs[-1] += '+PTX'
  546         else:
  547             # Cross-compile mode, compile for various architectures
  548             # env override takes priority
- In the code linked below, an error always occurs when CUDA is not available: https://github.com/microsoft/DeepSpeed/blob/7711bdbbd27c62ab4986f35c1ed01a0268fed92f/op_builder/builder.py#L528-L537
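A minimal sketch of why line 545 can fail on a CPU-only machine, assuming the list `ccs` is filled from the visible CUDA devices (the traceback shows the append at lines 542-543 but not the loop header). The function name `jit_compute_capabilities` and its `device_capabilities` parameter are hypothetical stand-ins for illustration, not DeepSpeed API. With no devices the list stays empty, and indexing its last element raises `IndexError`:

```python
def jit_compute_capabilities(device_capabilities):
    """Mimic the traceback excerpt: collect unique capabilities, tag the highest with +PTX.

    device_capabilities is a hypothetical list of (major, minor) tuples,
    one per visible CUDA device. It is empty on a CPU-only machine.
    """
    ccs = []
    for major, minor in device_capabilities:
        cc = f"{major}.{minor}"
        if cc not in ccs:
            ccs.append(cc)
    ccs = sorted(ccs)
    # Same statement as builder.py:545. It assumes at least one entry exists.
    ccs[-1] += "+PTX"
    return ccs


# With two GPUs the highest capability gets the +PTX suffix.
print(jit_compute_capabilities([(8, 0), (7, 5)]))

# With no GPUs the list is empty and the last-element access fails.
try:
    jit_compute_capabilities([])
except IndexError as exc:
    print("CPU-only case fails:", exc)
```

This only illustrates the failure mode. Whether DeepSpeed should reach this branch at all on a CPU-only machine depends on which accelerator it selected.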
Expected behavior
If there is a guide for CPU inference, such as an option that bypasses this code path, please share it. I may simply have missed it, but information about CPU inference is hard to find in the documentation.
System info (please complete the following information):
- OS: Ubuntu 22.04
- GPU count and types : cpu only
- Python version : 3.10
- deepspeed version : 0.9.4
About this issue
- State: closed
- Created 10 months ago
- Comments: 19 (6 by maintainers)
I solved it by updating the gcc version.
From the error posted, DeepSpeed seems to be going through the CUDA path instead of the CPU path. Can you check the log to see if there is any line like the following?
There is a section in the DeepSpeed tutorial showing how to run a DeepSpeed model on CPU: https://www.deepspeed.ai/tutorials/accelerator-abstraction-interface/#run-deepspeed-model-on-cpu
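To check which path DeepSpeed has picked before calling `init_inference`, something like the following sketch may help. It assumes an installed DeepSpeed release that ships the accelerator abstraction (`deepspeed.accelerator.get_accelerator`) and honors the `DS_ACCELERATOR` environment variable. The helper name `selected_accelerator_name` is hypothetical.

```python
import os


def selected_accelerator_name():
    """Return the device name DeepSpeed selected, or None if DeepSpeed is not importable."""
    try:
        from deepspeed.accelerator import get_accelerator
    except ImportError:
        return None
    # device_name() with no index returns the bare device type, e.g. "cuda" or "cpu".
    return get_accelerator().device_name()


# An explicit override, if the environment sets one (assumption: read at accelerator selection time).
print("DS_ACCELERATOR =", os.environ.get("DS_ACCELERATOR"))
print("selected accelerator =", selected_accelerator_name())
```

If this reports a CUDA device on a machine without a GPU, the op builder would likely take the CUDA branch shown in the traceback.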