DeepSpeed: [RESOLVED] An error occurs when running DeepSpeed CPU inference

Describe the bug

I am using version 0.9.4, which already includes the two PRs below, but CPU inference is not working:

  • CUDA optional deepspeed ops: https://github.com/microsoft/DeepSpeed/pull/2507
  • Enable page-locked tensors without CUDA: https://github.com/microsoft/DeepSpeed/pull/2775

To Reproduce

  • code

    import deepspeed
    import torch

    # model, world_size, and args are defined earlier in the script
    torch_dtype = torch.bfloat16
    ds_engine = deepspeed.init_inference(
        model,
        tensor_parallel={
            "tp_size": world_size,
        },
        dtype=torch_dtype,
        replace_with_kernel_inject=True,
        save_mp_checkpoint_path=args.save_mp_checkpoint_path,
        # injection_policy=injection_policy,
    )
    model = ds_engine.module

  • error
    /opt/conda/envs/py310/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py:545 in compute_capability_args

      542                 if cc not in ccs:
      543                     ccs.append(cc)
      544             ccs = sorted(ccs)
    ❱ 545             ccs[-1] += '+PTX'
      546         else:
      547             # Cross-compile mode, compile for various architectures
      548             # env override takes priority
    
  • In the linked code, this error always occurs when CUDA is not available (see the sketch below): https://github.com/microsoft/DeepSpeed/blob/7711bdbbd27c62ab4986f35c1ed01a0268fed92f/op_builder/builder.py#L528-L537
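
For context, here is a minimal sketch of how that code path appears to fail on a CPU-only machine. This is my reconstruction from the traceback, not DeepSpeed's exact source: with no visible GPU, torch.cuda.device_count() returns 0, the list of compute capabilities stays empty, and ccs[-1] raises an IndexError.

    import torch

    # Collect the compute capability of every visible GPU, then tag the
    # newest one with '+PTX' -- the pattern shown in the traceback above.
    ccs = []
    for i in range(torch.cuda.device_count()):  # 0 iterations on a CPU-only box
        major, minor = torch.cuda.get_device_capability(i)
        cc = f"{major}.{minor}"
        if cc not in ccs:
            ccs.append(cc)
    ccs = sorted(ccs)

    if ccs:
        ccs[-1] += '+PTX'
    else:
        # With no CUDA device, ccs stays empty, so the original
        # `ccs[-1] += '+PTX'` raises IndexError: list index out of range.
        print("No CUDA device detected; compute capability list is empty")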

Expected behavior

If there is a guide for CPU inference, such as an option that avoids this code path, please point me to it. I may simply have missed it, but it is hard to find information about CPU inference in the documentation.

System info (please complete the following information):

  • OS: Ubuntu 22.04
  • GPU count and types: CPU only
  • Python version: 3.10
  • DeepSpeed version: 0.9.4

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Comments: 19 (6 by maintainers)

Most upvoted comments

I solved it by updating the GCC version.

From the error posted, DeepSpeed seems to be going through the CUDA path instead of the CPU path. Can you check the log to see whether there is a line like the following?

Setting ds_accelerator to cpu (auto detect)

There is a section in the DeepSpeed tutorial showing how to run a DeepSpeed model on CPU: https://www.deepspeed.ai/tutorials/accelerator-abstraction-interface/#run-deepspeed-model-on-cpu
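
A quick way to confirm which accelerator DeepSpeed auto-detected is the accelerator abstraction interface described in that tutorial. The snippet below is a sketch; the expected 'cpu' output assumes the CPU accelerator was detected correctly on a CPU-only machine.

    from deepspeed.accelerator import get_accelerator

    # Prints the accelerator DeepSpeed selected. On a CPU-only machine this
    # should report 'cpu'; if it reports 'cuda', DeepSpeed is still on the
    # CUDA path and will try to JIT-build CUDA ops, hitting the builder error.
    print(get_accelerator().device_name())

If the CUDA path is still being selected, the accelerator abstraction tutorial linked above also describes how the accelerator is chosen and how to run on CPU.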