DeepSpeed: [BUG] Can't load OPT-30B and OPT-66B through checkpoints.json

Describe the bug

I can’t load OPT-30B and OPT-66B through checkpoints.json. If I load them with Huggingface from_pretrained, everything works fine. This bug is troublesome because my production nodes have far less memory than my dev node, so they don’t have enough CPU memory to load OPT-30B and OPT-66B.
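
For context, here is roughly how the checkpoints.json path in bloom-inference-scripts/bloom-ds-inference.py works, as a minimal sketch (shard-path resolution is elided; the exact arguments in the script may differ, and the "type" field used for OPT is an assumption):

import json
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "facebook/opt-30b"

# Build the model on the meta device: shapes and dtypes only, no CPU memory
# is allocated for the weights. DeepSpeed should later stream the real
# weights in from the shard files listed in checkpoints.json.
config = AutoConfig.from_pretrained(model_name)
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)

# In the real script the shard paths are resolved from the local HF cache.
checkpoint_files = ["..."]  # elided
with open("checkpoints.json", "w") as f:
    json.dump({"type": "OPT", "checkpoints": checkpoint_files, "version": 1.0}, f)

model = deepspeed.init_inference(
    model,
    mp_size=4,
    dtype=torch.float16,
    checkpoint="checkpoints.json",
    replace_with_kernel_inject=True,
)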

To Reproduce (Python 3.7.7)

git clone https://github.com/anselmwang/transformers-bloom-inference/
cd transformers-bloom-inference
git checkout explore_ds

pip install --upgrade pip
pip install "transformers>=4.21.3" "accelerate>=0.12.0"
pip install "deepspeed>=0.7.3"

Without checkpoints_json, the following command works:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-30b; date

Below is the stack trace when using checkpoints.json:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-30b --use_checkpoints_json; date

Traceback (most recent call last):                                                                                                                                                        
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/bloom-inference-scripts/bloom-ds-inference.py", line 192, in <module>                                               
    model = deepspeed.init_inference(                                                                                                                                                     
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 311, in init_inference                                
    engine = InferenceEngine(model, config=ds_inference_config)                                                                                                                           
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 127, in __init__
    self.module.to(device)                                                                                                                                                                
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1682, in to  
    return super().to(*args, **kwargs)                                                       
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 987, in to
    return self._apply(convert)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 662, in _apply
    param_applied = fn(param)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 985, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
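
As far as I can tell, this error means the parameters are still meta tensors (shape and dtype but no backing storage) when self.module.to(device) runs, i.e. the checkpoint-loading step never materialized them. A minimal illustration in plain PyTorch:

import torch

t = torch.empty(4, 4, device="meta")  # metadata only, no storage allocated
try:
    t.to("cpu")  # any copy out of a meta tensor fails
except NotImplementedError as e:
    print(e)  # Cannot copy out of meta tensor; no data!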

For OPT-66B, this command works:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-66b; date

But with checkpoints.json turned on, the command below produces the following stack trace:

date; deepspeed --num_gpus 4 bloom-inference-scripts/bloom-ds-inference.py --name facebook/opt-66b --use_checkpoints_json; date

Traceback (most recent call last):
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/bloom-inference-scripts/bloom-ds-inference.py", line 190, in <module>                                               
    model = deepspeed.init_inference(                                                                                                                                                     
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)                            
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 124, in __init__
    self._apply_injection_policy(config)                                                     
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 349, in _apply_injection_policy                   replace_transformer_layer(client_module,                                                                                                                                              
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 926, in replace_transformer_layer
    load_model_with_checkpoint(                                                              
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 349, in load_model_with_checkpoin
t
    load_module_recursive(r_module)
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 341, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 258, in load_transformer_layer
    maybe_copy_qkv(module.attention,
  File "/home/yuwan/GitRoot/opt_pipeline/transformers-bloom-inference/venv/lib/python3.9/site-packages/deepspeed/module_inject/load_checkpoint.py", line 203, in maybe_copy_qkv
    k = sd[0][src_names[1]]
KeyError: 'model.decoder.layers.28.self_attn.k_proj.weight'
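
From the trace, maybe_copy_qkv looks the key up only in the first loaded state dict (sd[0]), while for a sharded checkpoint that key can live in any of the shard files. Which shard actually holds it can be checked against the checkpoint index (the snapshot path below is hypothetical):

import json

# Hypothetical local path to the downloaded facebook/opt-66b snapshot.
index_path = "/path/to/opt-66b/pytorch_model.bin.index.json"

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

key = "model.decoder.layers.28.self_attn.k_proj.weight"
print(weight_map[key])  # prints the pytorch_model-*-of-*.bin shard holding this key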

Expected behavior

OPT-30B and OPT-66B load through checkpoints.json the same way they do with Hugging Face from_pretrained.

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report                                                                                              
--------------------------------------------------                                                                                 
NOTE: Ops not installed will be just-in-time (JIT) compiled at                    
      runtime if needed. Op compatibility means that your system                                                                                          
      meet the required dependencies to JIT install the op.                                                         
--------------------------------------------------
JIT compiled ops requires ninja                                                                                                              
ninja .................. [OKAY]                                                            
--------------------------------------------------
op name ................ installed .. compatible                                                                        
--------------------------------------------------        
cpu_adam ............... [NO] ....... [OKAY]            
cpu_adagrad ............ [NO] ....... [OKAY]                                                             
fused_adam ............. [NO] ....... [OKAY]                                                                      
fused_lamb ............. [NO] ....... [OKAY]                                                                       
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
spatial_inference ...... [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/tmp/code/transformers-bloom-inference/venv/lib/python3.7/site-packages/torch']
torch version .................... 1.13.0+cu117
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed install path ........... ['/tmp/code/transformers-bloom-inference/venv/lib/python3.7/site-packages/deepspeed']
deepspeed info ................... 0.7.6, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7


System info:

  • OS: Ubuntu 18.04
  • GPU count and types: 1 node with 4x A6000 (46GB memory per GPU)
  • Hugging Face Transformers/Accelerate/etc. versions: transformers 4.25.1, deepspeed 0.7.7, torch 1.13.0
  • Python version: 3.7.7

Docker context: not using Docker.

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 19 (4 by maintainers)

Most upvoted comments

@RezaYazdaniAminabadi I can confirm that version 0.8.0 fixed the issue for me.

I can confirm that I’m able to replicate this. Interestingly, I’m finding that the smaller OPT models load fine with meta tensors. It appears the error occurs with models whose Hugging Face checkpoints are sharded (i.e., they have multiple pytorch_model-*-of-*.bin files); a quick check is sketched below.
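
If it helps triage, this lists which OPT repos are sharded (assumes huggingface_hub is installed; needs network access):

from huggingface_hub import list_repo_files

for name in ("facebook/opt-1.3b", "facebook/opt-13b", "facebook/opt-30b", "facebook/opt-66b"):
    files = list_repo_files(name)
    shards = [f for f in files if f.startswith("pytorch_model-")]
    print(name, f"{len(shards)} shard(s)" if shards else "single pytorch_model.bin")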

@RezaYazdaniAminabadi any idea the cause? I’m guessing we don’t catch this in our unit tests because we use small versions of these larger models to save time.