LMFlow: running run_chatbot.sh fails with RuntimeError: Tensors must be contiguous

Hello, I downloaded the gpt-neo-2.7B model and ran run_chatbot.sh, which produced the following error. How can I resolve it?

(lmflow) root@shenma:~/LMFlow# ./scripts/run_chatbot.sh output_models/gpt-neo-2.7B
[2023-04-06 09:55:55,760] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0: setting --include=localhost:0
[2023-04-06 09:55:55,769] [INFO] [runner.py:550:main] cmd = /root/anaconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None examples/chatbot.py --deepspeed configs/ds_config_chatbot.json --model_name_or_path output_models/gpt-neo-2.7B
[2023-04-06 09:55:56,692] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-06 09:55:56,692] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-06 09:55:56,692] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-06 09:55:56,692] [INFO] [launch.py:162:main] dist_world_size=1
[2023-04-06 09:55:56,692] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
Traceback (most recent call last):
  File "/root/LMFlow/examples/chatbot.py", line 117, in <module>
    main()
  File "/root/LMFlow/examples/chatbot.py", line 44, in main
    model = AutoModel.get_model(
  File "/root/LMFlow/src/lmflow/models/auto_model.py", line 14, in get_model
    return HFDecoderModel(model_args, *args, **kwargs)
  File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 224, in __init__
    self.ds_engine = deepspeed.initialize(model=self.backend_model, config_params=ds_config)[0]
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 297, in __init__
    self._configure_distributed_model(model)
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1182, in _configure_distributed_model
    self._broadcast_model()
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1105, in _broadcast_model
    dist.broadcast(p,
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/comm/comm.py", line 123, in log_wrapper
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/comm/comm.py", line 228, in broadcast
    return cdb.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/comm/torch.py", line 78, in broadcast
    return torch.distributed.broadcast(tensor=tensor,
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1436, in wrapper
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1555, in broadcast
    work = group.broadcast([tensor], opts)
RuntimeError: Tensors must be contiguous
[2023-04-06 09:56:28,731] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 439575
[2023-04-06 09:56:28,731] [ERROR] [launch.py:324:sigkill_handler] ['/root/anaconda3/envs/lmflow/bin/python', '-u', 'examples/chatbot.py', '--local_rank=0', '--deepspeed', 'configs/ds_config_chatbot.json', '--model_name_or_path', 'output_models/gpt-neo-2.7B'] exits with return code = 1

DeepSpeed version: 0.8.3

Most upvoted comments

Thanks for providing more details! It looks like the problem is caused by a combination of factors. Since loading gpt-neo-2.7B requires more than 16 GB of memory during the loading process, RAM-optimized loading has to be turned on. However, when GPU memory is insufficient, this option splits the model tensors, and those split (non-contiguous) tensors trigger the RuntimeError: Tensors must be contiguous seen here.
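
For context, collective operations like torch.distributed.broadcast require each tensor to occupy a single contiguous block of memory; a tensor view produced by splitting or slicing does not, and calling .contiguous() copies it into its own contiguous buffer. A minimal standalone illustration in plain PyTorch (no DeepSpeed or LMFlow involved):

import torch

x = torch.randn(4, 4)        # a freshly allocated tensor is contiguous
view = x[:, :2]              # a column slice shares x's storage with a stride gap
print(view.is_contiguous())  # False -- the kind of tensor dist.broadcast rejects
fixed = view.contiguous()    # copies the data into a new contiguous buffer
print(fixed.is_contiguous()) # True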

To avoid this error, there are several options:

  • You may add more RAM to your machine/server and turn RAM-optimized loading off with --use_ram_optimized_load False.
  • You may use a GPU with more memory; 16 GB will be sufficient (a quick way to check your current RAM and GPU memory is sketched after this list).
  • You may use a CPU-only server with large RAM and run ./scripts/run_chatbot_cpu.sh, which supports CPU-only inference (although it may be much slower than on a GPU).
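
If you are unsure which of these options fits your machine, the following sketch checks available RAM and GPU memory against the rough 16 GB figures mentioned in this thread. It assumes psutil is installed (it is not required by LMFlow itself), and the thresholds are only the ballpark numbers quoted above, not exact requirements:

import torch
import psutil  # assumption: installed separately, e.g. pip install psutil

GIB = 1024 ** 3
RAM_NEEDED_GIB = 16   # rough figure for loading gpt-neo-2.7B without RAM-optimized loading
VRAM_NEEDED_GIB = 16  # rough figure for GPU inference

ram_gib = psutil.virtual_memory().total / GIB
print(f"System RAM: {ram_gib:.1f} GiB")
if ram_gib > RAM_NEEDED_GIB:
    print("Enough RAM to try --use_ram_optimized_load False")

if torch.cuda.is_available():
    vram_gib = torch.cuda.get_device_properties(0).total_memory / GIB
    print(f"GPU 0 memory: {vram_gib:.1f} GiB")
    if vram_gib < VRAM_NEEDED_GIB:
        print("GPU memory below 16 GiB: consider a larger GPU or CPU-only inference")
else:
    print("No CUDA GPU detected: try ./scripts/run_chatbot_cpu.sh for CPU-only inference")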

We recommend Google Colab for running this type of experiment. Hope that answers your question. Thanks 😄

Okay, thank you. I closed the issue.