DeepSpeed: [BUG] DeepSpeed not compatible with latest transformers (4.22.1)

Describe the bug
I’m trying to get BLOOM to run on a Lambda Labs GPU cloud 8x40GB instance. It appears DeepSpeed (via DeepSpeed-MII) isn’t compatible with the latest transformers (4.22.1): model loading fails with ImportError: cannot import name 'cached_path' from 'transformers.utils'.
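
For context: the ImportError confirms that transformers 4.22 no longer exposes the legacy cached_path and hf_bucket_url helpers in transformers.utils, while mii/models/providers/llm.py still imports them unconditionally. A minimal sketch of the kind of guarded import that would tolerate both sides of that change (an illustration only, not the maintainers' actual fix):

from transformers.utils import WEIGHTS_NAME, WEIGHTS_INDEX_NAME

try:
    # transformers <= 4.21: the legacy download helpers still exist
    from transformers.utils import cached_path, hf_bucket_url
except ImportError:
    # transformers >= 4.22: the helpers were removed; huggingface_hub's
    # hf_hub_download is the supported way to fetch checkpoint files
    from huggingface_hub import hf_hub_download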

I ran the following command:

python bloom-inference-server/cli.py --model_name microsoft/bloom-deepspeed-inference-int8 --dtype int8 --deployment_framework ds_inference --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'

Here is the output:

[2022-09-21 16:10:37,900] [INFO] [deployment.py:74:deploy] *************DeepSpeed Optimizations: True*************
[2022-09-21 16:10:40,808] [INFO] [server_client.py:206:_initialize_service] multi-gpu deepspeed launch: ['deepspeed', '--num_gpus', '8', '--no_local_rank', '--no_python', '/usr/bin/python', '-m', 'mii.launch.multi_gpu_server', '--task-name', 'text-generation', '--model', 'bigscience/bloom', '--model-path', '/home/ubuntu/.cache/huggingface/hub/models--microsoft--bloom-deepspeed-inference-int8/snapshots/aa00a6626f6484a2eef68e06d1e089e4e32aa571', '--port', '50950', '--ds-optimize', '--provider', 'hugging-face-llm', '--config', 'eyJ0ZW5zb3JfcGFyYWxsZWwiOiA4LCAicG9ydF9udW1iZXIiOiA1MDk1MCwgImR0eXBlIjogImludDgiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IHsiY2hlY2twb2ludHMiOiB7Im5vbl90cCI6IFsibm9uLXRwLnB0Il0sICJ0cCI6IFsidHBfMDBfMDAucHQiLCAidHBfMDFfMDAucHQiLCAidHBfMDJfMDAucHQiLCAidHBfMDNfMDAucHQiLCAidHBfMDBfMDEucHQiLCAidHBfMDFfMDEucHQiLCAidHBfMDJfMDEucHQiLCAidHBfMDNfMDEucHQiLCAidHBfMDBfMDIucHQiLCAidHBfMDFfMDIucHQiLCAidHBfMDJfMDIucHQiLCAidHBfMDNfMDIucHQiLCAidHBfMDBfMDMucHQiLCAidHBfMDFfMDMucHQiLCAidHBfMDJfMDMucHQiLCAidHBfMDNfMDMucHQiLCAidHBfMDBfMDQucHQiLCAidHBfMDFfMDQucHQiLCAidHBfMDJfMDQucHQiLCAidHBfMDNfMDQucHQiLCAidHBfMDBfMDUucHQiLCAidHBfMDFfMDUucHQiLCAidHBfMDJfMDUucHQiLCAidHBfMDNfMDUucHQiLCAidHBfMDBfMDYucHQiLCAidHBfMDFfMDYucHQiLCAidHBfMDJfMDYucHQiLCAidHBfMDNfMDYucHQiLCAidHBfMDBfMDcucHQiLCAidHBfMDFfMDcucHQiLCAidHBfMDJfMDcucHQiLCAidHBfMDNfMDcucHQiXX0sICJkdHlwZSI6ICJpbnQ4IiwgInBhcmFsbGVsaXphdGlvbiI6ICJ0cCIsICJ0cF9zaXplIjogNCwgInR5cGUiOiAiQkxPT00iLCAidmVyc2lvbiI6IDF9fQ==']
[2022-09-21 16:10:41,887] [WARNING] [runner.py:178:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2022-09-21 16:10:42,184] [INFO] [runner.py:504:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --no_python --no_local_rank /usr/bin/python -m mii.launch.multi_gpu_server --task-name text-generation --model bigscience/bloom --model-path /home/ubuntu/.cache/huggingface/hub/models--microsoft--bloom-deepspeed-inference-int8/snapshots/aa00a6626f6484a2eef68e06d1e089e4e32aa571 --port 50950 --ds-optimize --provider hugging-face-llm --config eyJ0ZW5zb3JfcGFyYWxsZWwiOiA4LCAicG9ydF9udW1iZXIiOiA1MDk1MCwgImR0eXBlIjogImludDgiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IHsiY2hlY2twb2ludHMiOiB7Im5vbl90cCI6IFsibm9uLXRwLnB0Il0sICJ0cCI6IFsidHBfMDBfMDAucHQiLCAidHBfMDFfMDAucHQiLCAidHBfMDJfMDAucHQiLCAidHBfMDNfMDAucHQiLCAidHBfMDBfMDEucHQiLCAidHBfMDFfMDEucHQiLCAidHBfMDJfMDEucHQiLCAidHBfMDNfMDEucHQiLCAidHBfMDBfMDIucHQiLCAidHBfMDFfMDIucHQiLCAidHBfMDJfMDIucHQiLCAidHBfMDNfMDIucHQiLCAidHBfMDBfMDMucHQiLCAidHBfMDFfMDMucHQiLCAidHBfMDJfMDMucHQiLCAidHBfMDNfMDMucHQiLCAidHBfMDBfMDQucHQiLCAidHBfMDFfMDQucHQiLCAidHBfMDJfMDQucHQiLCAidHBfMDNfMDQucHQiLCAidHBfMDBfMDUucHQiLCAidHBfMDFfMDUucHQiLCAidHBfMDJfMDUucHQiLCAidHBfMDNfMDUucHQiLCAidHBfMDBfMDYucHQiLCAidHBfMDFfMDYucHQiLCAidHBfMDJfMDYucHQiLCAidHBfMDNfMDYucHQiLCAidHBfMDBfMDcucHQiLCAidHBfMDFfMDcucHQiLCAidHBfMDJfMDcucHQiLCAidHBfMDNfMDcucHQiXX0sICJkdHlwZSI6ICJpbnQ4IiwgInBhcmFsbGVsaXphdGlvbiI6ICJ0cCIsICJ0cF9zaXplIjogNCwgInR5cGUiOiAiQkxPT00iLCAidmVyc2lvbiI6IDF9fQ==
[2022-09-21 16:10:43,214] [INFO] [launch.py:136:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2022-09-21 16:10:43,214] [INFO] [launch.py:142:main] nnodes=1, num_local_procs=8, node_rank=0
[2022-09-21 16:10:43,214] [INFO] [launch.py:155:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2022-09-21 16:10:43,214] [INFO] [launch.py:156:main] dist_world_size=8
[2022-09-21 16:10:43,214] [INFO] [launch.py:158:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2022-09-21 16:10:45,832] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start...
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            129-159-32-184
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4122

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           129-159-32-184
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       udcm
--------------------------------------------------------------------------
(The Open MPI warning pair above is emitted once per rank; seven further identical copies are elided.)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/launch/multi_gpu_server.py", line 70, in <module>
    main()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/launch/multi_gpu_server.py", line 56, in main
    inference_pipeline = load_models(task_name=args.task_name,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/models/load_models.py", line 45, in load_models
    from mii.models.providers.llm import load_hf_llm
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/models/providers/llm.py", line 8, in <module>
    from transformers.utils import WEIGHTS_NAME, WEIGHTS_INDEX_NAME, cached_path, hf_bucket_url
ImportError: cannot import name 'cached_path' from 'transformers.utils' (/home/ubuntu/.local/lib/python3.8/site-packages/transformers/utils/__init__.py)
(The same ImportError traceback is raised by each of the eight ranks; seven identical copies are elided.)
[2022-09-21 16:10:50,262] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83474
[2022-09-21 16:10:50,349] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83475
[2022-09-21 16:10:50,422] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83476
[2022-09-21 16:10:50,422] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83477
[2022-09-21 16:10:50,493] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83478
[2022-09-21 16:10:50,565] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83479
[2022-09-21 16:10:50,582] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83480
[2022-09-21 16:10:50,657] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 83481
[2022-09-21 16:10:50,769] [ERROR] [launch.py:292:sigkill_handler] ['/usr/bin/python', '-m', 'mii.launch.multi_gpu_server', '--task-name', 'text-generation', '--model', 'bigscience/bloom', '--model-path', '/home/ubuntu/.cache/huggingface/hub/models--microsoft--bloom-deepspeed-inference-int8/snapshots/aa00a6626f6484a2eef68e06d1e089e4e32aa571', '--port', '50950', '--ds-optimize', '--provider', 'hugging-face-llm', '--config', 'eyJ0ZW5zb3JfcGFyYWxsZWwiOiA4LCAicG9ydF9udW1iZXIiOiA1MDk1MCwgImR0eXBlIjogImludDgiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IHsiY2hlY2twb2ludHMiOiB7Im5vbl90cCI6IFsibm9uLXRwLnB0Il0sICJ0cCI6IFsidHBfMDBfMDAucHQiLCAidHBfMDFfMDAucHQiLCAidHBfMDJfMDAucHQiLCAidHBfMDNfMDAucHQiLCAidHBfMDBfMDEucHQiLCAidHBfMDFfMDEucHQiLCAidHBfMDJfMDEucHQiLCAidHBfMDNfMDEucHQiLCAidHBfMDBfMDIucHQiLCAidHBfMDFfMDIucHQiLCAidHBfMDJfMDIucHQiLCAidHBfMDNfMDIucHQiLCAidHBfMDBfMDMucHQiLCAidHBfMDFfMDMucHQiLCAidHBfMDJfMDMucHQiLCAidHBfMDNfMDMucHQiLCAidHBfMDBfMDQucHQiLCAidHBfMDFfMDQucHQiLCAidHBfMDJfMDQucHQiLCAidHBfMDNfMDQucHQiLCAidHBfMDBfMDUucHQiLCAidHBfMDFfMDUucHQiLCAidHBfMDJfMDUucHQiLCAidHBfMDNfMDUucHQiLCAidHBfMDBfMDYucHQiLCAidHBfMDFfMDYucHQiLCAidHBfMDJfMDYucHQiLCAidHBfMDNfMDYucHQiLCAidHBfMDBfMDcucHQiLCAidHBfMDFfMDcucHQiLCAidHBfMDJfMDcucHQiLCAidHBfMDNfMDcucHQiXX0sICJkdHlwZSI6ICJpbnQ4IiwgInBhcmFsbGVsaXphdGlvbiI6ICJ0cCIsICJ0cF9zaXplIjogNCwgInR5cGUiOiAiQkxPT00iLCAidmVyc2lvbiI6IDF9fQ=='] exits with return code = 1
[2022-09-21 16:10:50,837] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
  File "bloom-inference-server/cli.py", line 63, in <module>
    main()
  File "bloom-inference-server/cli.py", line 26, in main
    model = get_model_class(args.deployment_framework)(args)
  File "/home/ubuntu/transformers-bloom-inference/bloom-inference-server/models/ds_inference.py", line 92, in __init__
    mii.deploy(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/deployment.py", line 94, in deploy
    return _deploy_local(deployment_name, model_path=model_path)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/deployment.py", line 100, in _deploy_local
    mii.utils.import_score_file(deployment_name).init()
  File "/tmp/mii_cache/ds_inference_grpc_server/score.py", line 29, in init
    model = mii.MIIServerClient(task,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/server_client.py", line 90, in __init__
    self._wait_until_server_is_live()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mii/server_client.py", line 113, in _wait_until_server_is_live
    raise RuntimeError("server crashed for some reason, unable to proceed")
RuntimeError: server crashed for some reason, unable to proceed
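
As an aside, the --config value in the launch command above is just base64-encoded JSON, so decoding it is a quick way to verify what MII was told to deploy (tensor_parallel, dtype, the checkpoint list, and so on). A minimal sketch, assuming the blob has been copied into config_b64:

import base64, json

# config_b64 holds the string passed via --config in the launch command
decoded = json.loads(base64.b64decode(config_b64))
print(json.dumps(decoded, indent=2))  # shows tensor_parallel=8, dtype="int8", ...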

To Reproduce
Steps to reproduce the behavior:

  1. pip install "deepspeed>=0.7.3" "transformers>=4.21.3" "accelerate>=0.12.0" bitsandbytes "protobuf==3.20.*" (the version specifiers must be quoted, or the shell treats >= as a redirection)
  2. git clone https://github.com/huggingface/transformers-bloom-inference.git
  3. cd transformers-bloom-inference
  4. pip install .
  5. cd ..
  6. git clone https://github.com/microsoft/DeepSpeed-MII
  7. cd DeepSpeed-MII
  8. pip install .
  9. cd ..
  10. cd transformers-bloom-inference
  11. python bloom-inference-server/cli.py --model_name microsoft/bloom-deepspeed-inference-int8 --dtype int8 --deployment_framework ds_inference --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'
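
Note: until DeepSpeed-MII stops importing the removed helpers, a likely workaround (an assumption on my part, not a confirmed fix) is to pin transformers to a release that still ships cached_path, e.g. pip install "transformers>=4.21.3,<4.22", before step 11.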

Expected behavior
The bloom-inference-server should start and serve generations without crashing.

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/lib/python3/dist-packages/torch']
torch version .................... 1.11.0
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.3, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.6

Screenshots
None

System info (please complete the following information):

  • OS: Ubuntu 20.04.5 LTS
  • GPU count and types: 1 machine with 8x40GB A100s
  • Python version: Python 3.8.10
  • Any other relevant info about your setup: Lambda Labs GPU cloud
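  • The installed transformers version can be confirmed with python -c "import transformers; print(transformers.__version__)" (4.22.1 here, per the title)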

Launcher context
I am running bloom-inference-server/cli.py from the huggingface/transformers-bloom-inference repo, which uses the deepspeed launcher.

Docker context
N/A

Additional context
None

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (11 by maintainers)

Most upvoted comments

Thanks @mrwyattii I was able to get it working 😃