h2ogpt: MPT-7B, 30B RuntimeError: Placeholder storage has not been allocated on MPS device!

Trying to use MPT models with h2ogpt:

  1. python generate.py --base_model=mosaicml/mpt-7b-chat --score_model=None
  2. enter any prompt

Expected behavior: model is loaded and used.

Observed behavior:

  • model is loaded and the UI is available
  • there’s an exception:
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py:1452: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on mps, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
  warnings.warn(
thread exception: (<class 'RuntimeError'>, RuntimeError('Placeholder storage has not been allocated on MPS device!'), <traceback object at 0x8fe882200>)
make stop: (<class 'RuntimeError'>, RuntimeError('Placeholder storage has not been allocated on MPS device!'), <traceback object at 0x8fe882200>)
hit stop
Traceback (most recent call last):
  File "/Users/user/dev/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "/Users/user/dev/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "/Users/user/dev/lib/python3.10/site-packages/gradio/blocks.py", line 1093, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 341, in async_iteration
    return await iterator.__anext__()
  File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 334, in __anext__
    return await anyio.to_thread.run_sync(
  File "/Users/user/dev/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/user/dev/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/Users/user/dev/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
    return next(iterator)
  File "/Users/user/dev/h2ogpt/src/gradio_runner.py", line 1428, in bot
    for res in get_response(fun1, history):
  File "/Users/user/dev/h2ogpt/src/gradio_runner.py", line 1385, in get_response
    for output_fun in fun1():
  File "/Users/user/dev/h2ogpt/src/gen.py", line 2011, in evaluate
    raise thread.exc
  File "/Users/user/dev/h2ogpt/src/utils.py", line 340, in run
    self._return = self._target(*self._args, **self._kwargs)
  File "/Users/user/dev/h2ogpt/src/gen.py", line 2114, in generate_with_exceptions
    func(*args, **kwargs)
  File "/Users/user/dev/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
    return self.greedy_search(
  File "/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
    outputs = self(
  File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 270, in forward
    outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
  File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 168, in forward
    tok_emb = self.wte(input_ids)
  File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/custom_embedding.py", line 11, in forward
    return super().forward(input)
  File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
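
For context, the warning earlier in the log ("`input_ids` is on mps, whereas the model is on cpu") points at the same device mismatch the traceback ends in. A minimal sketch outside h2ogpt, assuming a plain transformers load of the same checkpoint, showing the general rule of keeping the weights and the inputs on one device before calling .generate():

# Standalone illustration, not h2ogpt's code path: the error above is the classic
# symptom of input tensors living on MPS while the model weights stayed on CPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

device = "mps" if torch.backends.mps.is_available() else "cpu"
model = model.to(device)                                       # move the weights ...
inputs = tokenizer("Hello", return_tensors="pt").to(device)    # ... and the inputs to the same device

out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))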

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Updating pytorch to 2.1 changes the error to:

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
The model 'OptimizedModule' is not supported for . Supported models are ['BartForCausalLM', 'BertLMHeadModel', (...)

and generation works, but it is VERY slow (i.e. mpt-7B is much slower than WizardLM-30B). GPU load is 0%; only the CPU is used.
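
One quick way to confirm the model really stayed on the CPU (a small check, assuming you can reach the loaded model object, e.g. from a Python shell or a breakpoint; "model" is just a placeholder name here):

import torch

print(torch.backends.mps.is_available())   # can PyTorch see the MPS device at all?
print(torch.backends.mps.is_built())       # was this PyTorch build compiled with MPS support?
print(next(model.parameters()).device)     # "cpu" here would explain the 0% GPU load
print(next(model.parameters()).dtype)      # float32 on CPU is also much slower than float16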

@wizzard0 could you please also change torch_dtype to float16 in config.json and see if that helps?

Just a hunch: pytorch/pytorch#78168 (comment)
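
A minimal sketch of that edit done in Python rather than by hand, assuming the cache path quoted later in this thread (your snapshot hash may differ):

# Hedged sketch: flip torch_dtype to float16 in the cached config.json.
import json
from pathlib import Path

cfg_path = Path.home() / (".cache/huggingface/hub/models--mosaicml--mpt-7b-chat/"
                          "snapshots/c53dee01e05098f81cac11145f9bf45feedc5b2f/config.json")
cfg = json.loads(cfg_path.read_text())
cfg["torch_dtype"] = "float16"
cfg_path.write_text(json.dumps(cfg, indent=2))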

No change, still `User specified an unsupported autocast device_type 'mps'`.

I was able to start loading the model on MPS after tweaking the model config, but I get this error since I don’t have enough memory:

RuntimeError: MPS backend out of memory (MPS allocated: 17.27 GB, other allocations: 819.94 MB, max allowed: 18.13 GB). Tried to allocate 192.00 MB on private pool. Use 
PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable the upper limit for memory allocations (may cause system failure).

I have a 16 GB Mac M1 (macOS 13.4.1 (c)).

But I can see the GPU load when attempting to load the model.
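
For anyone retrying on a machine with more memory headroom, the error message itself suggests lifting the allocator limit; combined with the launch command from the report, that would look roughly like the following (it can exhaust system memory, so use with care):

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python generate.py --base_model=mosaicml/mpt-7b-chat --score_model=None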

@wizzard0 if you have more memory than mine, please try this:

  • change init_device to 'mps' in the model's config.json. (You can find it at ~/.cache/huggingface/hub/models--mosaicml--mpt-7b-chat/snapshots/c53dee01e05098f81cac11145f9bf45feedc5b2f/config.json)

If that solves your problem, let me know. (A load-time alternative to editing the cached file is sketched below.)
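
For completeness, roughly the same idea can be tried at load time instead of hand-editing the cached file, assuming you are loading the checkpoint yourself with plain transformers rather than through generate.py; whether MPT's remote code honours an init_device override set this way is an assumption:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Override the MPT-specific init_device key on the loaded config instead of
# editing ~/.cache/huggingface/.../config.json by hand.
config = AutoConfig.from_pretrained("mosaicml/mpt-7b-chat", trust_remote_code=True)
config.init_device = "mps"   # assumption: MPT's remote code uses this to place weights

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat",
    config=config,
    torch_dtype=torch.float16,   # pairs with the earlier float16 suggestion
    trust_remote_code=True,
)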

cc: @pseudotensor