h2ogpt: MPT-7B, 30B RuntimeError: Placeholder storage has not been allocated on MPS device!
Trying to use MPT models with h2ogpt:
- python generate.py --base_model=mosaicml/mpt-7b-chat --score_model=None
- enter any prompt

Expected behavior: the model is loaded and used.
Observed behavior:
- the model is loaded and the UI is available
- there’s an exception:
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py:1452: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on mps, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
warnings.warn(
thread exception: (<class 'RuntimeError'>, RuntimeError('Placeholder storage has not been allocated on MPS device!'), <traceback object at 0x8fe882200>)
make stop: (<class 'RuntimeError'>, RuntimeError('Placeholder storage has not been allocated on MPS device!'), <traceback object at 0x8fe882200>)
hit stop
Traceback (most recent call last):
File "/Users/user/dev/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/Users/user/dev/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/Users/user/dev/lib/python3.10/site-packages/gradio/blocks.py", line 1093, in call_function
prediction = await utils.async_iteration(iterator)
File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 341, in async_iteration
return await iterator.__anext__()
File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 334, in __anext__
return await anyio.to_thread.run_sync(
File "/Users/user/dev/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/user/dev/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/Users/user/dev/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/Users/user/dev/lib/python3.10/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
return next(iterator)
File "/Users/user/dev/h2ogpt/src/gradio_runner.py", line 1428, in bot
for res in get_response(fun1, history):
File "/Users/user/dev/h2ogpt/src/gradio_runner.py", line 1385, in get_response
for output_fun in fun1():
File "/Users/user/dev/h2ogpt/src/gen.py", line 2011, in evaluate
raise thread.exc
File "/Users/user/dev/h2ogpt/src/utils.py", line 340, in run
self._return = self._target(*self._args, **self._kwargs)
File "/Users/user/dev/h2ogpt/src/gen.py", line 2114, in generate_with_exceptions
func(*args, **kwargs)
File "/Users/user/dev/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
return self.greedy_search(
File "/Users/user/dev/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
outputs = self(
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 270, in forward
outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/modeling_mpt.py", line 168, in forward
tok_emb = self.wte(input_ids)
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/c53dee01e05098f81cac11145f9bf45feedc5b2f/custom_embedding.py", line 11, in forward
return super().forward(input)
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/Users/user/dev/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
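For context, the RuntimeError is the device mismatch the warning above already flags: `input_ids` end up on `mps` while the model weights stay on `cpu`, so the embedding lookup in `F.embedding` fails. Below is a minimal sketch of the device alignment the warning asks for, using plain `transformers` outside h2ogpt (the prompt and generation settings are placeholders; whether MPT's custom modeling code then runs cleanly on MPS is what the rest of this thread is about):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick MPS when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b-chat")
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat",
    trust_remote_code=True,   # MPT ships custom modeling code
    torch_dtype=torch.float16,
).to(device)                  # move the weights, not just the inputs

# Inputs must live on the same device as the model, otherwise torch raises
# "Placeholder storage has not been allocated on MPS device!".
inputs = tokenizer("Hello", return_tensors="pt").to(device)
output = model.generate(
    **inputs,
    max_new_tokens=32,
    pad_token_id=tokenizer.eos_token_id,  # silences the pad_token_id warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```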
Updating pytorch to 2.1 changes the error to:

User specified an unsupported autocast device_type 'mps'

and generation works but is VERY slow (i.e. mpt-7B is much slower than WizardLM-30B). GPU load is 0%, only the CPU is used.

No change, still:

User specified an unsupported autocast device_type 'mps'
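That message comes from `torch.autocast` rejecting 'mps' as a device type on the installed PyTorch version. A hedged sketch of the kind of guard that would sidestep it (the `maybe_autocast` helper is hypothetical, not h2ogpt's actual code):

```python
import contextlib
import torch

def maybe_autocast(device_type: str):
    """Hypothetical helper: only enter torch.autocast where it is supported.

    'cuda' and 'cpu' are supported autocast device types; 'mps' support
    depends on the PyTorch version (assumption), so fall back to a no-op
    context there instead of raising.
    """
    if device_type in ("cuda", "cpu"):
        return torch.autocast(device_type=device_type)
    return contextlib.nullcontext()

# Usage: wrap generation so an 'mps' device no longer trips the
# "unsupported autocast device_type" error.
# with maybe_autocast(device.type):
#     output = model.generate(**inputs, max_new_tokens=32)
```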
I was able to start loading the model on MPS after tweaking the model config of the transformer, but then I hit an error since I don't have enough memory.
I have a 16GB Mac M1 (macOS 13.4.1 (c)).
But I can see the GPU load when attempting to load the model.
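For reference, MPS memory use can also be inspected from Python rather than Activity Monitor; a small sketch assuming PyTorch >= 2.0, where `torch.mps` exposes these counters:

```python
import torch

# Report how much memory the MPS allocator currently holds
# (requires a PyTorch build with MPS support, an assumption here).
if torch.backends.mps.is_available():
    print(f"allocated: {torch.mps.current_allocated_memory() / 1e9:.2f} GB")
    print(f"driver:    {torch.mps.driver_allocated_memory() / 1e9:.2f} GB")
```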
@wizzard0 if you have larger memory than mine, please try this: set `init_device` to 'mps' in the `config.json` of the transformer. (You can find it at `~/.cache/huggingface/hub/models--mosaicml--mpt-7b-chat/snapshots/c53dee01e05098f81cac11145f9bf45feedc5b2f/config.json`.) If that solves your problem, let me know.
cc: @pseudotensor
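The same tweak can be tried without editing the cached file by hand; a sketch using `AutoConfig` (assuming the MPT remote code honors an overridden `init_device`, as the comment above implies):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the MPT config and override init_device instead of editing
# ~/.cache/huggingface/.../config.json in place.
config = AutoConfig.from_pretrained("mosaicml/mpt-7b-chat", trust_remote_code=True)
config.init_device = "mps"  # same effect as the suggested config.json edit

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat",
    config=config,
    trust_remote_code=True,
)
```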