modelscope: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Trying to run the following code ends with the error below.

from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

#device = torch.device('cpu')


p = pipeline('text-to-video-synthesis', 'damo/text-to-video-synthesis')
test_text = {
    'text': 'A panda eating bamboo on a rock.',
}
output_video_path = p(test_text,)[OutputKeys.OUTPUT_VIDEO]
print('output_video_path:', output_video_path)
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    output_video_path = p(test_text,)[OutputKeys.OUTPUT_VIDEO]
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/pipelines/base.py", line 212, in __call__
    output = self._process_single(input, *args, **kwargs)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/pipelines/base.py", line 247, in _process_single
    out = self.forward(out, **forward_params)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/pipelines/multi_modal/text_to_video_synthesis_pipeline.py", line 58, in forward
    video = self.model(input)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/models/base/base_model.py", line 34, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/models/multi_modal/video_synthesis/text_to_video_synthesis_model.py", line 167, in forward
    eta=0.0)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/models/multi_modal/video_synthesis/diffusion.py", line 221, in ddim_sample_loop
    ddim_timesteps, eta)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/models/multi_modal/video_synthesis/diffusion.py", line 169, in ddim_sample
    percentile, guide_scale)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/models/multi_modal/video_synthesis/diffusion.py", line 120, in p_mean_variance
    var = _i(self.posterior_variance, t, xt)
  File "/home/sadmin/miniconda3/envs/modelscope/lib/python3.7/site-packages/modelscope/models/multi_modal/video_synthesis/diffusion.py", line 14, in _i
    return tensor[t].view(shape).to(x)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
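For context on what is failing: the `_i` helper in diffusion.py indexes per-timestep diffusion constants (e.g. `posterior_variance`) with the timestep tensor `t` and reshapes the result so it broadcasts over the batch `x`. A minimal sketch of that helper, run entirely on CPU (the constant values and shapes below are illustrative, not modelscope's actual schedule):

```python
import torch

# Sketch of diffusion.py's _i: pick per-timestep constants and reshape
# them to (batch, 1, 1, ...) so they broadcast over a batch of samples x.
def _i(tensor, t, x):
    shape = (x.size(0), ) + (1, ) * (x.ndim - 1)
    return tensor[t].view(shape).to(x)

posterior_variance = torch.linspace(1e-4, 2e-2, steps=1000)  # stays on CPU
x = torch.randn(2, 3, 16, 32, 32)   # batch of video latents (illustrative shape)
t = torch.tensor([999, 500])        # one timestep index per sample

var = _i(posterior_variance, t, x)
print(var.shape)  # torch.Size([2, 1, 1, 1, 1])
```

When the pipeline runs on GPU, `t` arrives as a CUDA tensor while `posterior_variance` was left on CPU, and `tensor[t]` then raises exactly this RuntimeError: PyTorch requires indices to be on CPU or on the same device as the indexed tensor.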

I am using an RTX 3060 12GB; before the whole thing crashes, it uses about 6GB of VRAM.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 17

Most upvoted comments

See this page

We have fixed this bug by adding “tensor = tensor.to(x.device)”, so the “return tensor[t].view(shape).to(x)” is Line 15 now. @ThatCoffeeGuy

Go to AppData\Local\Programs\Python\Python310\Lib\site-packages\modelscope\models\multi_modal\video_synthesis\diffusion.py

and change lines 10-15 to

def _i(tensor, t, x):
    r"""Index tensor using t and format the output according to x.
    """
    shape = (x.size(0), ) + (1, ) * (x.ndim - 1)
    tt = t.to('cpu')
    return tensor[tt].view(shape).to(x)


Thank you, I can confirm that after I made the modifications it started to work for me too. Unfortunately, I only have a 12GB GPU and it runs out of memory.

When I try to run it in CPU mode via export CUDA_VISIBLE_DEVICES="", I can see it start filling up system RAM (I have 64GB of it), but then it crashes with:

RuntimeError: TextToVideoSynthesisPipeline: TextToVideoSynthesis: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Could you please tell me whether the fix suggested in the error message will work, and where exactly I should put that line? Thank you.
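The error itself comes from whatever `torch.load` call deserializes the checkpoint inside the pipeline's model-loading code; the exact file and line vary by modelscope version, so the snippet below only demonstrates what the suggested fix does, using an in-memory buffer as a stand-in for the checkpoint file:

```python
import io
import torch

# Stand-in for a checkpoint file; in the real case the storages inside
# were saved from a CUDA device, which is what triggers the error on a
# CPU-only machine.
buf = io.BytesIO()
torch.save({'w': torch.ones(3)}, buf)
buf.seek(0)

# The fix torch suggests: map_location remaps all storages to CPU at load
# time, so no CUDA device is needed to deserialize.
state = torch.load(buf, map_location=torch.device('cpu'))
print(state['w'].device)  # cpu
```

So the `map_location=torch.device('cpu')` argument would have to be added to the `torch.load` call inside the modelscope package itself (or the pipeline would need to expose a device option), not to the user script above.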

@WangJiuniu Can you help look at this problem? Can we support text-to-video running in a CPU-only environment?