TTS: [Bug] Doesn't respect the cuda flag

### Describe the bug

Passing `--use_cuda False` is ignored: the XTTS v1 checkpoint is still deserialized onto a CUDA device:

PS E:\AI> tts --model_name "tts_models/multilingual/multi-dataset/xtts_v1" --text "Ceci est un teste de voix." --language_idx "fr"  --use_cuda False
 > tts_models/multilingual/multi-dataset/xtts_v1 is already downloaded.
 > Using model: xtts
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Scripts\tts.exe\__main__.py", line 7, in <module>
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\bin\synthesize.py", line 401, in main
    synthesizer = Synthesizer(
                  ^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\utils\synthesizer.py", line 109, in __init__
    self._load_tts_from_dir(model_dir, use_cuda)
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\utils\synthesizer.py", line 164, in _load_tts_from_dir
    self.tts_model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\tts\models\xtts.py", line 645, in load_checkpoint
    self.load_state_dict(load_fsspec(model_path)["model"], strict=strict)
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\utils\io.py", line 86, in load_fsspec
    return torch.load(f, map_location=map_location, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 1172, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 182, in _cuda_deserialize
    device = validate_cuda_device(location)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 166, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
PS E:\AI>
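The `RuntimeError` above names the remedy itself: storages saved from a CUDA device must be remapped to the CPU when loading. A minimal illustration of `map_location` (using an in-memory buffer as a stand-in for the XTTS checkpoint, not the actual TTS loading path):

```python
import io

import torch

# Stand-in for a checkpoint file: serialize a tensor into an in-memory buffer.
buf = io.BytesIO()
torch.save(torch.ones(3), buf)
buf.seek(0)

# map_location=torch.device("cpu") forces every storage onto the CPU,
# which is what TTS should be doing when use_cuda is False.
tensor = torch.load(buf, map_location=torch.device("cpu"))
print(tensor.device)  # cpu
```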

The Python API fails the same way when writing to a file with `gpu=False`:

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=False)

tts.tts_to_file(text="Ceci est un test de voix en utilisant python.",
                file_path=r"E:\AI\output.wav",  # raw string avoids backslash escapes
                language="fr")
 > tts_models/multilingual/multi-dataset/xtts_v1 is already downloaded.
 > Using model: xtts
Traceback (most recent call last):
  File "e:\AI\test.py", line 2, in <module>
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\api.py", line 81, in __init__
    self.load_tts_model_by_name(model_name, gpu)
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\api.py", line 185, in load_tts_model_by_name
    self.synthesizer = Synthesizer(
                       ^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\utils\synthesizer.py", line 109, in __init__
    self._load_tts_from_dir(model_dir, use_cuda)
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\utils\synthesizer.py", line 164, in _load_tts_from_dir
    self.tts_model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\tts\models\xtts.py", line 645, in load_checkpoint
    self.load_state_dict(load_fsspec(model_path)["model"], strict=strict)
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\TTS\utils\io.py", line 86, in load_fsspec
    return torch.load(f, map_location=map_location, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 1172, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 217, in default_restore_location
    result = fn(storage, location)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 182, in _cuda_deserialize
    device = validate_cuda_device(location)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Max\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch-2.0.1-py3.11-win-amd64.egg\torch\serialization.py", line 166, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
PS C:\Users\Max>


### To Reproduce

tts --model_name "tts_models/multilingual/multi-dataset/xtts_v1" --text "Ceci est un teste de voix." --language_idx "fr"  --use_cuda False

### Expected behavior

Audio is generated on the CPU when `--use_cuda False` is passed.

### Logs

```shell
PS E:\AI> tts --model_name "tts_models/multilingual/multi-dataset/xtts_v1" --text "Ceci est un teste de voix." --language_idx "fr"  --use_cuda False
 > tts_models/multilingual/multi-dataset/xtts_v1 is already downloaded.
 > Using model: xtts
[...traceback identical to the one under "Describe the bug" above...]
```

### Environment

- Coqui TTS version: LTS
- OS: Windows
- GPU: AMD RX 6700 XT (no CUDA support, so effectively CPU-only for PyTorch)
- CPU: AMD Ryzen 7 7800X
- Installed via pip

### Additional context

No response

### About this issue

- Original URL
- State: closed
- Created 10 months ago
- Reactions: 2
- Comments: 25 (8 by maintainers)

### Most upvoted comments

The fix (#2951) is on the dev branch, so a publicly available fix will have to wait for the next release. If you want to use the dev version in the meantime: `pip install git+https://github.com/coqui-ai/TTS.git@dev`

I’m encountering the same issue when using the example provided in the README (“Running a multi-speaker and multi-lingual model”). I’ve tried a few ways to fix it, but none of them has worked so far.

import torch
from TTS.api import TTS

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models and choose the first one
model_name = TTS().list_models()[0]
# Init TTS
tts = TTS(model_name).to(device)

# Run TTS
# ❗ Since this model is multi-speaker and multi-lingual, we must set the target speaker and the language
# Text to speech with a numpy output
wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
# Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")
This fails with:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I had an equivalent problem a few hours ago, tested on Hugging Face with a basic CPU (no GPU). The problem seems to come from the default model, tts_models/multilingual/multi-dataset/xtts_v1. If you want a multilingual model, take the second one on the list: tts_models/multilingual/multi-dataset/your_tts. You will probably then hit a KeyError: for French text, this model takes fr-fr as the language argument, not fr. Personally, I find the output of your_tts less realistic, but you might get a better result with a French model such as tts_models/fr/mai/tacotron2-DDC or the next one on the list; I haven’t tested them. If you’re after cleaner output, look at RVC (RVC-Project/Retrieval-based-Voice-Conversion-WebUI): you can train a voice model on your own samples and, coupled with TTS, get better results; there are tutorials on it. Good evening from France.


The problem also appears when using the new `--device mps` option described in #2875, per #2855.

In C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\TTS\utils\io.py, at lines 83 and 86, change `torch.load(f, map_location=map_location, **kwargs)` to `torch.load(f, map_location=torch.device("cpu"), **kwargs)`.