piper: No training possible on RTX 4090: CUFFT_INTERNAL_ERROR with torch < 2 (WSL2 & native Ubuntu Linux)

I encountered some problems with training, most of which I could resolve, as described here. I tried it on WSL2 (Ubuntu 20.04) and on a native Linux Ubuntu 22.04 LTS install.

The WSL2 guide works well on native Linux too, and of course on WSL2 itself, with these additions:

You have to pin torchmetrics like this: pip install torchmetrics==0.11.4, as Thorsten already mentioned in his video guide. Thanks, Thorsten!
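
To verify the pin took effect inside the virtual environment (a trivial check, nothing piper-specific):

import torchmetrics

# The guide expects exactly this version; newer releases are reportedly
# incompatible with the pytorch-lightning version piper uses.
print(torchmetrics.__version__)  # expect: 0.11.4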

On WSL2, you may also encounter this error: “Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory”, which can be solved like this:

sudo ldconfig
cd /usr/lib/wsl/lib/
sudo mv libcuda.so.1 libcuda.so.1.backup
sudo mv libcuda.so libcuda.so.backup
sudo ln -s libcuda.so.1.1 libcuda.so.1
sudo ln -s libcuda.so.1.1 libcuda.so
sudo ldconfig
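
To confirm the relinking worked, you can ask the dynamic loader for the driver library directly (a minimal sketch; it loads nothing piper-specific):

import ctypes

# If ldconfig picked up the recreated symlinks, this loads without
# raising OSError; otherwise you will see the same "cannot open
# shared object file" failure as above.
ctypes.CDLL("libcuda.so.1")
print("libcuda.so.1 loaded OK")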

This is also mentioned here: github.com/microsoft/WSL/issues/5663

On my old system with a GTX 1060, training already works on the GPU (on WSL2 and on native Ubuntu 22.04 LTS). On the new system, I can only get CPU training to work. And of course the GTX 1060 still beats an i9-14900K…

With the RTX 4090 it looks like this (same on WSL2 and Ubuntu 22.04 LTS):

(.venv) user@ubuntu:~/piper/src/python$ python3 -m piper_train --dataset-dir ~/piper/my-training --accelerator 'gpu' --devices 1 --batch-size 32 --validation-split 0.0 --num-test-examples 0 --max_epochs 10000 --resume_from_checkpoint ~/piper/epoch=2665-step=1182078.ckpt --checkpoint-epochs 1 --precision 32 --quality high
DEBUG:piper_train:Namespace(dataset_dir='/home/user/piper/my-training', checkpoint_epochs=1, quality='high', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=10000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/home/user/piper/epoch=2665-step=1182078.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 1 epoch(s)
DEBUG:vits.dataset:Loading dataset: /home/user/piper/my-training/dataset.jsonl
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  ckpt_path = ckpt_path or self.resume_from_checkpoint
Missing logger folder: /home/user/piper/my-training/lightning_logs
Restoring states from the checkpoint path at /home/user/piper/epoch=2665-step=1182078.ckpt
DEBUG:fsspec.local:open file: /home/user/piper/epoch=2665-step=1182078.ckpt
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:345: UserWarning: The dirpath has changed from '/ssd/piper/out-train/lightning_logs/version_1/checkpoints' to '/home/user/piper/my-training/lightning_logs/version_0/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
  warnings.warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /home/user/piper/my-training/lightning_logs/version_0/hparams.yaml
Restored all states from the checkpoint file at /home/user/piper/epoch=2665-step=1182078.ckpt
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/piper/src/python/piper_train/__main__.py", line 147, in <module>
    main()
  File "/home/user/piper/src/python/piper_train/__main__.py", line 124, in main
    trainer.fit(model)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.advance(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/adamw.py", line 120, in step
    loss = closure()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
    closure_result = closure()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
    step_output = self._step_fn()
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 191, in training_step
    return self.training_step_g(batch)
  File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 230, in training_step_g
    y_hat_mel = mel_spectrogram_torch(
  File "/home/user/piper/src/python/piper_train/vits/mel_processing.py", line 120, in mel_spectrogram_torch
    torch.stft(
  File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

I did some research, and it seems this issue is caused by a bug in CUDA 11.7, as mentioned here: github.com/pytorch/pytorch/issues/88038. I also tried the nvidia/pytorch:22.03-py3 Docker image, but that seems to have its own support issues with the 4090.
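
For reference, the failure reproduces outside of piper. Here is a minimal sketch that exercises the same torch.stft call as piper_train/vits/mel_processing.py; the sizes are illustrative, not piper's exact filter settings:

import torch

# One second of noise on the GPU; torch.stft dispatches to cuFFT.
# On an affected torch 1.x + CUDA 11.7 build with an RTX 4090, this
# raises "RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR".
x = torch.randn(1, 22050, device="cuda")
window = torch.hann_window(1024, device="cuda")
spec = torch.stft(x, n_fft=1024, hop_length=256, win_length=1024,
                  window=window, center=False, return_complex=True)
print(spec.shape)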


My question: are there any workarounds to get an RTX 4090 running, or any plans to upgrade to torch >= 2? It’s a pity that I can’t use it for training…


And also thanks for the great work!

Most upvoted comments

Hi, thank you very much for your great work!

Here is how I got it to work with an RTX 4090 and WSL2 (I use Windows 10).

Install the Python development package:

sudo apt-get install python3-dev

Then create a Python virtual environment and activate it:

cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate

Update pip, wheel, and setuptools:

pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools

Install this PyTorch build: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
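
After installing, it is worth confirming that torch actually sees the cu118 build and the card (a quick sanity check, nothing specific to piper):

import torch

# The cu118 wheels should report CUDA 11.8, and the 4090 should show
# up by name if the WSL2 driver passthrough is working.
print(torch.__version__, torch.version.cuda)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))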

Change requirements.txt to:

cython>=0.29.0,<1
librosa>=0.9.2,<1
piper-phonemize~=1.1.0
numpy>=1.19.0
onnxruntime>=1.11.0
pytorch-lightning~=1.9.0
onnx

Then run:

pip3 install -e .

Build monotonic_align:

chmod +x build_monotonic_align.sh
./build_monotonic_align.sh
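
As a quick sanity check that the Cython extension built, you can try importing it from piper/src/python with the venv active (the module path here is assumed from the vendored VITS layout and may differ):

# Assumed import path; adjust if piper's layout differs.
import piper_train.vits.monotonic_align
print("monotonic_align import OK")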

I hope this helps.

Hi everyone @qt06 @aaronnewsome @ei23fxg, happy I could help with this 😃 As @ei23fxg said, you need to change the version to pytorch-lightning~=1.9.0.

@ei23fxg, thank you for the speed-up tip.

I think it would be a good idea to put this somewhere in https://github.com/rhasspy/piper/blob/master/TRAINING.md, because the RTX 4090 is a very powerful GPU and it's sad if you can't use it with this amazing repo.

Happy training to all 😉

I really appreciate you adding more context around the performance of the 4090, ei23fxg. Many, many thanks.

It makes me think there should be some kind of effort to benchmark and catalog performance, so that new users like me can understand what we're getting into with all this.

It could also be a great place for curious users to see which setups work, what kinds of tweaks need to be done, and so on. I'm really appreciative of this project and I find it simply amazing. I'm rather impressed with myself for having the patience to actually get a training run done, since I'm not an expert in any of these concepts. I feel like I've stumbled upon it too early, since it hasn't quite progressed to the “anyone can do it” stage.

I'd be willing to help organize some kind of standard benchmarking test. If everyone benchmarks the same samples, with the same software versions and settings, it could be very useful to collect those stats and make them browsable.
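
Even something as small as timing a fixed GPU workload with pinned versions would make numbers comparable across setups. A hypothetical starting point (arbitrary sizes, not an agreed standard):

import time
import torch

# Time 100 batched STFTs on the GPU and report wall-clock seconds.
# The workload sizes are placeholders for a future standard test.
torch.manual_seed(0)
x = torch.randn(32, 22050, device="cuda")
window = torch.hann_window(1024, device="cuda")
torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(100):
    torch.stft(x, n_fft=1024, hop_length=256, win_length=1024,
               window=window, center=False, return_complex=True)
torch.cuda.synchronize()
print(f"{time.perf_counter() - start:.3f} s for 100 batched STFTs")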