piper: No training possible on RTX 4090: CUFFT_INTERNAL_ERROR with torch < 2 (WSL2 & native Ubuntu Linux)
I encountered some problems with training, most of which I could resolve, as I describe here. I tried it on WSL2 (Ubuntu 20.04) and on a ‘real’ native Ubuntu 22.04 LTS.
The WSL2 guide works well on native Linux and, of course, on WSL2 as well, with these additions:
You have to pin torchmetrics like this:
pip install torchmetrics==0.11.4
as Thorsten already mentioned in his video guide - Thanks Thorsten!
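If you want to confirm the pin actually took effect inside the training virtual environment, a quick sanity check (nothing Piper-specific) looks like this:

```python
# quick check that the venv resolves the pinned torchmetrics version
import torchmetrics
print(torchmetrics.__version__)  # expect 0.11.4 after the pin above
```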
On WSL2, you may also encounter this error: “Could not load the library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory”, which can be solved like this:
sudo ldconfig
cd /usr/lib/wsl/lib/
sudo mv libcuda.so.1 libcuda.so.1.backup
sudo mv libcuda.so libcuda.so.backup
sudo ln -s libcuda.so.1.1 libcuda.so.1
sudo ln -s libcuda.so.1.1 libcuda.so
sudo ldconfig
Also mentioned here github.com/microsoft/WSL/issues/5663
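After applying the symlink fix, a quick way to check that both the CUDA driver library and cuDNN can actually be loaded from inside the venv is a short Python session. torch.cuda.is_available() and torch.backends.cudnn.version() are standard PyTorch calls, so this is just a generic sanity check, not something Piper-specific:

```python
# verify that libcuda/libcudnn are found after the WSL2 symlink fix
import torch
print(torch.cuda.is_available())       # should be True once libcuda.so resolves
print(torch.backends.cudnn.version())  # should print a cuDNN version instead of raising
```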
On my old system with a GTX 1060, training already works on the GPU (on WSL2 and also on native Ubuntu 22.04 LTS). On the new system, I only get the CPU to work. And of course, the GTX 1060 still beats an i9-14900K…
With the RTX 4090 it looks like this (same on WSL2 and Ubuntu 22.04 LTS):
(.venv) user@ubuntu:~/piper/src/python$ python3 -m piper_train --dataset-dir ~/piper/my-training --accelerator 'gpu' --devices 1 --batch-size 32 --validation-split 0.0 --num-test-examples 0 --max_epochs 10000 --resume_from_checkpoint ~/piper/epoch=2665-step=1182078.ckpt --checkpoint-epochs 1 --precision 32 --quality high
DEBUG:piper_train:Namespace(dataset_dir='/home/user/piper/my-training', checkpoint_epochs=1, quality='high', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=10000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=50, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/home/user/piper/epoch=2665-step=1182078.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=32, validation_split=0.0, num_test_examples=0, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234)
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 1 epoch(s)
DEBUG:vits.dataset:Loading dataset: /home/user/piper/my-training/dataset.jsonl
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
ckpt_path = ckpt_path or self.resume_from_checkpoint
Missing logger folder: /home/user/piper/my-training/lightning_logs
Restoring states from the checkpoint path at /home/user/piper/epoch=2665-step=1182078.ckpt
DEBUG:fsspec.local:open file: /home/user/piper/epoch=2665-step=1182078.ckpt
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:345: UserWarning: The dirpath has changed from '/ssd/piper/out-train/lightning_logs/version_1/checkpoints' to '/home/user/piper/my-training/lightning_logs/version_0/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
warnings.warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /home/user/piper/my-training/lightning_logs/version_0/hparams.yaml
Restored all states from the checkpoint file at /home/user/piper/epoch=2665-step=1182078.ckpt
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:153: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
rank_zero_warn(
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1892: PossibleUserWarning: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/user/piper/src/python/piper_train/__main__.py", line 147, in <module>
main()
File "/home/user/piper/src/python/piper_train/__main__.py", line 124, in main
trainer.fit(model)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
results = self._run_stage()
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
return self._run_train()
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
self.fit_loop.run()
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 271, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 203, in advance
batch_output = self.batch_loop.run(kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 87, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
self.advance(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 201, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 248, in _run_optimization
self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 358, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1705, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 216, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 153, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/optim/adamw.py", line 120, in step
loss = closure()
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 138, in _wrap_closure
closure_result = closure()
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 146, in __call__
self._result = self.closure(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 132, in closure
step_output = self._step_fn()
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 407, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1704, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 358, in training_step
return self.model.training_step(*args, **kwargs)
File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 191, in training_step
return self.training_step_g(batch)
File "/home/user/piper/src/python/piper_train/vits/lightning.py", line 230, in training_step_g
y_hat_mel = mel_spectrogram_torch(
File "/home/user/piper/src/python/piper_train/vits/mel_processing.py", line 120, in mel_spectrogram_torch
torch.stft(
File "/home/user/piper/src/python/.venv/lib/python3.10/site-packages/torch/functional.py", line 632, in stft
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
I did some research, and it seems this issue is caused by a bug in CUDA 11.7, as mentioned here: github.com/pytorch/pytorch/issues/88038. I also tried the nvidia/pytorch:22.03-py3 Docker image, but that also has some issues supporting the 4090?!
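For anyone who wants to check whether their own torch/CUDA combination is affected, the failure can be reproduced outside of Piper with a bare torch.stft call on the GPU, which is roughly what mel_spectrogram_torch ends up doing. The sizes below are illustrative and not necessarily the exact values Piper uses:

```python
# minimal reproducer for the cuFFT error on torch < 2 with CUDA 11.7
import torch

x = torch.randn(1, 22050, device="cuda")         # one second of dummy audio at 22.05 kHz
window = torch.hann_window(1024, device="cuda")
spec = torch.stft(x, n_fft=1024, hop_length=256, win_length=1024,
                  window=window, center=False, return_complex=True)
print(spec.shape)  # on an affected stack this line is never reached: CUFFT_INTERNAL_ERROR
```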
My question: Are there any workarounds to get an RTX 4090 running or any plans to upgrade to Torch >=2? It’s a pity that I can’t use it for training…
And also thanks for the great work!
About this issue
- State: open
- Created 7 months ago
- Comments: 20
Hi, thank you very much for your great work!
Here is how I got it working with an RTX 4090 and WSL2 (I use Windows 10):
Install the Python development package:
sudo apt-get install python3-dev
Then create a Python virtual environment and activate it.
Update pip, wheel and setuptools.
Install this version of PyTorch:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Change the versions in requirements.txt, then run
pip3 install -e .
and build build_monotonic_align.
I hope this helps!
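To double-check that the cu118 build is the one the venv actually uses and that the 4090 is visible (assuming the same .venv as above), a short check like this can help:

```python
# confirm the cu118 wheel is active and the RTX 4090 is visible
import torch
print(torch.__version__)              # expect a 2.x build tagged +cu118
print(torch.version.cuda)             # expect "11.8"
print(torch.cuda.get_device_name(0))  # expect the RTX 4090
```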
Hi everyone @qt06 @aaronnewsome @ei23fxg, happy I could help with this 😃 As @ei23fxg says, you also need to change the version to
pytorch-lightning~=1.9.0
@ei23fxg, thank you for the tip for the speed-up.
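For reference, the resulting change in src/python/requirements.txt would look roughly like the sketch below; this is an assumption about the file's layout, so check the actual pins in your checkout rather than copying it verbatim:

```
# hypothetical excerpt of src/python/requirements.txt after the change
pytorch-lightning~=1.9.0   # bumped from the older pin so it works with torch 2.x
torch>=2.0.0               # relax any <2 constraint so the cu118 wheel installs cleanly
```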
I think it would be a good idea to put this somewhere in https://github.com/rhasspy/piper/blob/master/TRAINING.md, because the RTX 4090 is a very powerful GPU and it would be a pity not to be able to use it with this amazing repo.
Happy training to all 😉
I really appreciate you adding more context around the performance of the 4090, @ei23fxg. Many, many thanks.
It makes me think there should be some kind of effort started to benchmark and catalog performance so that new users like me can understand what we’re getting into with all this.
It could also be a great place for curious users to see which setups work, what kind of tweaks need to be done and so on. I’m really appreciative of this project and I find it just simply amazing. I’m rather impressed at myself for having the patience to actually get a training done, since I’m not an expert in any of these concepts. I feel like I’ve stumbled upon it way too early since it hasn’t quite progressed to the “anyone can do it” stage.
I’d be willing to help organize some kind of standard benchmarking test. If everyone benchmarks the same samples with the same software versions and settings, it could be very useful to collect those stats and make them browsable.