NeMo: Can't train ASR conformer-transducer

Describe the bug

Hello, guys! I have a problem: I tried to train a Conformer-Transducer model, but I got stuck right at the start. I think it’s because of the trainer parameters or some CUDA error, but I’m not sure…

[NeMo W 2023-10-17 16:18:53 optimizers:54] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-10-17 16:18:58 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.
device: cuda
[NeMo I 2023-10-17 16:19:03 mixins:170] Tokenizer SentencePieceTokenizer initialized with 1024 tokens
[NeMo W 2023-10-17 16:19:05 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 16
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 20.0
    min_duration: 0.1
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    bucketing_strategy: synced_randomized
    bucketing_batch_size: null
    bucketing_weights: ''
    
[NeMo W 2023-10-17 16:19:05 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    
[NeMo W 2023-10-17 16:19:05 modelPT:174] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config : 
    manifest_filepath: null
    sample_rate: 16000
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    
[NeMo I 2023-10-17 16:19:05 features:287] PADDING: 0
[NeMo W 2023-10-17 16:19:07 nemo_logging:349] /root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/torch/nn/modules/rnn.py:67: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1
      warnings.warn("dropout option adds dropout after all but last "
    
[NeMo I 2023-10-17 16:19:07 rnnt_models:206] Using RNNT Loss : warprnnt_numba
    Loss warprnnt_numba_kwargs: {'fastemit_lambda': 0.0, 'clamp': -1.0}
[NeMo I 2023-10-17 16:19:16 save_restore_connector:249] Model EncDecRNNTBPEModel was successfully restored from /root/.cache/huggingface/hub/models--nvidia--stt_ru_conformer_transducer_large/snapshots/687d02db291e931455cf321abd625ef2b7f0b1a9/stt_ru_conformer_transducer_large.nemo.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo I 2023-10-17 16:19:17 collections:193] Dataset loaded with 26757 files totalling 31.35 hours
[NeMo I 2023-10-17 16:19:17 collections:194] 0 files were filtered totalling 0.00 hours
[NeMo I 2023-10-17 16:19:19 collections:193] Dataset loaded with 7135 files totalling 8.24 hours
[NeMo I 2023-10-17 16:19:19 collections:194] 0 files were filtered totalling 0.00 hours
[NeMo W 2023-10-17 16:19:19 audio_to_text_dataset:675] Could not load dataset as `manifest_filepath` was None. Provided config : {'manifest_filepath': None, 'sample_rate': 16000, 'batch_size': 16, 'shuffle': False, 'num_workers': 8, 'pin_memory': True, 'use_start_end_token': False}
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo I 2023-10-17 16:19:20 modelPT:722] Optimizer config = Novograd (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.98]
        eps: 1e-08
        grad_averaging: False
        lr: 0.0001
        weight_decay: 0.001
    )
[NeMo I 2023-10-17 16:19:20 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fa5600805b0>" 
    will be used during training (effective maximum steps = 22400) - 
    Parameters : 
    (warmup_steps: 10000
    warmup_ratio: null
    min_lr: 1.0e-06
    max_steps: 22400
    )

  | Name              | Type                              | Params
------------------------------------------------------------------------
0 | preprocessor      | AudioToMelSpectrogramPreprocessor | 0     
1 | encoder           | ConformerEncoder                  | 115 M 
2 | decoder           | RNNTDecoder                       | 3.9 M 
3 | joint             | RNNTJoint                         | 1.4 M 
4 | loss              | RNNTLoss                          | 0     
5 | spec_augmentation | SpectrogramAugmentation           | 0     
6 | wer               | RNNTBPEWER                        | 0     
------------------------------------------------------------------------
5.4 M     Trainable params
115 M     Non-trainable params
120 M     Total params
481.780   Total estimated model params size (MB)
Epoch 0:   0%|                                                                                                                                                                            | 0/1130 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/sdb/nemo/stt_conformer/src/models/train_model.py", line 156, in <module>
    main()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/src/models/train_model.py", line 150, in main
    trainer.fit(model)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
    results = self._run_stage()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
    self._run_train()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train
    self.fit_loop.run()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 214, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 200, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 239, in _run_optimization
    closure()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 147, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 133, in closure
    step_output = self._step_fn()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 406, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/utils/model_utils.py", line 380, in wrap_training_step
    output_dict = wrapped(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/models/rnnt_models.py", line 712, in training_step
    loss_value, wer, _, _ = self.joint(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/core/classes/common.py", line 1087, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/modules/rnnt.py", line 1335, in forward
    loss_batch = self.loss(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/core/classes/common.py", line 1087, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/losses/rnnt.py", line 361, in forward
    loss = self._loss(acts=log_probs, labels=targets, act_lens=input_lengths, label_lens=target_lengths)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py", line 281, in forward
    return self.loss(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py", line 62, in forward
    loss_func(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py", line 223, in rnnt_loss_gpu
    status = wrapper.cost_and_grad(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt.py", line 249, in cost_and_grad
    return self.compute_cost_and_score(acts, grads, costs, pad_labels, label_lengths, input_lengths)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt.py", line 158, in compute_cost_and_score
    self.log_softmax(acts, denom)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/gpu_rnnt.py", line 104, in log_softmax
    reduce.reduce_max(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/reduce.py", line 353, in reduce_max
    return ReduceHelper(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/reduce.py", line 294, in ReduceHelper
    _reduce_rows[grid_size, CTA_REDUCE_SIZE, stream, 0](I_opid, R_opid, acts, output, num_rows)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 542, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 676, in call
    kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 684, in _compile_for_args
    return self.compile(tuple(argtypes))
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 927, in compile
    kernel = _Kernel(self.py_func, argtypes, **self.targetoptions)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 84, in __init__
    cres = compile_cuda(self.py_func, types.void, self.argtypes,
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/cuda/compiler.py", line 230, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler.py", line 742, in compile_extra
    return pipeline.compile_extra(func)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler.py", line 460, in compile_extra
    return self._compile_bytecode()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler.py", line 528, in _compile_bytecode
    return self._compile_core()
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler.py", line 507, in _compile_core
    raise e
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler.py", line 494, in _compile_core
    pm.run(self.state)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 368, in run
    raise patched_exception
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 273, in check
    mangled = func(compiler_state)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/typed_passes.py", line 110, in run_pass
    typemap, return_type, calltypes, errs = type_inference_stage(
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/typed_passes.py", line 88, in type_inference_stage
    errs = infer.propagate(raise_errors=raise_errors)
  File "/root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/numba/core/typeinfer.py", line 1086, in propagate
    raise errors[0]
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x7fa59609cbb0>.
libNVVM cannot be found. Do `conda install cudatoolkit`:
[Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'
During: resolving callee type: type(CUDADispatcher(<function exponential at 0x7fa56b25c430>))
During: typing of call at /root/sdb/nemo/stt_conformer/new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/reduce.py (158)

Enable logging at debug level for details.

File "new_venv/lib/python3.8/site-packages/nemo/collections/asr/parts/numba/rnnt_loss/utils/cuda_utils/reduce.py", line 158:
def _reduce_rows(I_opid: int, R_opid: int, acts, output, num_rows: int):
    <source elided>
        if I_opid == 0:
            curr = rnnt_helper.exponential(curr)
            ^

Epoch 0:   0%|          | 0/1130 [00:11<?, ?it/s]    

I installed the environment with pip install -r requirements.txt; the requirements file contains the following:

pandas==1.5.3
torch==1.13.0
pytorch_lightning==1.8.6
omegaconf==2.2.3
nemo_toolkit==1.18.1
optuna==3.1.0
pyctcdecode==0.5.0
swifter==1.3.4
openpyxl==3.1.2
torchvision==0.14.0
torchmetrics==0.11.4
torchaudio==0.13.0
nemo-text-processing==0.1.7rc0
jiwer==3.0.1
hydra-core==1.3.2
librosa==0.10.1
sentencepiece==0.1.99
youtokentome==1.0.6
braceexpand==0.1.7
webdataset==0.1.62
pyannote.core==5.0.0
pyannote.database==5.0.1
pyannote.metrics==3.2.1
editdistance==0.6.2
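
A side note: as far as I understand, the pip wheels for torch ship the CUDA runtime libraries but not the full toolkit, so nothing in this list provides the nvvm/ directory that the lookup above fails on. If a complete toolkit is installed somewhere non-default, a sketch like the following might point Numba at it (the path is my assumption, not a confirmed fix):

```python
# Hypothetical workaround sketch: point Numba at an existing full CUDA
# toolkit (one containing nvvm/lib64/libnvvm.so) before numba is imported.
# Replace the path with your actual toolkit root.
import os

os.environ["CUDA_HOME"] = "/usr/local/cuda-11.8"

from numba import cuda  # imported only after CUDA_HOME is set

print(cuda.is_available())  # True if the driver and a device are visible
```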

Here is the nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100DX-32Q     On   | 00000000:02:00.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB / 32768MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

OS: Ubuntu 18.04.6 LTS
Python version: 3.8.16

What really surprised me is that if I train a plain Conformer model, everything works fine.
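
For what it’s worth, the failing library lookup can be reproduced outside of training with Numba’s own diagnostics (a sketch; these helpers ship with numba, as far as I know):

```python
# Reproduce the CUDA toolkit lookup outside of NeMo training.
from numba import cuda

cuda.detect()  # prints the detected CUDA driver and devices

# Exercises the toolkit libraries, including libNVVM, and reports
# which of them can and cannot be located.
from numba.cuda.cudadrv.libs import test

test()
```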

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 34 (2 by maintainers)

Most upvoted comments

Quite odd; cudatoolkit is part and parcel of a PyTorch install. Can you try doing conda install cudatoolkit=11.8 and see if it solves this problem?

You can train it, but it’s harder. You’ll need an aggressive grid search over hyperparameters to see how to min-max results on small data. Or you can use adapters if you’re not changing the model’s tokenizer.
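
For reference, the adapter route might look roughly like this (a sketch based on NeMo’s adapter mixin API; the adapter name and bottleneck dim are illustrative, not from this thread):

```python
# Sketch: adapter-based fine-tuning of an already-restored NeMo ASR model
# (`model` is assumed to be e.g. a restored EncDecRNNTBPEModel).
from nemo.collections.common.parts import adapter_modules

adapter_cfg = adapter_modules.LinearAdapterConfig(
    in_features=model.cfg.encoder.d_model,  # must match the encoder width
    dim=32,                                 # small bottleneck dimension
)
model.add_adapter(name="my_adapter", cfg=adapter_cfg)
model.set_enabled_adapters(enabled=False)                    # disable all
model.set_enabled_adapters(name="my_adapter", enabled=True)  # enable ours
model.freeze()                      # freeze the base weights
model.unfreeze_enabled_adapters()   # train only the adapter parameters
```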

RNNT loss and joint computation are super expensive on memory, so we disable eval loss calculation for RNNT and use val_wer as the metric instead.

We still have the “compute_eval_loss” flag to compute the eval loss if you really need it, but it wastes a lot of memory.
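
In case it helps, setting the flag would look roughly like this (a sketch that assumes the standard NeMo training YAML layout; the config filename is hypothetical):

```python
# Opt back into eval loss via the training config, before building the model.
from omegaconf import OmegaConf

cfg = OmegaConf.load("conformer_transducer_bpe.yaml")  # hypothetical path
cfg.model.compute_eval_loss = True  # expect higher memory use in validation
```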

Oh that is perfectly fine, do not worry about the performance warning.

The CUDA kernel for RNNT is designed so that it’s optimal at large batch sizes, but RNNT at large batch sizes will exhaust memory. It doesn’t matter much: the CUDA kernel is still ~300x faster than a hand-written PyTorch loop with autograd, and it computes the loss in 20-50 ms per step anyway.
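
If memory is the constraint, the fused joint/loss path is the usual knob (a sketch assuming the standard NeMo RNNT joint config keys; the values are illustrative):

```python
# The fused path evaluates the joint and loss over small sub-batches,
# capping peak memory at some speed cost.
from omegaconf import OmegaConf

cfg = OmegaConf.load("conformer_transducer_bpe.yaml")  # hypothetical path
cfg.model.joint.fuse_loss_wer = True
cfg.model.joint.fused_batch_size = 4  # smaller sub-batch, lower peak memory
```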

It’s finally working!

Installation was fine. Now it’s NeMo time 👍

-------------------------
Conda problems
-------------------------
Numba installation
-------------------------
you are here ------> NeMo installation
-------------------------
Tests
-------------------------
Successful ASR model training

Your numba, cuda and pytorch install seems botched somehow. I’d start from a fresh conda environment, stick to installing everything with conda only, and maybe that will work.
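
After rebuilding, a trivial Numba CUDA kernel makes a quick smoke test, since compiling it goes through the same libNVVM path the RNNT loss kernels need (a minimal sketch):

```python
# Fails fast if libNVVM is still missing; prints all ones otherwise.
import numpy as np
from numba import cuda


@cuda.jit
def add_one(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] += 1.0


a = cuda.to_device(np.zeros(8, dtype=np.float32))
add_one[1, 8](a)          # the kernel launch triggers NVVM compilation
print(a.copy_to_host())   # expect all ones if the stack is healthy
```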