fairseq: RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

🐛 Bug

After following the Roberta pretraining documentation with the exact same steps and data. When running fairseq-train I get an error

To Reproduce

TOTAL_UPDATES=125000    # Total number of training steps
WARMUP_UPDATES=10000    # Warmup the learning rate over this many updates
PEAK_LR=0.0005          # Peak learning rate, adjust as needed
TOKENS_PER_SAMPLE=512   # Max sequence length
MAX_POSITIONS=512       # Num. positional embeddings (usually same as above)
MAX_SENTENCES=16        # Number of sequences per batch (batch size)
UPDATE_FREQ=16          # Increase the batch size 16x

DATA_DIR=data-bin/wikitext-103

fairseq-train --fp16 $DATA_DIR \
    --task masked_lm --criterion masked_lm \
    --arch roberta_base --sample-break-mode complete --tokens-per-sample $TOKENS_PER_SAMPLE \
    --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --max-sentences $MAX_SENTENCES --update-freq $UPDATE_FREQ \
    --max-update $TOTAL_UPDATES --log-format simple --log-interval 1

Then I get the following stack trace:


2020-08-03 09:13:25 | INFO | fairseq_cli.train | begin training epoch 1
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/fairseq/fairseq_cli/train.py", line 350, in cli_main
    distributed_utils.call_main(args, main)
  File "/fairseq/fairseq/distributed_utils.py", line 189, in call_main
    main(args, **kwargs)
  File "/fairseq/fairseq_cli/train.py", line 121, in main
    valid_losses, should_stop = train(args, trainer, task, epoch_itr)
  File "/usr/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/fairseq/fairseq_cli/train.py", line 217, in train
    log_output = trainer.train_step(samples)
  File "/usr/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/fairseq/fairseq/trainer.py", line 457, in train_step
    raise e
  File "/fairseq/fairseq/trainer.py", line 431, in train_step
    ignore_grad=is_dummy_batch,
  File "/fairseq/fairseq/tasks/fairseq_task.py", line 347, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fairseq/fairseq/criterions/masked_lm.py", line 52, in forward
    logits = model(**sample['net_input'], masked_tokens=masked_tokens)[0]
  File "/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fairseq/fairseq/models/roberta/model.py", line 119, in forward
    x, extra = self.encoder(src_tokens, features_only, return_all_hiddens, **kwargs)
  File "/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fairseq/fairseq/models/roberta/model.py", line 337, in forward
    x, extra = self.extract_features(src_tokens, return_all_hiddens=return_all_hiddens)
  File "/fairseq/fairseq/models/roberta/model.py", line 345, in extract_features
    last_state_only=not return_all_hiddens,
  File "/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/fairseq/fairseq/modules/transformer_sentence_encoder.py", line 250, in forward
    x = self.emb_layer_norm(x)
  File "/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/.local/lib/python3.6/site-packages/torch/nn/modules/normalization.py", line 170, in forward
    input, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 2049, in layer_norm
    torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Environment

  • fairseq Version : master
  • PyTorch Version : 1.6.0
  • OS : Linux (Ubuntu)
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source):
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
  • Python version: 3.6.9
  • CUDA/cuDNN version: 11.0

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 33

Most upvoted comments

I get the same exception even when running on cpu. Did you find a solution?

try using these arguments in the webui-user.bat if you are running on CPU: set COMMANDLINE_ARGS=--skip-torch-cuda-test --precision full --no-half

Try setting env variable NVIDIA_VISIBLE_DEVICES=‘all’ or NVIDIA_VISIBLE_DEVICES=all

this worked for me, I am running code through docker

MacBook Pro vim webui-macos-env.sh

export COMMANDLINE_ARGS=“–skip-torch-cuda-test --upcast-sampling --no-half-vae --use-cpu interrogate --precision full --no-half”

is OK

running like this: sh webui.sh --skip-torch-cuda-test --no-half --use-cpu all

TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1' python launch.py --precision full --no-half try?

I think that I found solution. In fact, what we need to do is to remove the parameter: --fp16. Then, the whole command line will look like: python3.7 train.py Quora/data-bin --restore-file bart.large/model.pt --max-tokens 1024 --task translation --source-lang source --target-lang target --truncate-source --share-decoder-input-output-embed --reset-optimizer --reset-dataloader --reset-meters --required-batch-size-multiple 1 --arch bart_large --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 --optimizer adam --adam-betas “(0.9, 0.999)” --adam-eps 1e-08 --clip-norm 0.1 --lr-scheduler polynomial_decay --lr 3e-05 --total-num-update 200000 --warmup-updates 500 --update-freq 4 --skip-invalid-size-inputs-valid-test --find-unused-parameters --save-interval-updates 20000 --cpu --device-id 0 --distributed-world-size 1 --distributed-no-spawn --pipeline-chunks 1 --dataset-impl mmap

try this command if u use jupyter or Kaggle notebook or colab

!pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1 

!python3 launch.py --precision full --no-half

Im not using Nvidia so i cannot reinstall CUDA, any solution?

像这样运行:sh webui.sh --skip-torch-cuda-test --no-half --use-cpu all 这样可以,MacBook Pro

window
image

mac 打开终端,cd到指定目录,sh webui.sh --skip-torch-cuda-test --no-half --use-cpu all 2d2b9d935386fcd9a61c08cce8f74c1

这里是参数列表 https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings

Solved after reinstalling CUDA and NCCL

Change " torch_dtype = torch.float16 " to " torch_dtype = torch.float32 "

Half becomes whole.

If you want to run it on gpu you have to set the default floating point to float32 by adding torch.set_default_dtype(torch.float64) in your code. I was running the stable diffusion model, I was facing the same error, The issue is fixed when I set the floating point to float32.

got the same error, restarting the notebook solved it for me notebook - 39_tutorial.transformers.ipynb

pipe = pipe.to(“cuda”)

RuntimeError: “LayerNormKernelImpl” not implemented for ‘Half’

I have successfully solved this problem by way of

Start the webUI with the following Mac OS Terminal command.

cd stable-diffusion-webui ./webui.sh --precision full --no-half

Special Thank >> https://stable-diffusion-art.com/install-mac/comment-page-1/

I’m using a Macbook Pro running Windows 11 and these COMMANDLINE_ARGS worked for me. –autolaunch --skip-torch-cuda-test --disable-nan-check --no-half --use-cpu all

I have a similar issue, as I run on my CPU I get the error:

2021-01-08 21:43:05 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):                                                                                                                                                          
  File "/home/wouter/Documents/same-side-classification/venv/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 301, in call_main
    main(args, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq_cli/train.py", line 125, in main
    valid_losses, should_stop = train(args, trainer, task, epoch_itr)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq_cli/train.py", line 208, in train
    log_output = trainer.train_step(samples)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/trainer.py", line 512, in train_step
    raise e
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/trainer.py", line 480, in train_step
    loss, sample_size_i, logging_output = self.task.train_step(
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 416, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/criterions/sentence_prediction.py", line 42, in forward
    logits, _ = model(
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/models/roberta/model.py", line 190, in forward
    x, extra = self.encoder(src_tokens, features_only, return_all_hiddens, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/models/roberta/model.py", line 458, in forward
    x, extra = self.extract_features(
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/models/roberta/model.py", line 466, in extract_features
    inner_states, _ = self.sentence_encoder(
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/modules/transformer_sentence_encoder.py", line 258, in forward
    x = self.emb_layer_norm(x)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 169, in forward
    return F.layer_norm(
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 2094, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps,
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

If I run on my GPU, I get:

2021-01-08 21:41:13 | INFO | fairseq.trainer | begin training epoch 1
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu line=115 error=710 : device-side assert triggered
Traceback (most recent call last):                                                                                                                                                          
  File "/home/wouter/Documents/same-side-classification/venv/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq_cli/train.py", line 352, in cli_main
    distributed_utils.call_main(args, main)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/distributed_utils.py", line 301, in call_main
    main(args, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq_cli/train.py", line 125, in main
    valid_losses, should_stop = train(args, trainer, task, epoch_itr)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq_cli/train.py", line 208, in train
    log_output = trainer.train_step(samples)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/trainer.py", line 512, in train_step
    raise e
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/trainer.py", line 480, in train_step
    loss, sample_size_i, logging_output = self.task.train_step(
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 416, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/fairseq/criterions/sentence_prediction.py", line 52, in forward
    loss = F.nll_loss(lprobs, targets, reduction="sum")
  File "/home/wouter/Documents/same-side-classification/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:115

Any ideas?