transformers: RuntimeError: Gather got an input of invalid size: got [2, 3, 12, 256, 64], but expected [2, 4, 12, 256, 64] (gather at /opt/conda/conda-bld/pytorch_1544199946412/work/torch/csrc/cuda/comm.cpp:227)

❓ Questions & Help

Hi, I am running a modified version of run_lm_finetuning.py. It was working fine and model checkpoints were being saved until the last step of the first epoch (9677/9678), where I got this error:

9677/9678 [2:01:24<00:00,  1.36it/s]
Traceback (most recent call last):
  File "my_run_lm_finetuning.py", line 588, in <module>
    main()
  File "my_run_lm_finetuning.py", line 542, in main
    global_step, tr_loss = train(args, train_dataset, model, bert_model_fintuned, tokenizer, bert_tokenizer)
  File "my_run_lm_finetuning.py", line 260, in train
    outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, enc_output, labels=labels)
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 144, in forward
    return self.gather(outputs, self.output_device)
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 156, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 67, in gather
    return gather_map(outputs)
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 54, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 68, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/comm.py", line 166, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Gather got an input of invalid size: got [2, 3, 12, 256, 64], but expected [2, 4, 12, 256, 64] (gather at /opt/conda/conda-bld/pytorch_1544199946412/work/torch/csrc/cuda/comm.cpp:227)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f3c52b7fcc5 in /home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: torch::cuda::gather(c10::ArrayRef<at::Tensor>, long, c10::optional<int>) + 0x4d8 (0x7f3c936eaba8 in /home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0x4f99de (0x7f3c936ed9de in /home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x111e36 (0x7f3c93305e36 in /home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #14: THPFunction_apply(_object*, _object*) + 0x5dd (0x7f3c9350140d in /home/anaconda3/envs/py36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

Note that in this experiment I used a fine-tuned version of BERT (I fine-tuned it using your previous script in the lm_finetune folder) with max_seq_length=256; however, when running this script (run_lm_finetuning.py), I have block_size=128.

Any idea what is causing this error?
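For what it's worth, the shapes in the error line up with an uneven split of the final batch across GPUs: DataParallel's gather concatenates the replicas' outputs along dim 0 and requires every other dimension to match, but here the model's cached key/value tensors came back with batch dimensions (dim 1) of 4 and 3, e.g. a trailing batch of 7 examples split 4/3 across two GPUs. A minimal sketch of that failure, reusing the shapes from the trace (an illustration, not from the issue; it needs at least two GPUs):

import torch
from torch.nn.parallel.scatter_gather import gather

# Sketch: two replicas return "past" tensors whose batch dimension (dim 1)
# differs because the last batch split unevenly. gather() concatenates along
# dim 0, so all remaining dimensions must match exactly.
if torch.cuda.device_count() >= 2:
    out0 = torch.zeros(2, 4, 12, 256, 64, device="cuda:0")  # replica 0: 4 examples
    out1 = torch.zeros(2, 3, 12, 256, 64, device="cuda:1")  # replica 1: 3 examples
    gather([out0, out1], target_device=0, dim=0)
    # RuntimeError: Gather got an input of invalid size:
    # got [2, 3, 12, 256, 64], but expected [2, 4, 12, 256, 64]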

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (1 by maintainers)

Most upvoted comments

@ehsan-soe I fixed the problem by truncating incomplete batches: if there are 2001 examples and my batch size is 2, I drop the last example and train on the first 2000. This fixed it for me both with and without distributed training. My load_and_cache_examples function now looks like this:

def load_and_cache_examples(args, tokenizer, evaluate=False, fpath=None):
    if fpath:
        dataset = TextDataset(tokenizer, args, fpath)
    else:
        dataset = TextDataset(tokenizer, args, args.eval_data_path if evaluate else args.train_data_path)

    # Ignore incomplete batches: truncate so the number of examples is a
    # multiple of the batch size; otherwise gather fails at the end of training
    n = len(dataset) % args.per_gpu_train_batch_size
    if n != 0:
        dataset.examples = dataset.examples[:-n]
    return dataset
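An alternative that avoids touching the dataset (a sketch, not from the thread; it assumes the training loop builds its own DataLoader and nothing else depends on the last few examples) is to let the DataLoader drop the trailing partial batch itself:

from torch.utils.data import DataLoader, RandomSampler

# drop_last=True discards the final batch whenever it is smaller than
# batch_size, so every batch splits evenly across the GPU replicas.
train_sampler = RandomSampler(train_dataset)
train_dataloader = DataLoader(
    train_dataset,
    sampler=train_sampler,
    batch_size=args.train_batch_size,  # per_gpu_train_batch_size * n_gpu
    drop_last=True,
)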