OpenNMT-py: Blanks in output file + out of memory issues during translation

Issue description

translate.py blows up with the error below after running for a while. When I inspected the output file, only a seemingly random subset of the lines (maybe ~20% of them) had been translated; the rest were blank. I re-ran translate.py on a smaller input file (about 8,000 lines rather than over 1,000,000), and got the same error, but this time the output file was completely blank.

Interestingly, I did not get this error when I built a model with the same version of PyOpenNMT on a different dataset, but it’s possible that something in my conda environment has changed since then.
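
A quick way to check whether empty or whitespace-only source lines are involved (just a diagnostic sketch; the path is the test file from the commands below, adjust to your own setup):

python -c "lines = open(r'C:\src\torchevere-offensive-classifier\test.txt', encoding='utf-8').readlines(); print(len(lines), 'lines,', sum(1 for l in lines if not l.strip()), 'blank')"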

Code example

(pyTorchLanguage) C:\src\pyopennmt\OpenNMT-py>python preprocess.py -train_src C:\src\torchevere-offensive-classifier\training\character\train_src.txt -train_tgt C:\src\torchevere-offensive-classifier\training\character\train_dst.txt -valid_src C:\src\torchevere-offensive-classifier\training\character\val_src.txt -valid_tgt C:\src\torchevere-offensive-classifier\training\character\val_dst.txt -save_data data/character/tc-offense-classifier-character_v1 -src_seq_length 5000 -tgt_seq_length 5000

(pyTorchLanguage) C:\src\pyopennmt\OpenNMT-py>python train.py -data data/character/tc-offense-classifier-character_v1 -save_model tc-offense-classifier-character_v2 -gpuid 0 -layers 3 -learning_rate_decay 0.99 -epochs 100 -rnn_size 500

(pyTorchLanguage) C:\src\pyopennmt\OpenNMT-py>python translate.py -model C:\src\pyopennmt\OpenNMT-py\tc-offense-classifier-character_v2_acc_98.54_ppl_1.09_e93.pt -src C:\src\torchevere-offensive-classifier\test.txt -output C:\src\torchevere-offensive-classifier\tc-character_v2_e93.txt

Loading model parameters.
average src size 4.5279249898083975 7359
Traceback (most recent call last):
  File "translate.py", line 29, in <module>
    main(opt)
  File "translate.py", line 18, in main
    opt.batch_size, opt.attn_debug)
  File "C:\src\pyopennmt\OpenNMT-py\onmt\translate\Translator.py", line 158, in translate
    for batch in data_iter:
  File "C:\Anaconda3\envs\pyTorchLanguage\lib\site-packages\torchtext\data\iterator.py", line 151, in __iter__
    self.train)
  File "C:\Anaconda3\envs\pyTorchLanguage\lib\site-packages\torchtext\data\batch.py", line 27, in __init__
    setattr(self, name, field.process(batch, device=device, train=train))
  File "C:\Anaconda3\envs\pyTorchLanguage\lib\site-packages\torchtext\data\field.py", line 188, in process
    tensor = self.numericalize(padded, device=device, train=train)
  File "C:\Anaconda3\envs\pyTorchLanguage\lib\site-packages\torchtext\data\field.py", line 308, in numericalize
    arr = self.postprocessing(arr, None, train)
  File "C:\src\pyopennmt\OpenNMT-py\onmt\io\TextDataset.py", line 221, in make_src
    src_size = max([t.size(0) for t in data])
  File "C:\src\pyopennmt\OpenNMT-py\onmt\io\TextDataset.py", line 221, in <listcomp>
    src_size = max([t.size(0) for t in data])
RuntimeError: invalid argument 2: dimension 0 out of range of 0D tensor at c:\anaconda2\conda-bld\pytorch_1519501749874\work\torch\lib\th\generic/THTensor.c:24
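
The failing line in make_src calls t.size(0) on every source tensor in the batch, and under PyTorch 0.3.x a tensor built from an empty token list is reported as having no dimensions, so size(0) raises exactly this error. Below is a minimal sketch of that behaviour; the idea that blank source lines are the trigger is my assumption, not something the traceback confirms.

import torch

# Under PyTorch 0.3.x an empty LongTensor is shown as having "no dimension",
# so asking for its first dimension reproduces the RuntimeError above.
# (On PyTorch >= 0.4 this simply returns 0 instead of raising.)
t = torch.LongTensor([])   # what an empty source line can numericalize to
print(t.dim())             # 0 on 0.3.x
print(t.size(0))           # RuntimeError: dimension 0 out of range of 0D tensor (0.3.x)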

System Info

(pyTorchLanguage) C:\src\pyopennmt>python collect_env.py

Collecting environment information…
PyTorch version: 0.3.1.post2
Is debug build: No
CUDA used to build PyTorch: 9.0

OS: Microsoft Windows 10 Pro
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 5.4.0
CMake version: version 3.10.3

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 7.5.17
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect

For the items collect_env.py could not gather: cuDNN is v7.0, the Nvidia driver is v397.64, and the GPU is a Titan Xp.

pip 10.0.1 from c:\anaconda3\envs\pyTorchLanguage\lib\site-packages\pip (python 3.6)
conda 4.5.4

I believe PyTorch was installed via pip (though I don’t remember the exact command) after pulling the latest PyOpenNMT source from GitHub.

Any ideas?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 19 (5 by maintainers)

Most upvoted comments

Just pushed a fix, git pull and try again.
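
For anyone stuck on an older checkout, a workaround on the data side (purely a hypothetical sketch, not what the pushed fix does) is to replace empty or whitespace-only lines in the source file with a placeholder so every input line still maps to an output line. The paths below are the ones from this issue; adjust them to your own setup.

# replace_blanks.py -- hypothetical helper script
src = r"C:\src\torchevere-offensive-classifier\test.txt"
out = r"C:\src\torchevere-offensive-classifier\test.noblank.txt"

with open(src, encoding="utf-8") as fin, open(out, "w", encoding="utf-8") as fout:
    for line in fin:
        # keep non-empty lines as-is, substitute a single period for blank ones
        fout.write(line if line.strip() else ".\n")

Then point translate.py at the new file with -src C:\src\torchevere-offensive-classifier\test.noblank.txt.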