modernmt: Training interrupts with error

Version 4.2.

It happened again.

@nicolabertoldi

`./mmt create zh nl data/procdut -e procdut --debug --tensorboard-port 8066
/home/loek/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/loek/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/loek/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/loek/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/loek/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/loek/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

=========== TRAINING STARTED ===========

ENGINE: procdut
LANGUAGE: zh > nl

(1/4) Cleaning corpora
  (1/6) Cleaning corpora… DONE in 0s
  (2/6) De-duplicate corpora… DONE in 0s
  (3/6) Preprocess corpora… DONE in 0s
  (4/6) Creating aligner filter… DONE in 0s
  (5/6) Scoring corpora… DONE in 0s
  (6/6) Apply aligner-based filter… DONE in 0s
(1/4) Cleaning corpora DONE in 2s
(2/4) Creating engine
(2/4) Creating engine DONE in 0s
(3/4) Generating binary archives
  (1/4) Tokenize corpora… DONE in 0s
  (2/4) Training BPE model… DONE in 3s
  (3/4) Encoding training corpora… DONE in 0s
  (4/4) Generating binary data… DONE in 0s
(3/4) Generating binary archives DONE in 5s
(4/4) Training neural model
  (1/2) Train neural network… ERROR
Unexpected exception: Command 'fairseq-train /home/loek/modernmt2/runtime/procdut/tmp/training/data_generated --save-dir /home/loek/modernmt2/runtime/procdut/tmp/training/_temp_train/nn_model --task mmt_translation --user-dir /home/loek/modernmt2/build/mmt-4.2.jar/mmt --share-all-embeddings --no-progress-bar --tensorboard-logdir /home/loek/modernmt2/runtime/procdut/tmp/training/_temp_train/tensorboard_logdir --dataset-impl mmap --arch transformer_mmt_base --clip-norm 0.0 --label-smoothing 0.1 --attention-dropout 0.1 --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --optimizer adam --adam-betas (0.9, 0.98) --log-interval 100 --lr 0.0005 --lr-scheduler inverse_sqrt --min-lr 1e-09 --warmup-init-lr 1e-07 --warmup-updates 4000 --max-tokens 3072 --update-freq 4 --save-interval-updates 1000 --keep-interval-updates 10 --keep-last-epochs 10' failed with exit code 1`
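
The message above reports only that the `fairseq-train` subprocess exited with code 1; the underlying cause would be in that process's own stderr, which is not shown here. A minimal sketch, assuming the failing command can be re-run by hand as an argument list, of how to capture the exit code and the last lines of stderr. The helper name `run_and_capture` and the stand-in demo command are hypothetical and not part of ModernMT.

```python
import subprocess
import sys


def run_and_capture(cmd, tail_lines=20):
    """Run cmd (a list of arguments) and return (exit_code, last stderr lines).

    Hypothetical helper for illustration; not part of ModernMT or fairseq.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    stderr_tail = result.stderr.splitlines()[-tail_lines:]
    return result.returncode, stderr_tail


if __name__ == "__main__":
    # Stand-in for the real training command: a child process that writes
    # to stderr and exits with code 1, like the failing call in the log.
    demo_cmd = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('boom\\n'); sys.exit(1)",
    ]
    code, tail = run_and_capture(demo_cmd)
    print("exit code:", code)
    for line in tail:
        print(line)
```

To use it on the real failure, `demo_cmd` would be replaced by the full `fairseq-train` argument list from the error message, with `--adam-betas` and its value passed as two separate list items.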

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 40 (12 by maintainers)

Most upvoted comments

@LoekvanKooten In newer versions, no: the model will stop training by itself, so perhaps this issue should be closed now?
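
For readers wondering what "stop training by itself" can mean in practice, here is a generic patience-based early-stopping check. It is only an illustration under stated assumptions: the function name `should_stop`, the `patience` and `min_delta` parameters, and the use of validation loss are all hypothetical and are not taken from ModernMT's code.

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True if the last `patience` validations brought no improvement.

    Generic illustration only; not ModernMT's actual stopping criterion.
    val_losses: validation losses in chronological order (lower is better).
    """
    if len(val_losses) <= patience:
        # Not enough history yet to compare against an earlier best.
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    # Stop when the recent window did not beat the earlier best by more
    # than min_delta.
    return recent_best >= best_before - min_delta


if __name__ == "__main__":
    history = [5.0, 4.0, 3.0, 3.1, 3.2, 3.05]
    print(should_stop(history))
```

A training loop would call such a check after each validation run and break out once it returns True.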