OpenNMT-tf: Error while running domain adaptation (fine-tuning) in distributed mode

Hi,

I have created new vocabulary files (source and target) from the domain data set and updated the base model checkpoint using the command below:

    onmt-update-vocab \
        --model_dir /home/ubuntu/mayub/datasets/in_use/euro/run1/en_es_transformer_b/ \
        --output_dir /home/ubuntu/mayub/datasets/in_use/euro/run1/en_es_transformer_b/added_vocab/ \
        --src_vocab /home/ubuntu/mayub/datasets/in_use/euro/train_vocab/src_vocab_50k.txt \
        --tgt_vocab /home/ubuntu/mayub/datasets/in_use/euro/train_vocab/trg_vocab_50k.txt \
        --new_src_vocab /home/ubuntu/mayub/datasets/in_use/euro/train_vocab/src_vocab_nfpa_50k.txt \
        --new_tgt_vocab /home/ubuntu/mayub/datasets/in_use/euro/train_vocab/trg_vocab_nfpa_50k.txt

This generates the new checkpoint, which I pass to the fine-tuning train_and_eval command:

    onmt-main train_and_eval \
        --model_type Transformer \
        --checkpoint_path /home/ubuntu/mayub/datasets/in_use/euro/run1/en_es_transformer_b/added_vocab/ \
        --config /home/ubuntu/mayub/datasets/in_use/euro/run1/config_run_da_nfpa.yml \
        --auto_config \
        --num_gpus 8

The only changes I made to the config file were to the train and eval features and labels files (the source and target vocabularies are the same):

    data:
      train_features_file: /home/ubuntu/mayub/datasets/in_use/euro/run1/nfpa_train_tokenized_bpe_applied.en
      train_labels_file: /home/ubuntu/mayub/datasets/in_use/euro/run1/nfpa_train_tokenized_bpe_applied.es
      eval_features_file: /home/ubuntu/mayub/datasets/in_use/euro/run1/nfpa_dev_tokenized_bpe_applied.en
      eval_labels_file: /home/ubuntu/mayub/datasets/in_use/euro/run1/nfpa_dev_tokenized_bpe_applied.es
      source_words_vocabulary: /home/ubuntu/mayub/datasets/in_use/euro/train_vocab/src_vocab_50k.txt
      target_words_vocabulary: /home/ubuntu/mayub/datasets/in_use/euro/train_vocab/trg_vocab_50k.txt

Below is the error I’m getting: [image]

Not sure where I’m going wrong. Any help appreciated.

Thanks!

Mohammed Ayub

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 22 (8 by maintainers)

Most upvoted comments

Key optim/cond/beta1_power not found in checkpoint

You highlighted another issue here, thanks! Models trained with gradient accumulation had some different variable names than models trained without. Fixed in https://github.com/OpenNMT/OpenNMT-tf/commit/ff38e89119a43a18a10b675b8c4c80c40c2ef27a.
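
For anyone hitting a similar "Key ... not found in checkpoint" error, one way to narrow it down is to list the variable names actually stored in the checkpoint and compare them with the names the model is asking for. The following is a minimal sketch, not part of OpenNMT-tf: the helper names `list_checkpoint_names` and `missing_keys` are made up for illustration, and the example names at the bottom are hypothetical apart from the key quoted in the error above. It assumes TensorFlow is installed and relies on `tf.train.list_variables` to read the checkpoint.

```python
from typing import Iterable, List


def list_checkpoint_names(checkpoint_path: str) -> List[str]:
    """Return the variable names stored in a TensorFlow checkpoint."""
    # Imported lazily so missing_keys() below can be used without TensorFlow.
    import tensorflow as tf

    # tf.train.list_variables returns (name, shape) pairs; keep only the names.
    return [name for name, _ in tf.train.list_variables(checkpoint_path)]


def missing_keys(expected: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the expected variable names that are absent from `available`."""
    available_set = set(available)
    return sorted(name for name in expected if name not in available_set)


if __name__ == "__main__":
    # Hypothetical names for illustration only; with a real checkpoint,
    # replace `available` with list_checkpoint_names("<your checkpoint path>").
    expected = ["optim/beta1_power", "optim/cond/beta1_power"]
    available = ["optim/beta1_power", "optim/beta2_power"]
    print(missing_keys(expected, available))
```

If the missing names differ from the stored ones only by a scope such as `cond/`, that would be consistent with the gradient accumulation naming difference described above, and updating to a version that includes the fix is the likely remedy.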