sockeye: Sampling chooses vocab index that does not exist with certain random seeds
Running into the following error while sampling with certain seeds:
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 269, in <module>
main()
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 46, in main
run_translate(args)
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 155, in run_translate
input_is_json=args.json_input)
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 237, in read_and_translate
chunk_time = translate(output_handler, chunk, translator)
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/translate.py", line 260, in translate
trans_outputs = translator.translate(trans_inputs)
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 861, in translate
results.append(self._make_result(trans_input, translation))
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 963, in _make_result
target_tokens = [self.vocab_target_inv[target_id] for target_id in target_ids]
File "/net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/inference.py", line 963, in <listcomp>
target_tokens = [self.vocab_target_inv[target_id] for target_id in target_ids]
KeyError: 7525
I am calling Sockeye with a script such as
OMP_NUM_THREADS=1 python -m sockeye.translate \
-i $data_sub/$corpus.pieces.src \
-o $samples_sub_sub/$corpus.pieces.$seed.trg \
-m $model_path \
--sample \
--seed $seed \
--length-penalty-alpha 1.0 \
--device-ids 0 \
--batch-size 64 \
--disable-device-locking
Sockeye and MXNet versions:
[2020-08-25:17:03:03:INFO:sockeye.utils:log_sockeye_version] Sockeye version 2.1.17, commit 92a020a25cbe75935c700ce2f29b286b31a87189, path /net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/sockeye/__init__.py
[2020-08-25:17:03:03:INFO:sockeye.utils:log_mxnet_version] MXNet version 1.6.0, path /net/cephfs/scratch/mathmu/map-volatility/venvs/sockeye3/lib/python3.5/site-packages/mxnet/__init__.py
Details that may be relevant:
- This only happens for certain random --seed values
- Running on a Tesla V100
- OS: Ubuntu 16.04.6 LTS
- The MXNet version in the CUDA 10.2 requirements file (https://github.com/awslabs/sockeye/blob/master/requirements/requirements.gpu-cu102.txt) is no longer available on PyPI. I had to install mxnet-cu102mkl==1.6.0.post0.
The vocabulary does not have this index:
[INFO:sockeye.vocab] Vocabulary (7525 words) loaded from "/net/cephfs/scratch/mathmu/map-volatility/models/bel-eng/baseline/vocab.src.0.json"
[INFO:sockeye.vocab] Vocabulary (7525 words) loaded from "/net/cephfs/scratch/mathmu/map-volatility/models/bel-eng/baseline/vocab.trg.0.json"
I suspect that the sampling procedure somehow assumes 1-based indexing, whereas the vocabulary is 0-indexed. This would mean that there is a small chance that max_vocab_id+1 is picked as the next token.
Looking at the inference code, I am not sure yet why this happens.
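To make the index range concrete, here is a minimal sketch of the failing lookup. It assumes the inverse vocabulary is a plain dict from id to token, as the traceback suggests. The token strings, the sampled ids and the helper find_invalid_ids are made up for illustration and are not part of Sockeye.

```python
# Hypothetical miniature of the lookup in sockeye/inference.py (_make_result).
VOCAB_SIZE = 7525  # size reported in the vocabulary log lines above

# Inverse vocabulary: id -> token. Valid ids run from 0 to VOCAB_SIZE - 1.
# The token strings are placeholders, not the real vocabulary entries.
vocab_target_inv = {i: "tok%d" % i for i in range(VOCAB_SIZE)}


def find_invalid_ids(target_ids, vocab_inv):
    """Return the ids that have no entry in the inverse vocabulary."""
    return [i for i in target_ids if i not in vocab_inv]


# Made-up sampled ids; 7525 is the id from the KeyError in the traceback.
sampled = [12, 7524, 7525]
invalid = find_invalid_ids(sampled, vocab_target_inv)
print("largest valid id:", max(vocab_target_inv))
print("ids outside the vocabulary:", invalid)

try:
    tokens = [vocab_target_inv[i] for i in sampled]
except KeyError as err:
    # Same failure mode as in the traceback: the id equals the vocabulary size.
    print("KeyError:", err)
```

A check like this before the lookup would not fix the sampler, but it would turn the KeyError into a report of which sentence and which id went wrong.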
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 21 (21 by maintainers)
I still believe this is an MXNet bug, but I don’t know how to reduce the problem to the single RNG state and input that cause random.multinomial to misbehave. As @fhieber said, that would be possible if the RNG state could be saved somehow.
@KellenSunderland we could use some MXNet expertise here, if you are interested in tackling this.
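Without a way to save the RNG state, one possible workaround is to identify a failing case by its seed plus the number of draws made before the bad sample, and replay from there. The sketch below shows that idea in plain Python. The sampler faulty_sample is a deliberately faulty stand-in and says nothing about how MXNet's random.multinomial works internally. All function names and the probabilities are hypothetical.

```python
import random


def faulty_sample(rng, probs):
    """Stand-in sampler (NOT MXNet): inverse-CDF sampling that returns
    len(probs) when the uniform draw exceeds the accumulated mass."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs)  # out-of-range id, the failure being hunted


def find_first_failure(seed, probs, max_steps):
    """Return the first step at which an out-of-range id is drawn, or None."""
    rng = random.Random(seed)
    for step in range(max_steps):
        if faulty_sample(rng, probs) >= len(probs):
            return step
    return None


def scan_seeds(seeds, probs, max_steps):
    """Map each failing seed to the step of its first out-of-range sample."""
    failures = {}
    for seed in seeds:
        step = find_first_failure(seed, probs, max_steps)
        if step is not None:
            failures[seed] = step
    return failures


# Assumed toy distribution whose mass is slightly below 1, so that the
# stand-in sampler occasionally falls off the end of the loop.
probs = [0.5, 0.49]
failures = scan_seeds(range(20), probs, max_steps=200)
for seed, step in sorted(failures.items()):
    print("seed %d: first out-of-range sample at step %d" % (seed, step))
```

Each (seed, step) pair is a reproducible test case: reseeding and drawing step + 1 times reaches the same bad sample. Whether the same approach carries over to MXNet on a GPU depends on its RNG being deterministic for a fixed seed, which is an assumption here.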