DeepSpeech: Restoring from checkpoint failed.

virtualenv --python=python3.6 env

source env/bin/activate

git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech
git checkout v0.6.0

Downloaded the v0.6.0 pretrained checkpoint:
https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-checkpoint.tar.gz

pip install -r requirements.txt

pip install tensorflow-gpu==1.14.0

pip3 install $(python3 util/taskcluster.py --decoder)

Continuing training from a release model:
mkdir fine_tuning_checkpoints
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir ./deepspeech-0.6.0-checkpoint --epochs 3 --train_files ./data/csv_files/train.csv --dev_files ./data/csv_files/dev.csv --test_files ./data/csv_files/test.csv --learning_rate 0.0001
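When resuming, TensorFlow picks the checkpoint prefix from a small checkpoint-state file inside --checkpoint_dir. The traceback that follows shows DeepSpeech.py asking for one named best_dev_checkpoint, and the log shows it resolving to best_dev-233784. A minimal sketch for reading that file by hand, assuming it uses the usual TensorFlow text format (a line like model_checkpoint_path: "..."); both helper names are hypothetical.

```python
import os
import re

# Matches the `model_checkpoint_path: "..."` line of a TensorFlow checkpoint-state file.
_STATE_LINE = re.compile(r'\s*model_checkpoint_path:\s*"([^"]+)"')


def parse_model_checkpoint_path(state_text):
    """Return the prefix named by model_checkpoint_path, or None if absent."""
    for line in state_text.splitlines():
        match = _STATE_LINE.match(line)
        if match:
            return match.group(1)
    return None


def resolve_checkpoint_prefix(checkpoint_dir, state_file="best_dev_checkpoint"):
    """Read <checkpoint_dir>/<state_file> and return the full checkpoint prefix."""
    with open(os.path.join(checkpoint_dir, state_file), encoding="utf-8") as fh:
        prefix = parse_model_checkpoint_path(fh.read())
    if prefix is None:
        raise ValueError("no model_checkpoint_path in " + state_file)
    # A relative prefix is interpreted relative to the checkpoint directory.
    if not os.path.isabs(prefix):
        prefix = os.path.join(checkpoint_dir, prefix)
    return prefix
```

Calling resolve_checkpoint_prefix("./deepspeech-0.6.0-checkpoint") should name the same prefix as the "Restoring parameters from" line, assuming the tarball stores a relative path.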

Instructions for updating:
Use standard file APIs to check for files with this prefix.
W1206 06:45:41.998423 140389067556672 deprecation.py:323] From /media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from ./deepspeech-0.6.0-checkpoint/best_dev-233784
I1206 06:45:42.020016 140389067556672 saver.py:1280] Restoring parameters from ./deepspeech-0.6.0-checkpoint/best_dev-233784
Traceback (most recent call last):
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
	 [[{{node save_1/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
	 [[node save_1/RestoreV2 (defined at DeepSpeech.py:495) ]]

Original stack trace for 'save_1/RestoreV2':
  File "DeepSpeech.py", line 965, in <module>
    absl.app.run(main)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 938, in main
    train()
  File "DeepSpeech.py", line 495, in train
    best_dev_saver = tfv1.train.Saver(max_to_keep=1)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1296, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1614, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 965, in <module>
    absl.app.run(main)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 938, in main
    train()
  File "DeepSpeech.py", line 554, in train
    loaded = try_loading(session, best_dev_saver, 'best_dev_checkpoint', 'best validation')
  File "DeepSpeech.py", line 403, in try_loading
    saver.restore(session, checkpoint_path)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
	 [[node save_1/RestoreV2 (defined at DeepSpeech.py:495) ]]

Original stack trace for 'save_1/RestoreV2':
  File "DeepSpeech.py", line 965, in <module>
    absl.app.run(main)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 938, in main
    train()
  File "DeepSpeech.py", line 495, in train
    best_dev_saver = tfv1.train.Saver(max_to_keep=1)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/media/user1/storage-1/Murugan/DeepSpeech/env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • TensorFlow version: 1.14.0
  • Python version: 3.6.5
  • CUDA/cuDNN version: 10.0
  • GPU model and memory: 24 GB x 4 GPUs
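The final NotFoundError says the restore op built from this run's graph asked the checkpoint for cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam, and the checkpoint has no such key. Here is a sketch that lists every such gap and guesses which RNN backend wrote the checkpoint. It takes plain lists of names so it runs without TensorFlow. With TensorFlow 1.14, the checkpoint side could come from tf.train.list_variables(prefix), which returns (name, shape) pairs. The opaque_kernel check is an assumption: as far as I know, it is the variable name tf.contrib.cudnn_rnn.CudnnLSTM uses for its fused weights. Verify it against your own listing. The helper names are hypothetical.

```python
def missing_from_checkpoint(graph_var_names, checkpoint_var_names):
    """Variable names the graph will try to restore that the checkpoint lacks."""
    return sorted(set(graph_var_names) - set(checkpoint_var_names))


def looks_like_cudnn_checkpoint(checkpoint_var_names):
    """Heuristic (assumption): CudnnLSTM saves its fused weights as 'opaque_kernel'."""
    return any("opaque_kernel" in name for name in checkpoint_var_names)
```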

How can I resolve this issue? I followed the instructions correctly, so why is this happening?

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 28 (1 by maintainers)

Most upvoted comments

You have to specify --use_cudnn_rnn; it’s not enabled by default.
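Applied to the command from the report, this suggestion adds one boolean flag. The sketch below rebuilds that command as an argument list for subprocess and appends --use_cudnn_rnn only when requested. build_finetune_argv is a hypothetical helper, and the other flags and values are copied from the report.

```python
def build_finetune_argv(checkpoint_dir, train_csv, dev_csv, test_csv, use_cudnn_rnn):
    """Rebuild the fine-tuning command from the report as an argv list."""
    argv = [
        "python3", "DeepSpeech.py",
        "--n_hidden", "2048",
        "--checkpoint_dir", checkpoint_dir,
        "--epochs", "3",
        "--train_files", train_csv,
        "--dev_files", dev_csv,
        "--test_files", test_csv,
        "--learning_rate", "0.0001",
    ]
    if use_cudnn_rnn:
        # Build the CuDNN RNN graph so variable names match a CuDNN-trained checkpoint.
        argv.append("--use_cudnn_rnn")
    return argv
```

From the DeepSpeech checkout, the result can be passed to subprocess.run(argv, check=True).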

@lissyx Sir, could my CuDNN setup be wrong?

@lissyx Sir, how do I resolve this issue? What did I do wrong here? 😃

tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
	 [[node save_1/RestoreV2 (defined at DeepSpeech.py:495) ]]

OK, I can’t keep repeating the same things over and over. As I told you, the error occurs because training cannot resume using CuDNN. Check whether your setup is supposed to support it.

I am in case 2 here: my system is capable of using CuDNN RNN, so normally --checkpoint_dir alone should be enough for me.

But why do I need --cudnn_checkpoint?

This is what I asked you at the beginning: whether your setup was properly configured for CuDNN. The error clearly suggests it is not.

Weird. I remember this error occurring when loading a CuDNN checkpoint on a non-CuDNN setup; can you check that? In the release notes we also document the flag to use in that case; can you test with it?
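The key in the error ends in /Adam, so it names an optimizer slot rather than a model weight. TensorFlow's Adam optimizer keeps two slots per trainable variable, saved as <var>/Adam and <var>/Adam_1. My understanding of the v0.6.0 code, which you should verify, is that --cudnn_checkpoint restores the weights from a CuDNN-trained checkpoint and initializes from scratch the Adam moments that exist only in CuDNN form. The sketch below shows a guard of that kind: it accepts the conversion only if every missing name is an Adam slot. The helper name is hypothetical, and the exact flag combination to use is the one given in the release notes.

```python
# Suffixes TensorFlow's Adam optimizer uses for its first and second moment slots.
ADAM_SLOT_SUFFIXES = ("/Adam", "/Adam_1")


def only_adam_slots_missing(missing_names):
    """True if every missing variable is an Adam moment slot, not a trained weight.

    Missing slots can be freshly initialized without losing the trained weights;
    a missing weight means the graph and checkpoint genuinely disagree.
    """
    missing = list(missing_names)
    return bool(missing) and all(name.endswith(ADAM_SLOT_SUFFIXES) for name in missing)
```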

@MuruganR96 Can you share the output of pip list?
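For this problem, reading pip list output mostly means checking the TensorFlow packages. The sketch below parses both the default column layout and the --format=freeze layout of pip list. It flags two things worth ruling out: a tensorflow-gpu version other than the 1.14.0 pinned in the report, and a CPU tensorflow package installed next to it. The second check rests on an assumption: such a mix could leave the interpreter importing a build without GPU/CuDNN support. Both helper names are hypothetical.

```python
def parse_pip_list(output):
    """Map lower-cased package names to versions from `pip list` output.

    Handles the default column layout and `pip list --format=freeze`.
    """
    packages = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or set(line) <= {"-", " "}:
            continue  # blank line or the dashed separator row
        if "==" in line:
            name, _, version = line.partition("==")
        else:
            parts = line.split()
            if len(parts) < 2 or parts[:2] == ["Package", "Version"]:
                continue  # header row or malformed line
            name, version = parts[0], parts[1]
        packages[name.strip().lower()] = version.strip()
    return packages


def tensorflow_install_problems(packages, wanted_gpu_version="1.14.0"):
    """List suspicious TensorFlow package combinations (heuristic)."""
    problems = []
    gpu_version = packages.get("tensorflow-gpu")
    if gpu_version is None:
        problems.append("tensorflow-gpu is not installed")
    elif gpu_version != wanted_gpu_version:
        problems.append(f"tensorflow-gpu is {gpu_version}, expected {wanted_gpu_version}")
    if gpu_version is not None and "tensorflow" in packages:
        problems.append(f"CPU package tensorflow {packages['tensorflow']} is installed alongside tensorflow-gpu")
    return problems
```

Feed it the text printed by pip list inside the virtualenv. An empty result means neither issue was found.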