rgn: Restoring from checkpoint failed.
I am running TensorFlow 1.11 on 64-bit CentOS Linux 6.10. I downloaded the pre-trained model RGN7.tar.gz, extracted it to RGN7/, and ran protling.py as follows:
python2.7 ../rgn/model/protling.py ../rgn/configurations/CASP7.config -d RGN7 -p
The prediction apparently failed with the output below. Is this caused by a TensorFlow version mismatch?
WARNING:tensorflow:From ~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/training/input.py:187: __init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From ~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/training/input.py:187: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From ~/end2end/rgn/model/geom_ops.py:98: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
*** training configuration ***
{'architecture': {'all_to_all_peepholes': False,
'all_to_recurrent_skip_connections': False,
'alphabet_size': 60,
'alphabet_trainable': True,
'bidirectional': True,
'first_residual_connection_from_nth_layer': 1,
'higher_order_layers': True,
'include_dihedrals_between_layers': False,
'include_evolutionary': True,
'include_primary': True,
'include_recurrent_outputs_between_layers': True,
'input_to_recurrent_skip_connections': False,
'recurrent_layer_size': [800, 800],
'recurrent_nonlinear_out_proj_function': 'tanh',
'recurrent_nonlinear_out_proj_size': None,
'recurrent_peepholes': True,
'recurrent_to_output_skip_connections': False,
'recurrent_unit': 'CudnnLSTM',
'residual_connections_every_n_layers': None,
'tertiary_output': 'linear_alphabet'},
'computing': {'allow_gpu_growth': False,
'default_device': '',
'fill_gpu': False,
'functions_on_devices': {'/cpu:0': ['point_to_coordinate']},
'gpu_fraction': 1.0,
'num_cpus': 4,
'num_reconstruction_fragments': 6,
'num_reconstruction_parallel_iters': 4,
'num_recurrent_parallel_iters': 1,
'num_recurrent_shards': 1},
'curriculum': {'base': 100.0,
'behavior': None,
'change_num_iterations': 5,
'loss_history_subgroup': 'all',
'mode': None,
'rate': 0.002,
'sharpness': 20.0,
'slope': 1.0,
'threshold': 5.0,
'update_loss_history': False},
'initialization': {'alphabet_init': {'dist': 'uniform', 'range': 3.14159},
'alphabet_seed': None,
'angle_shift': [0.0, 0.0, 0.0],
'dropout_seed': None,
'evolutionary_multiplier': 1.0,
'graph_seed': 254,
'queue_seed': None,
'recurrent_forget_bias': 1.0,
'recurrent_init': {'base': {'dist': 'uniform',
'range': 0.01},
'bias': {'dist': 'uniform',
'range': 0}},
'recurrent_nonlinear_out_proj_init': {'base': {},
'bias': {}},
'recurrent_nonlinear_out_proj_seed': None,
'recurrent_out_proj_init': {'base': {'dist': 'uniform',
'range': 0.01},
'bias': {'dist': 'uniform',
'range': 0}},
'recurrent_out_proj_seed': None,
'recurrent_seed': None,
'zoneout_seed': None},
'io': {'alphabet_file': None,
'checkpoint_every_n_hours': 24,
'checkpoints_directory': 'RGN7/runs/CASP7/ProteinNet7Thinning90/checkpoints/',
'data_files': None,
'data_files_glob': 'RGN7/data/ProteinNet7Thinning90/training/[!a-z]*',
'detailed_logs': False,
'evaluation_sub_groups': ['10', '20', '30', '40', '50', '70', '90'],
'log_alphabet': False,
'log_model_summaries': True,
'logs_directory': 'RGN7/runs/CASP7/ProteinNet7Thinning90/logs/',
'max_checkpoints': None,
'name': 'training',
'num_edge_residues': 0,
'num_evo_entries': 42},
'loss': {'atoms': 'c_alpha',
'batch_dependent_normalization': True,
'include': True,
'tertiary_normalization': 'first',
'tertiary_weight': 1.0},
'optimization': {'alphabet_temperature': 1.0,
'batch_size': 32,
'beta1': 0.95,
'beta2': 0.99,
'decay': 0.9,
'epsilon': 1e-07,
'gradient_threshold': 5.0,
'initial_accumulator_value': 0.1,
'learning_rate': 0.0001,
'momentum': 0.0,
'num_epochs': 100000,
'num_steps': 700,
'optimizer': 'adam',
'recurrent_threshold': None,
'rescale_behavior': 'norm_rescaling'},
'queueing': {'batch_queue_capacity': 10000,
'bucket_boundaries': None,
'file_queue_capacity': 1000,
'min_after_dequeue': 500,
'num_evaluation_invocations': 1,
'shuffle': True},
'regularization': {'alphabet_keep_probability': 1.0,
'alphabet_normalization': None,
'recurrent_input_keep_probability': [0.5, 0.5],
'recurrent_keep_probability': 1.0,
'recurrent_layer_normalization': False,
'recurrent_memory_zonein_probability': 1.0,
'recurrent_nonlinear_out_proj_normalization': None,
'recurrent_output_keep_probability': 1.0,
'recurrent_state_zonein_probability': 1.0,
'recurrent_variational_dropout': False}}
*** weighted validation evaluation configuration ***
{'architecture': {'all_to_all_peepholes': False,
'all_to_recurrent_skip_connections': False,
'alphabet_size': 60,
'alphabet_trainable': True,
'bidirectional': True,
'first_residual_connection_from_nth_layer': 1,
'higher_order_layers': True,
'include_dihedrals_between_layers': False,
'include_evolutionary': True,
'include_primary': True,
'include_recurrent_outputs_between_layers': True,
'input_to_recurrent_skip_connections': False,
'recurrent_layer_size': [800, 800],
'recurrent_nonlinear_out_proj_function': 'tanh',
'recurrent_nonlinear_out_proj_size': None,
'recurrent_peepholes': True,
'recurrent_to_output_skip_connections': False,
'recurrent_unit': 'CudnnLSTM',
'residual_connections_every_n_layers': None,
'tertiary_output': 'linear_alphabet'},
'computing': {'allow_gpu_growth': False,
'default_device': '',
'fill_gpu': False,
'functions_on_devices': {'/cpu:0': ['point_to_coordinate']},
'gpu_fraction': 1.0,
'num_cpus': 4,
'num_reconstruction_fragments': 6,
'num_reconstruction_parallel_iters': 4,
'num_recurrent_parallel_iters': 1,
'num_recurrent_shards': 1},
'curriculum': {'base': 100.0,
'behavior': None,
'change_num_iterations': 5,
'loss_history_subgroup': 'all',
'mode': None,
'rate': 0.002,
'sharpness': 20.0,
'slope': 1.0,
'threshold': 5.0,
'update_loss_history': True},
'initialization': {'alphabet_init': {'dist': 'uniform', 'range': 3.14159},
'alphabet_seed': None,
'angle_shift': [0.0, 0.0, 0.0],
'dropout_seed': None,
'evolutionary_multiplier': 1.0,
'graph_seed': 254,
'queue_seed': None,
'recurrent_forget_bias': 1.0,
'recurrent_init': {'base': {'dist': 'uniform',
'range': 0.01},
'bias': {'dist': 'uniform',
'range': 0}},
'recurrent_nonlinear_out_proj_init': {'base': {},
'bias': {}},
'recurrent_nonlinear_out_proj_seed': None,
'recurrent_out_proj_init': {'base': {'dist': 'uniform',
'range': 0.01},
'bias': {'dist': 'uniform',
'range': 0}},
'recurrent_out_proj_seed': None,
'recurrent_seed': None,
'zoneout_seed': None},
'io': {'alphabet_file': None,
'checkpoint_every_n_hours': 24,
'checkpoints_directory': None,
'data_files': None,
'data_files_glob': 'RGN7/data/ProteinNet7Thinning90/validation/1',
'detailed_logs': False,
'evaluation_sub_groups': ['10', '20', '30', '40', '50', '70', '90'],
'log_alphabet': False,
'log_model_summaries': True,
'logs_directory': None,
'max_checkpoints': None,
'name': 'evaluation_wt_validation',
'num_edge_residues': 0,
'num_evo_entries': 42},
'loss': {'atoms': 'c_alpha',
'batch_dependent_normalization': True,
'include': False,
'tertiary_normalization': 'first',
'tertiary_weight': 1.0},
'optimization': {'alphabet_temperature': 1.0,
'batch_size': 1,
'beta1': 0.95,
'beta2': 0.99,
'decay': 0.9,
'epsilon': 1e-07,
'gradient_threshold': 5.0,
'initial_accumulator_value': 0.1,
'learning_rate': 0.0001,
'momentum': 0.0,
'num_epochs': 1,
'num_steps': 700,
'optimizer': 'adam',
'recurrent_threshold': None,
'rescale_behavior': 'norm_rescaling'},
'queueing': {'batch_queue_capacity': 300,
'bucket_boundaries': None,
'file_queue_capacity': 10,
'min_after_dequeue': 10,
'num_evaluation_invocations': 1,
'shuffle': False},
'regularization': {'alphabet_keep_probability': 1.0,
'alphabet_normalization': None,
'recurrent_input_keep_probability': [0.5, 0.5],
'recurrent_keep_probability': 1.0,
'recurrent_layer_normalization': False,
'recurrent_memory_zonein_probability': 1.0,
'recurrent_nonlinear_out_proj_normalization': None,
'recurrent_output_keep_probability': 1.0,
'recurrent_state_zonein_probability': 1.0,
'recurrent_variational_dropout': False}}
2018-10-28 20:59:04.218884: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Traceback (most recent call last):
File "../rgn/model/protling.py", line 532, in <module>
while loop(args): pass
File "../rgn/model/protling.py", line 384, in loop
session = models['training'].start(models.values())
File "~/end2end/rgn/model/model.py", line 448, in _start
self._saver.restore(session, latest_checkpoint)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1574, in restore
err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' with these attrs. Registered devices: [CPU], Registered kernels:
<no registered kernels>
[[{{node RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}} = CudnnRNNCanonicalToParams[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", num_params=8, rnn_mode="lstm", seed=254, seed2=4497](RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_layers, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_units, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/input_size, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_1, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_2, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_3, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_4, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_5, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_6, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_7, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_8, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_9, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_10, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_11, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_12, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_13, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_14, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_15)]]
Caused by op u'RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams', defined at:
File "../rgn/model/protling.py", line 532, in <module>
while loop(args): pass
File "../rgn/model/protling.py", line 306, in loop
models.update({'training': RGNModel('training', configs['training'])})
File "~/end2end/rgn/model/model.py", line 114, in __init__
self._create_graph(mode, self.config)
File "~/end2end/rgn/model/model.py", line 200, in _create_graph
recurrent_outputs, recurrent_states = _higher_recurrence(mode, recurrence_config, inputs, num_stepss, alphabet=alphabet)
File "~/end2end/rgn/model/model.py", line 693, in _higher_recurrence
layer_recurrent_outputs, layer_recurrent_states = _recurrence(mode, layer_config, layer_inputs, num_stepss)
File "~/end2end/rgn/model/model.py", line 787, in _recurrence
outputs_directed, (_, states_directed) = rnn(inputs_directed, training=is_training)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 364, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 759, in __call__
self.build(input_shapes)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 352, in build
opaque_params_t = self._canonical_to_opaque(weights, biases)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 474, in _canonical_to_opaque
direction=self._direction)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1251, in cudnn_rnn_canonical_to_opaque_params
name=name)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 642, in cudnn_rnn_canonical_to_params
name=name)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' with these attrs. Registered devices: [CPU], Registered kernels:
<no registered kernels>
[[{{node RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}} = CudnnRNNCanonicalToParams[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", num_params=8, rnn_mode="lstm", seed=254, seed2=4497](RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_layers, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_units, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/input_size, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_1, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_2, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_3, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_4, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_5, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_6, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_7, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_8, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_9, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_10, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_11, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_12, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_13, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_14, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_15)]]
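The "Original error" line in the traceback above lists only CPU under registered devices, while the op that fails is a Cudnn op. This suggests that no GPU kernel was available, rather than a real graph/checkpoint mismatch. As a minimal sketch, the standard-library snippet below pulls the op name and device list out of such a message. The helper name `diagnose_missing_kernel` is hypothetical and is not part of RGN or TensorFlow.

```python
import re


def diagnose_missing_kernel(message):
    """Extract the op name and registered devices from a TensorFlow
    'No OpKernel was registered' message.

    Returns None when the message does not contain both pieces.
    """
    op_match = re.search(
        r"No OpKernel was registered to support Op '([^']+)'", message)
    dev_match = re.search(r"Registered devices: \[([^\]]*)\]", message)
    if not op_match or not dev_match:
        return None
    op = op_match.group(1)
    devices = [d.strip() for d in dev_match.group(1).split(",") if d.strip()]
    return {
        "op": op,
        "devices": devices,
        # Heuristic (an assumption of this sketch): an op whose name starts
        # with "Cudnn" needs a GPU device to be registered.
        "cudnn_op_without_gpu": op.startswith("Cudnn") and "GPU" not in devices,
    }


# Message text copied from the traceback in this report.
message = ("No OpKernel was registered to support Op "
           "'CudnnRNNCanonicalToParams' with these attrs. "
           "Registered devices: [CPU], Registered kernels:")
result = diagnose_missing_kernel(message)
print(result)
```

If `cudnn_op_without_gpu` is true, the first thing to check is whether the installed TensorFlow build and the machine actually expose a GPU, before suspecting the checkpoint itself.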
About this issue
- State: closed
- Created 6 years ago
- Comments: 18 (9 by maintainers)
That’s unlikely to have worked. What do the logs say? Output should be in base/runs/runName/datasetName/…
I had the same issue and solved it by explicitly setting the -g argument to 0. However, once the code runs to completion, where are the prediction output files written?
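One way to see what a run wrote is to walk the run directory and list its files, newest first. This is only a sketch: the path is assumed from the `checkpoints_directory` and `logs_directory` values printed in the configuration above, the helper `list_run_files` is hypothetical, and where RGN actually places prediction outputs is not confirmed here.

```python
import os


def list_run_files(run_dir):
    """Return (mtime, path) pairs for every file under run_dir, newest first.

    A missing directory simply yields an empty list.
    """
    found = []
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            path = os.path.join(root, name)
            found.append((os.path.getmtime(path), path))
    return sorted(found, reverse=True)


# Run directory assumed from the printed configuration; adjust to your setup.
run_dir = os.path.join("RGN7", "runs", "CASP7", "ProteinNet7Thinning90")
for _mtime, path in list_run_files(run_dir):
    print(path)
```

Files modified during the most recent run appear at the top of the listing, which makes newly created output subdirectories easy to spot.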