rgn: Restoring from checkpoint failed.

I am running TensorFlow 1.11 on 64-bit CentOS Linux 6.10. I downloaded the pre-trained model RGN7.tar.gz, untarred it to RGN7/, and ran protling.py as follows:

python2.7 ../rgn/model/protling.py ../rgn/configurations/CASP7.config -d RGN7 -p

The prediction apparently failed with the following error. Is this caused by a mismatched TensorFlow version?

WARNING:tensorflow:From ~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/training/input.py:187: __init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From ~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/training/input.py:187: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From ~/end2end/rgn/model/geom_ops.py:98: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
*** training configuration ***
{'architecture': {'all_to_all_peepholes': False,
                  'all_to_recurrent_skip_connections': False,
                  'alphabet_size': 60,
                  'alphabet_trainable': True,
                  'bidirectional': True,
                  'first_residual_connection_from_nth_layer': 1,
                  'higher_order_layers': True,
                  'include_dihedrals_between_layers': False,
                  'include_evolutionary': True,
                  'include_primary': True,
                  'include_recurrent_outputs_between_layers': True,
                  'input_to_recurrent_skip_connections': False,
                  'recurrent_layer_size': [800, 800],
                  'recurrent_nonlinear_out_proj_function': 'tanh',
                  'recurrent_nonlinear_out_proj_size': None,
                  'recurrent_peepholes': True,
                  'recurrent_to_output_skip_connections': False,
                  'recurrent_unit': 'CudnnLSTM',
                  'residual_connections_every_n_layers': None,
                  'tertiary_output': 'linear_alphabet'},
 'computing': {'allow_gpu_growth': False,
               'default_device': '',
               'fill_gpu': False,
               'functions_on_devices': {'/cpu:0': ['point_to_coordinate']},
               'gpu_fraction': 1.0,
               'num_cpus': 4,
               'num_reconstruction_fragments': 6,
               'num_reconstruction_parallel_iters': 4,
               'num_recurrent_parallel_iters': 1,
               'num_recurrent_shards': 1},
 'curriculum': {'base': 100.0,
                'behavior': None,
                'change_num_iterations': 5,
                'loss_history_subgroup': 'all',
                'mode': None,
                'rate': 0.002,
                'sharpness': 20.0,
                'slope': 1.0,
                'threshold': 5.0,
                'update_loss_history': False},
 'initialization': {'alphabet_init': {'dist': 'uniform', 'range': 3.14159},
                    'alphabet_seed': None,
                    'angle_shift': [0.0, 0.0, 0.0],
                    'dropout_seed': None,
                    'evolutionary_multiplier': 1.0,
                    'graph_seed': 254,
                    'queue_seed': None,
                    'recurrent_forget_bias': 1.0,
                    'recurrent_init': {'base': {'dist': 'uniform',
                                                'range': 0.01},
                                       'bias': {'dist': 'uniform',
                                                'range': 0}},
                    'recurrent_nonlinear_out_proj_init': {'base': {},
                                                          'bias': {}},
                    'recurrent_nonlinear_out_proj_seed': None,
                    'recurrent_out_proj_init': {'base': {'dist': 'uniform',
                                                         'range': 0.01},
                                                'bias': {'dist': 'uniform',
                                                         'range': 0}},
                    'recurrent_out_proj_seed': None,
                    'recurrent_seed': None,
                    'zoneout_seed': None},
 'io': {'alphabet_file': None,
        'checkpoint_every_n_hours': 24,
        'checkpoints_directory': 'RGN7/runs/CASP7/ProteinNet7Thinning90/checkpoints/',
        'data_files': None,
        'data_files_glob': 'RGN7/data/ProteinNet7Thinning90/training/[!a-z]*',
        'detailed_logs': False,
        'evaluation_sub_groups': ['10', '20', '30', '40', '50', '70', '90'],
        'log_alphabet': False,
        'log_model_summaries': True,
        'logs_directory': 'RGN7/runs/CASP7/ProteinNet7Thinning90/logs/',
        'max_checkpoints': None,
        'name': 'training',
        'num_edge_residues': 0,
        'num_evo_entries': 42},
 'loss': {'atoms': 'c_alpha',
          'batch_dependent_normalization': True,
          'include': True,
          'tertiary_normalization': 'first',
          'tertiary_weight': 1.0},
 'optimization': {'alphabet_temperature': 1.0,
                  'batch_size': 32,
                  'beta1': 0.95,
                  'beta2': 0.99,
                  'decay': 0.9,
                  'epsilon': 1e-07,
                  'gradient_threshold': 5.0,
                  'initial_accumulator_value': 0.1,
                  'learning_rate': 0.0001,
                  'momentum': 0.0,
                  'num_epochs': 100000,
                  'num_steps': 700,
                  'optimizer': 'adam',
                  'recurrent_threshold': None,
                  'rescale_behavior': 'norm_rescaling'},
 'queueing': {'batch_queue_capacity': 10000,
              'bucket_boundaries': None,
              'file_queue_capacity': 1000,
              'min_after_dequeue': 500,
              'num_evaluation_invocations': 1,
              'shuffle': True},
 'regularization': {'alphabet_keep_probability': 1.0,
                    'alphabet_normalization': None,
                    'recurrent_input_keep_probability': [0.5, 0.5],
                    'recurrent_keep_probability': 1.0,
                    'recurrent_layer_normalization': False,
                    'recurrent_memory_zonein_probability': 1.0,
                    'recurrent_nonlinear_out_proj_normalization': None,
                    'recurrent_output_keep_probability': 1.0,
                    'recurrent_state_zonein_probability': 1.0,
                    'recurrent_variational_dropout': False}}



*** weighted validation evaluation configuration ***
{'architecture': {'all_to_all_peepholes': False,
                  'all_to_recurrent_skip_connections': False,
                  'alphabet_size': 60,
                  'alphabet_trainable': True,
                  'bidirectional': True,
                  'first_residual_connection_from_nth_layer': 1,
                  'higher_order_layers': True,
                  'include_dihedrals_between_layers': False,
                  'include_evolutionary': True,
                  'include_primary': True,
                  'include_recurrent_outputs_between_layers': True,
                  'input_to_recurrent_skip_connections': False,
                  'recurrent_layer_size': [800, 800],
                  'recurrent_nonlinear_out_proj_function': 'tanh',
                  'recurrent_nonlinear_out_proj_size': None,
                  'recurrent_peepholes': True,
                  'recurrent_to_output_skip_connections': False,
                  'recurrent_unit': 'CudnnLSTM',
                  'residual_connections_every_n_layers': None,
                  'tertiary_output': 'linear_alphabet'},
 'computing': {'allow_gpu_growth': False,
               'default_device': '',
               'fill_gpu': False,
               'functions_on_devices': {'/cpu:0': ['point_to_coordinate']},
               'gpu_fraction': 1.0,
               'num_cpus': 4,
               'num_reconstruction_fragments': 6,
               'num_reconstruction_parallel_iters': 4,
               'num_recurrent_parallel_iters': 1,
               'num_recurrent_shards': 1},
 'curriculum': {'base': 100.0,
                'behavior': None,
                'change_num_iterations': 5,
                'loss_history_subgroup': 'all',
                'mode': None,
                'rate': 0.002,
                'sharpness': 20.0,
                'slope': 1.0,
                'threshold': 5.0,
                'update_loss_history': True},
 'initialization': {'alphabet_init': {'dist': 'uniform', 'range': 3.14159},
                    'alphabet_seed': None,
                    'angle_shift': [0.0, 0.0, 0.0],
                    'dropout_seed': None,
                    'evolutionary_multiplier': 1.0,
                    'graph_seed': 254,
                    'queue_seed': None,
                    'recurrent_forget_bias': 1.0,
                    'recurrent_init': {'base': {'dist': 'uniform',
                                                'range': 0.01},
                                       'bias': {'dist': 'uniform',
                                                'range': 0}},
                    'recurrent_nonlinear_out_proj_init': {'base': {},
                                                          'bias': {}},
                    'recurrent_nonlinear_out_proj_seed': None,
                    'recurrent_out_proj_init': {'base': {'dist': 'uniform',
                                                         'range': 0.01},
                                                'bias': {'dist': 'uniform',
                                                         'range': 0}},
                    'recurrent_out_proj_seed': None,
                    'recurrent_seed': None,
                    'zoneout_seed': None},
 'io': {'alphabet_file': None,
        'checkpoint_every_n_hours': 24,
        'checkpoints_directory': None,
        'data_files': None,
        'data_files_glob': 'RGN7/data/ProteinNet7Thinning90/validation/1',
        'detailed_logs': False,
        'evaluation_sub_groups': ['10', '20', '30', '40', '50', '70', '90'],
        'log_alphabet': False,
        'log_model_summaries': True,
        'logs_directory': None,
        'max_checkpoints': None,
        'name': 'evaluation_wt_validation',
        'num_edge_residues': 0,
        'num_evo_entries': 42},
 'loss': {'atoms': 'c_alpha',
          'batch_dependent_normalization': True,
          'include': False,
          'tertiary_normalization': 'first',
          'tertiary_weight': 1.0},
 'optimization': {'alphabet_temperature': 1.0,
                  'batch_size': 1,
                  'beta1': 0.95,
                  'beta2': 0.99,
                  'decay': 0.9,
                  'epsilon': 1e-07,
                  'gradient_threshold': 5.0,
                  'initial_accumulator_value': 0.1,
                  'learning_rate': 0.0001,
                  'momentum': 0.0,
                  'num_epochs': 1,
                  'num_steps': 700,
                  'optimizer': 'adam',
                  'recurrent_threshold': None,
                  'rescale_behavior': 'norm_rescaling'},
 'queueing': {'batch_queue_capacity': 300,
              'bucket_boundaries': None,
              'file_queue_capacity': 10,
              'min_after_dequeue': 10,
              'num_evaluation_invocations': 1,
              'shuffle': False},
 'regularization': {'alphabet_keep_probability': 1.0,
                    'alphabet_normalization': None,
                    'recurrent_input_keep_probability': [0.5, 0.5],
                    'recurrent_keep_probability': 1.0,
                    'recurrent_layer_normalization': False,
                    'recurrent_memory_zonein_probability': 1.0,
                    'recurrent_nonlinear_out_proj_normalization': None,
                    'recurrent_output_keep_probability': 1.0,
                    'recurrent_state_zonein_probability': 1.0,
                    'recurrent_variational_dropout': False}}
2018-10-28 20:59:04.218884: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
Traceback (most recent call last):
  File "../rgn/model/protling.py", line 532, in <module>
    while loop(args): pass
  File "../rgn/model/protling.py", line 384, in loop
    session = models['training'].start(models.values())
  File "~/end2end/rgn/model/model.py", line 448, in _start
    self._saver.restore(session, latest_checkpoint)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1574, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

	 [[{{node RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}} = CudnnRNNCanonicalToParams[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", num_params=8, rnn_mode="lstm", seed=254, seed2=4497](RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_layers, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_units, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/input_size, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_1, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_2, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_3, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_4, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_5, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_6, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_7, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_8, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_9, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_10, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_11, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_12, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_13, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_14, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_15)]]

Caused by op u'RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams', defined at:
  File "../rgn/model/protling.py", line 532, in <module>
    while loop(args): pass
  File "../rgn/model/protling.py", line 306, in loop
    models.update({'training': RGNModel('training', configs['training'])})
  File "~/end2end/rgn/model/model.py", line 114, in __init__
    self._create_graph(mode, self.config)
  File "~/end2end/rgn/model/model.py", line 200, in _create_graph
    recurrent_outputs, recurrent_states = _higher_recurrence(mode, recurrence_config, inputs, num_stepss, alphabet=alphabet)
  File "~/end2end/rgn/model/model.py", line 693, in _higher_recurrence
    layer_recurrent_outputs, layer_recurrent_states = _recurrence(mode, layer_config, layer_inputs, num_stepss)
  File "~/end2end/rgn/model/model.py", line 787, in _recurrence
    outputs_directed, (_, states_directed) = rnn(inputs_directed, training=is_training)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 364, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 759, in __call__
    self.build(input_shapes)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 352, in build
    opaque_params_t = self._canonical_to_opaque(weights, biases)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 474, in _canonical_to_opaque
    direction=self._direction)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1251, in cudnn_rnn_canonical_to_opaque_params
    name=name)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 642, in cudnn_rnn_canonical_to_params
    name=name)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "~/end2end/miniconda2.7/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

	 [[{{node RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}} = CudnnRNNCanonicalToParams[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", num_params=8, rnn_mode="lstm", seed=254, seed2=4497](RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_layers, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/num_units, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams/input_size, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_1, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_2, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_3, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_4, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_5, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_6, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_7, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_8, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_9, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_10, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_11, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_12, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_13, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_14, RGN/training/layer0/fw/cudnn_lstm/cudnn_lstm/random_uniform_15)]]
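
The most informative part of the traceback above is probably the "No OpKernel was registered" line rather than the "mismatch between the current graph" wrapper. It reports that the only registered device is the CPU, while the configuration requests 'recurrent_unit': 'CudnnLSTM', whose ops are cuDNN-backed. The following is a minimal sketch, not part of RGN or TensorFlow, that pulls the op name and device list out of such a message. The function name diagnose and the "Cudnn op without a GPU" heuristic are assumptions made for illustration.

```python
import re

# Excerpt of the error text copied from the traceback above.
ERROR_TEXT = (
    "No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' "
    "with these attrs.  Registered devices: [CPU], Registered kernels:\n"
    "  <no registered kernels>\n"
)


def diagnose(error_text):
    """Parse a 'No OpKernel' message into (op_name, devices, needs_gpu).

    Returns None when the text does not look like such a message.
    """
    op_match = re.search(
        r"No OpKernel was registered to support Op '([^']+)'", error_text)
    dev_match = re.search(r"Registered devices: \[([^\]]*)\]", error_text)
    if not op_match or not dev_match:
        return None
    op_name = op_match.group(1)
    devices = [d.strip() for d in dev_match.group(1).split(",") if d.strip()]
    # Heuristic (an assumption of this sketch): an op whose name starts with
    # "Cudnn" and a device list without "GPU" points to a missing or
    # invisible GPU rather than to a damaged checkpoint.
    needs_gpu = op_name.startswith("Cudnn") and "GPU" not in devices
    return op_name, devices, needs_gpu


result = diagnose(ERROR_TEXT)
print(result)
```

If the third element is True, the first thing to check is likely whether a GPU build of TensorFlow is installed and a GPU is visible to the process, before suspecting the TensorFlow version.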

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

That’s unlikely to have worked. What do the logs say? Output should be in base/runs/runName/datasetName/…
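
To see what a run actually wrote, one option is to list every file under that directory. This is a minimal sketch using only the standard library. The helper name list_run_files is hypothetical, and the example values RGN7, CASP7 and ProteinNet7Thinning90 are taken from the checkpoints_directory and logs_directory entries in the configuration dump above. Which subdirectory holds the predictions is not stated in this thread, so the sketch lists everything.

```python
import os


def list_run_files(base_dir, run_name, dataset_name):
    """Return all file paths under base_dir/runs/run_name/dataset_name, sorted."""
    run_dir = os.path.join(base_dir, "runs", run_name, dataset_name)
    found = []
    # os.walk yields nothing if run_dir does not exist, so the result is
    # simply an empty list in that case.
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


# Values assumed from the configuration printed in the log above.
for path in list_run_files("RGN7", "CASP7", "ProteinNet7Thinning90"):
    print(path)
```

Run it from the same working directory as the original protling.py command, since RGN7 is a relative path there.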

I had the same issue and solved it by explicitly passing -g 0. However, after the code runs to completion, where are the prediction output files written?
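
For reference, the workaround in the comment above amounts to adding -g 0 to the original invocation. The sketch below only assembles and prints that command line. It does not run it, and it does not assume anything about what -g means beyond what the comment reports. The helper name build_command is hypothetical, and the paths are the ones from the original post.

```python
def build_command(config_path, base_dir, gpu="0"):
    """Assemble the protling.py prediction command with an explicit -g value."""
    return [
        "python2.7", "../rgn/model/protling.py", config_path,
        "-d", base_dir,
        "-p",
        "-g", gpu,  # value reported to work in the comment above
    ]


cmd = build_command("../rgn/configurations/CASP7.config", "RGN7")
print(" ".join(cmd))
```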