tensorflow: Error when retraining a saved LSTM model

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): I have a simple and functional custom code to reproduce the issue.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
TensorFlow installed from (source or binary): conda install tensorflow-gpu=2.0.0
TensorFlow version (use command below):

python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

v2.0.0-rc2-26-g64c3d38 2.0.0

Python version: Python 3.7.4
CUDA/cuDNN version: CUDA 10.1, cuDNN major version 7 (#define CUDNN_MAJOR 7)
GPU model and memory: NVIDIA Quadro GV100, 32478 MiB

The output of env_info.sh is attached.

Describe the current behavior

1. Train a simple LSTM model using the tf.keras API.
2. Save the model in SavedModel format: model.save('saved_model', save_format='tf')
3. In a separate script (without access to the model definition), load the saved model: model = tf.keras.models.load_model('saved_model')
4. Continue training the reloaded model: model.fit(data_x,data_y,batch_size=64,epochs=2)

The following error is raised:

LookupError: No gradient defined for operation 'while' (op type: While)

Describe the expected behavior

The LSTM SavedModel should load and continue training in a separate script that has no access to the model definition.

Code to reproduce the issue

Script to train and save the model:

import tensorflow as tf
import numpy as np

def Model_Functional_API():

    inputs = tf.keras.Input(shape=(3, 2))
    encoder = tf.keras.layers.LSTM(10,return_sequences=True)
    encoder_outputs = encoder(inputs)
    projection_layer = tf.keras.layers.Dense(2)
    preds = projection_layer(encoder_outputs)
    model = tf.keras.Model(inputs,preds)

    return model

def Model_Sequence():

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(10,return_sequences=True))
    model.add(tf.keras.layers.Dense(2))

    return model

# model = Model_Functional_API()
model = Model_Sequence()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
            loss='mean_squared_error')

data_x = np.random.random([64,3,2])
data_y = np.random.random([64,3,2])

model.fit(data_x,data_y,batch_size=64,epochs=2)

model.save('saved_model', save_format='tf')
# model.save('saved_model.h5')

Script to load and retrain the model (this is where the error occurs):

import tensorflow as tf
import numpy as np

model = tf.keras.models.load_model('saved_model')
# model = tf.keras.models.load_model('saved_model.h5')

data_x = np.random.random([64,3,2])
data_y = np.random.random([64,3,2])

model.fit(data_x,data_y,batch_size=64,epochs=2)

Other info / logs

tf_env.txt and the error output (error_log.txt) are attached.

About this issue

State: closed
Created 5 years ago
Reactions: 1
Comments: 17 (7 by maintainers)

Most upvoted comments

Hi, I am having the same issue while trying to load a model with multiple inputs.

I save the whole model with a ModelCheckpoint callback:

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                  save_weights_only=False,
                                                  verbose=0,
                                                  save_freq=4542*10)

In a new Colab session I try to load it with:

checkpoint_dir = os.path.dirname(checkpoint_path)
model = tf.keras.models.load_model(checkpoint_dir)

The trace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in make_shape(v, arg_name)
    210   try:
--> 211     shape = tensor_shape.as_shape(v)
    212   except TypeError as e:

22 frames
TypeError: Dimension value must be integer or None or have an __index__ method, got TensorShape([None, 2048])

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in make_shape(v, arg_name)
    211     shape = tensor_shape.as_shape(v)
    212   except TypeError as e:
--> 213     raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))
    214   except ValueError as e:
    215     raise ValueError("Error converting %s to a TensorShape: %s." % (arg_name,

TypeError: Error converting shape to a TensorShape: Dimension value must be integer or None or have an __index__ method, got TensorShape([None, 2048]).

pabloi09 on Apr 28, 2020
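
A note on reading the trace above: the final TypeError says that a whole shape, TensorShape([None, 2048]), arrived where a single dimension (an integer or None) was expected. This may be related to the multiple inputs mentioned in the comment, though the trace alone does not confirm that. The sketch below is a toy stand-in in plain Python, not TensorFlow's code; as_dimension and as_shape are hypothetical names written only to show how this kind of validation accepts a flat shape and rejects a shape nested inside another shape.

import operator

def as_dimension(value):
    """Toy validator (hypothetical): accept None or anything usable as an integer index."""
    if value is None:
        return None
    try:
        return operator.index(value)
    except TypeError:
        raise TypeError(
            "Dimension value must be integer or None or have an "
            "__index__ method, got %r" % (value,)
        ) from None

def as_shape(dims):
    """Toy shape builder (hypothetical): validate every entry as a single dimension."""
    return tuple(as_dimension(d) for d in dims)

# A flat shape: every entry is an int or None, so it is accepted.
flat = as_shape([None, 2048])

# A shape nested inside a shape: the inner list is not a valid dimension.
try:
    as_shape([[None, 2048]])
    nested_error = None
except TypeError as exc:
    nested_error = str(exc)

print(flat)
print(nested_error)

If the real failure follows the same pattern, checking whether the loaded model's input shapes are passed around as a list of shapes rather than one flat shape would be a reasonable first thing to inspect.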