tensorflow: Error when saving weights in h5 format for layer with nested layers

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.4
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.2.0
  • Python version: 3.7

Describe the current behavior: Given a layer that contains nested layers, saving the model's weights in h5 format fails with RuntimeError: Unable to create link (name already exists).

Standalone code to reproduce the issue

import os
import tensorflow as tf

from tensorflow.keras import datasets, layers, models
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

class NestedLayers(tf.keras.layers.Layer):
    def __init__(self):
        super(NestedLayers, self).__init__()
        self.units = [layers.Conv2D(16, (3,3), name="conv_2d_0"),
                      layers.Conv2D(16, (3,3), name="conv_2d_1"),
                      layers.Conv2D(16, (3,3), name="conv_2d_2")]

    def build(self, input_shape):
        for i in range(0,2):
            unit_input_shape = list(input_shape)
            unit_input_shape[-1] = 1
            unit = self.units[i]
            unit.build(unit_input_shape)

    def call(self, inputs):
        split_inputs = tf.split(value=inputs,
                                num_or_size_splits=3,
                                axis=-1,
                                name="conv_grp_split")
        outputs = []
        for i in range(0,2):
            out = self.units[i](split_inputs[i])
            outputs.append(out)
        out = tf.keras.layers.concatenate(outputs, axis=-1, name="conv_grp_concat")
        return out

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

model = models.Sequential()
model.add(layers.InputLayer(input_shape=(32, 32, 3)))
model.add(NestedLayers())
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='softmax'))
model.summary()

import datetime
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
os.makedirs(log_dir, exist_ok=True)  # make sure the checkpoint directory exists

check_pt = tf.keras.callbacks.ModelCheckpoint(
                os.path.join(log_dir, "model.ckpt.{epoch:04d}-{val_loss:.06f}.hdf5"),
                monitor='val_loss',
                verbose=1,
                save_best_only=False,
                save_weights_only=True,
                mode='max',
                period=1)

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=2,
                    validation_data=(test_images, test_labels),
                    callbacks = [check_pt])

Fails with the following:

Epoch 00001: saving model to logs/fit/20200702-114230/model.ckpt.0001-1.949398.hdf5
Traceback (most recent call last):
  File "/nfs/site/home/tkrimer/work/mbe/dlo/src/internal_utils/train_cifar10_tf_2/nested_layers_save.py", line 78, in <module>
    callbacks = [tensorboard_callback, check_pt])
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 876, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 365, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 1177, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 1223, in _save_model
    self.model.save_weights(filepath, overwrite=True)
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1151, in save_weights
    hdf5_format.save_weights_to_hdf5_group(f, self.layers)
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 639, in save_weights_to_hdf5_group
    param_dset = g.create_dataset(name, val.shape, dtype=val.dtype)
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/h5py/_hl/group.py", line 139, in create_dataset
    self[name] = dset
  File "/localdrive/users/tkrimer/venv/tf_2.1/lib/python3.7/site-packages/h5py/_hl/group.py", line 373, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
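For context, a small diagnostic sketch (not part of the original report): printing the model's weight names right after building it shows the collision, since save_weights_to_hdf5_group() uses each weight's name as the HDF5 dataset name inside the layer's group.

# Run after constructing the model above: the Conv2D units that were built via a
# direct unit.build(...) call appear to end up with identically named weights,
# and those duplicate names collide when written as HDF5 dataset names.
for w in model.weights:
    print(w.name)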

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 28 (4 by maintainers)

Most upvoted comments

Similar bug still exists in TensorFlow 2.4.0.

I found a workaround: forcing the internal layers' weights to be named uniquely with a name scope. Check this out:

class NestedLayers(tf.keras.layers.Layer):
    def __init__(self):
        super(NestedLayers, self).__init__()
        self.units = [layers.Conv2D(16, (3,3), name="conv_2d_0"),
                      layers.Conv2D(16, (3,3), name="conv_2d_1"),
                      layers.Conv2D(16, (3,3), name="conv_2d_2")]

    def build(self, input_shape):
        for i in range(0,2):
            unit_input_shape = list(input_shape)
            unit_input_shape[-1] = 1
            unit = self.units[i]
            with tf.name_scope("BUILD_{}".format(i)):
                unit.build(unit_input_shape)

    def call(self, inputs):
        split_inputs = tf.split(value=inputs,
                                num_or_size_splits=3,
                                axis=-1,
                                name="conv_grp_split")
        outputs = []
        for i in range(0,2):
            out = self.units[i](split_inputs[i])
            outputs.append(out)
        out = tf.keras.layers.concatenate(outputs, axis=-1, name="conv_grp_concat")
        return out

Also, here is a gist
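As a quick sanity check (not from the gist itself, and the weights filename below is made up), rebuilding the repro model with this name-scoped NestedLayers and saving its weights to an .h5 path should now complete without the link error, assuming the same imports as in the reproduction script above:

model = models.Sequential([
    layers.InputLayer(input_shape=(32, 32, 3)),
    NestedLayers(),              # the name-scoped version above
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.save_weights("nested_layers_test.h5")  # hypothetical path; no RuntimeError expected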

The same error can also be triggered in the following way:

  1. save an Adam optimizer in a Python variable (such that the same instance can be used again later) and use it to compile a Keras model
  2. call fit() on the model
  3. instantiate a second Keras model (or load one from a .h5 file) and compile it with the same Adam instance saved at step 1
  4. call fit() on this second model
  5. save the second model as a .h5 file; it fails with RuntimeError: Unable to create link (name already exists)

What happens is that after step 4, the Adam optimizer instance holds a list of 9 TF variables in its weights attribute instead of 5: four of the variables have been duplicated while keeping their original names. When trying to save the model in .h5 format, the tf.keras.models.save_model() function finds two variables with the duplicated name Adam/dense/kernel/m:0 and stops with an error.

In this case, the error can be resolved by making sure the same Adam optimizer instance is not used to compile more than one model, as in the sketch below.
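A minimal sketch of that fix (the toy architecture below is made up purely for illustration): each model gets its own freshly constructed Adam instance at compile time.

import tensorflow as tf

def make_model():
    # Toy architecture, only for illustration
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])

model_a = make_model()
model_a.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')

model_b = make_model()
# Do NOT reuse model_a's optimizer instance here: a shared Adam accumulates
# slot variables with duplicate names, and saving model_b to .h5 then fails
# with "Unable to create link (name already exists)".
model_b.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')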

I also hit the same issue when creating a model from an existing one in the following way:

import tensorflow as tf
from tensorflow.keras import layers

# Load a previously trained model and stack a new Conv2D layer on top of it
denoise_model_name = "./data_model_1_00000220.h5"
denoise_model = tf.keras.models.load_model(denoise_model_name)

keras_input = tf.keras.Input(shape=(44, 100, 3))
x = denoise_model(keras_input)
x = layers.Conv2D(64, kernel_size=(3, 3),
                  activation='relu',
                  kernel_initializer='he_uniform', padding='same')(x)

Adding a name to the new Conv2D layer also solved it:
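For reference, a minimal sketch of that fix, building on the snippet above (the layer name denoise_head_conv, the output path, and the Model wrap-up are illustrative additions, not from the original comment):

keras_input = tf.keras.Input(shape=(44, 100, 3))
x = denoise_model(keras_input)
x = layers.Conv2D(64, kernel_size=(3, 3),
                  activation='relu',
                  kernel_initializer='he_uniform', padding='same',
                  name='denoise_head_conv')(x)  # explicit, unique name (hypothetical)

combined = tf.keras.Model(keras_input, x)
combined.save("./combined_model.h5")  # hypothetical path; reportedly no link error once the layer is named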