tensorflow: RuntimeError: Unable to create link (name already exists) during model saving with ModelCheckpoint
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
- TensorFlow installed from (source or binary): TF 2.0 downloaded from repo
- TensorFlow version (use command below): tf-nightly-gpu-2.0-preview --> 2.0.0.dev20190314; tensorflow-hub --> 0.4.0
- Python version: 3.6
- CUDA/cuDNN version: CUDA Version 10.0.130/ cuDNN 7.5.0
- GPU model and memory: Nvidia RTX 2080 Ti 11GB (and GTX 1060 6GB)
Describe the current behavior
I’ve downloaded an Inception model from TF-Hub (specifically this one: https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/2), added two Keras layers to it (a Dropout layer and a Dense layer), and during training I try to save the model using the ModelCheckpoint Keras callback. Unfortunately, after one epoch, while the model is being saved, I receive the following error:
Traceback (most recent call last):
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3291, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-3945e7fb8367>", line 1, in <module>
runfile('/run/media/federico/XData/PycharmProjectsXData/ash/ash/prova_gan_plain_test.py', wdir='/run/media/federico/XData/PycharmProjectsXData/ash/ash')
File "/opt/pycharm-professional/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/opt/pycharm-professional/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/run/media/federico/XData/PycharmProjectsXData/ash/ash/prova_gan_plain_test.py", line 98, in <module>
main()
File "/run/media/federico/XData/PycharmProjectsXData/ash/ash/prova_gan_plain_test.py", line 90, in main
logdir,
File "/run/media/federico/XData/PycharmProjectsXData/ash/ash/testers/gan_plain.py", line 85, in __init__
self._model = self._download_and_train_model()
File "/run/media/federico/XData/PycharmProjectsXData/ash/ash/testers/gan_plain.py", line 255, in _download_and_train_model
callbacks=[cback_checkpoint],
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1508, in fit_generator
steps_name='steps_per_epoch')
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 324, in model_iteration
callbacks.on_epoch_end(epoch, epoch_logs)
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 290, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 892, in on_epoch_end
self.model.save_weights(filepath, overwrite=True)
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1395, in save_weights
hdf5_format.save_weights_to_hdf5_group(f, self.layers)
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 693, in save_weights_to_hdf5_group
param_dset = g.create_dataset(name, val.shape, dtype=val.dtype)
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/h5py/_hl/group.py", line 139, in create_dataset
self[name] = dset
File "/run/media/federico/XData/virtualenvs/python36_tf2preview/lib/python3.6/site-packages/h5py/_hl/group.py", line 371, in __setitem__
h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
Highlighting the error: RuntimeError: Unable to create link (name already exists)
Describe the expected behavior
I’m expecting to be able to use the ModelCheckpoint callback to save the (best) model.
Alternatively, I’m also expecting to be able to save the model with the model.save(filepath) function.
Code to reproduce the issue
You can reproduce the error on the following TF-Hub Colab page: https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_image_retraining.ipynb#scrollTo=CCpdfXPsh47Q by adding a cell with the ModelCheckpoint code:
cback_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="best.h5",
    verbose=1,
    save_best_only=True,
)
and then adding the callback to the fit_generator function of the model:
steps_per_epoch = train_generator.samples // train_generator.batch_size
validation_steps = valid_generator.samples // valid_generator.batch_size

hist = model.fit_generator(
    train_generator,
    epochs=5, steps_per_epoch=steps_per_epoch,
    validation_data=valid_generator,
    validation_steps=validation_steps,
    callbacks=[cback_checkpoint]).history
Other info / logs
I’ve found some reports of a similar problem online, e.g. issues #5280, #6844 and the more recent #26811. Many of them mention problems with the naming of the layers (or weights), or suggest creating the ModelCheckpoint with save_best_only=True or save_weights_only=True. I have tried all the proposed approaches, but without success.
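For reference, the workarounds referred to above look roughly like this (a sketch; the file name is illustrative):

import tensorflow as tf

# One commonly suggested workaround: save only the weights instead of the
# full model ("weights_best.h5" is an illustrative file name). In my case
# this still raised the same "name already exists" error.
cback_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="weights_best.h5",
    save_best_only=True,
    save_weights_only=True,
    verbose=1,
)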
I face the same problem even with the Keras model.save(filepath) function.
EDIT: Please follow this Colab link for a notebook with all the code set up to reproduce the issue.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 22
- Comments: 26 (9 by maintainers)
The reason it didn’t work was that I had the same name strings for all the weights in my custom Keras layers. Lesson: if you write custom Keras layers, give their weights distinct name strings!
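As a minimal sketch of that lesson (the layer below is hypothetical, not from this issue): create the weights through self.add_weight and give each one its own name, so the HDF5 writer never tries to create two links with the same name.

import tensorflow as tf

class MyDense(tf.keras.layers.Layer):
    """Hypothetical custom layer with distinct per-weight names."""

    def __init__(self, units, **kwargs):
        super(MyDense, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Each weight gets a different name ("kernel" vs "bias"); reusing one
        # name string for several weights is what produces duplicate links.
        self.kernel = self.add_weight(
            name="kernel",
            shape=(int(input_shape[-1]), self.units),
            initializer="glorot_uniform",
            trainable=True)
        self.bias = self.add_weight(
            name="bias",
            shape=(self.units,),
            initializer="zeros",
            trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.kernel) + self.bias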
I also encountered the same problem. Note that it only occurs with eager execution enabled. TensorFlow: 1.13.1, Python: 3.6.8 (Anaconda), IDE: Jupyter Notebook.
I was able to solve the .save error with the following snippet. Is this a good idea?
I got the same error in TensorFlow 2.1 when calling save_weights("model.h5", overwrite=True). None of the above-mentioned fixes worked for me (I am not using hub, and the optimizer is passed as SGD(...)). Are there any other solutions to this?
Update: likely related to #26811, as I am using a custom layer in which I call tf.Variable instead of self.add_weight.
For the error with eager execution, make sure you’re using OptimizerV2 from here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/keras/optimizer_v2. The optimizers in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/optimizers.py don’t inherit from Trackable, so the variables aren’t named correctly.
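A minimal sketch of that fix under eager execution (the architecture, file name, and hyperparameters below are illustrative assumptions; on TF 1.13 the v2 optimizers live under tensorflow.python.keras.optimizer_v2):

import tensorflow as tf
from tensorflow.python.keras.optimizer_v2.adam import Adam  # OptimizerV2 variant

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Per the comment above, the legacy optimizers don't inherit from Trackable,
# so their variables are not named correctly; an OptimizerV2 instance avoids that.
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.save("model.h5")  # full save, including the optimizer state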
You can also face this problem when the ‘group’ part of the weight names is not unique, for example when using multi-input models with transfer learning (e.g. several MobileNets at the input). I wrote some code that helped; in summary, it also changes other metadata, such as the model name and the layer names.
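A minimal sketch of the same idea (not the commenter's original snippet): wrap each repeated branch in a uniquely named Model so the HDF5 weight groups get distinct names; the input size and branch names below are illustrative assumptions.

import tensorflow as tf

def make_branch(name):
    # weights=None keeps the sketch self-contained (no ImageNet download).
    base = tf.keras.applications.MobileNetV2(
        include_top=False, pooling="avg", weights=None, input_shape=(96, 96, 3))
    # Re-wrap the branch under a unique model name: this becomes the HDF5
    # group name, so two MobileNet branches no longer collide.
    return tf.keras.Model(inputs=base.inputs, outputs=base.outputs, name=name)

inp_a = tf.keras.Input((96, 96, 3), name="input_a")
inp_b = tf.keras.Input((96, 96, 3), name="input_b")
features = tf.keras.layers.Concatenate()([
    make_branch("mobilenet_a")(inp_a),
    make_branch("mobilenet_b")(inp_b),
])
outputs = tf.keras.layers.Dense(10, activation="softmax")(features)
model = tf.keras.Model([inp_a, inp_b], outputs)

model.save_weights("multi_branch.h5")  # distinct group names, no duplicate links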
That trick about custom_objects seems to work! I changed the loading code to model2 = tf.keras.models.load_model('mnist.h5', custom_objects=dict(adam=Adam)) and it is able to load using the Adam v2 optimizer. I hope there is no problem with this approach, so I will use it for now.
Hmm, there appears to be a mismatch in the optimizer being loaded. Really, my advice would be to upgrade to at least TF 1.14, but it might work passing in custom_objects={'adam': Adam} as a keyword argument into tf.keras.models.load_model, where Adam is the v2 version. Or putting this at the top of your program:
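A self-contained sketch of the custom_objects workaround itself, assuming a model previously saved as 'mnist.h5' with the legacy 'adam' identifier (the import path is the TF 1.13-era location of the v2 Adam):

import tensorflow as tf
from tensorflow.python.keras.optimizer_v2.adam import Adam  # v2 optimizer

# Map the legacy "adam" identifier stored in the HDF5 file to the v2 class,
# so the optimizer is rebuilt as OptimizerV2 when the model is loaded.
model2 = tf.keras.models.load_model("mnist.h5", custom_objects={"adam": Adam})
model2.summary()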
FYI - issue https://github.com/tensorflow/hub/issues/287 has been solved (upgrade the Hub module version to /3).
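A minimal sketch of rebuilding the original setup on the upgraded /3 handle (the output shape, dropout rate, number of classes, and image size are illustrative assumptions):

import tensorflow as tf
import tensorflow_hub as hub

# Feature extractor from the upgraded /3 handle of the same TF-Hub module.
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/inception_v3/feature_vector/3",
    output_shape=[2048],
    trainable=False)

model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dropout(0.2),                    # assumed dropout rate
    tf.keras.layers.Dense(5, activation="softmax"),  # assumed class count
])
model.build([None, 299, 299, 3])  # Inception V3 default input size

# With the upgraded module, saving through ModelCheckpoint (or model.save_weights)
# should no longer hit the duplicate-link error.
cback_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="best.h5", verbose=1, save_best_only=True)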