tensorflow: Model with custom metrics broken if saved and reloaded

There is a new problem in r2.4 (not present in 2.3.1): after saving and reloading a model that uses a custom metric, the model is broken, and the next call to fit() fails. Here is minimal code to reproduce it:

import numpy as np
from tensorflow.keras.models import load_model, Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.optimizers import Adam

def cmetrics(y_true, y_pred):  # dummy custom metric
    return 0  # a constant value is enough to trigger the bug

model = Sequential()
model.add(Dense(10, activation="relu", input_shape=(331, 331, 3)))
model.add(Flatten())
model.add(Dense(10, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=Adam(),
              metrics=[cmetrics])
model.summary()
xdata = np.random.rand(100, 331, 331, 3)
ydata = np.random.rand(100, 10)
history = model.fit(x=xdata, y=ydata)  # first training: works
model.save('test.h5', save_format='h5')
model = load_model('test.h5', custom_objects={'cmetrics': cmetrics})
history = model.fit(x=xdata, y=ydata)  # after reload: fails in 2.4.0

With TensorFlow 2.3.1, I get the expected result (both fit() calls succeed):

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 331, 331, 10)      40        
_________________________________________________________________
flatten (Flatten)            (None, 1095610)           0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10956110  
=================================================================
Total params: 10,956,150
Trainable params: 10,956,150
Non-trainable params: 0
_________________________________________________________________
4/4 [==============================] - 1s 149ms/step - loss: 52.2964 - cmetrics: 0.0000e+00
4/4 [==============================] - 1s 142ms/step - loss: 46.6724 - cmetrics: 0.0000e+00

With TensorFlow 2.4.0, the first fit() call succeeds, but the one after reloading fails:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 331, 331, 10)      40        
_________________________________________________________________
flatten (Flatten)            (None, 1095610)           0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10956110  
=================================================================
Total params: 10,956,150
Trainable params: 10,956,150
Non-trainable params: 0
_________________________________________________________________
2020-12-21 13:30:33.688730: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-12-21 13:30:33.708412: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2593735000 Hz
4/4 [==============================] - 1s 190ms/step - loss: 18.9856 - cmetrics: 0.0000e+00
Traceback (most recent call last):
  File "modeltest.py", line 22, in <module>
    history = model.fit(x=xdata,y=ydata)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:795 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:790 run_step  **
        with ops.control_dependencies(_minimum_control_deps(outputs)):
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:2793 _minimum_control_deps
        outputs = nest.flatten(outputs, expand_composites=True)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/util/nest.py:341 flatten
        return _pywrap_utils.Flatten(structure, expand_composites)

    TypeError: '<' not supported between instances of 'function' and 'str'
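The final error is raised inside nest.flatten. tf.nest flattens a dict in sorted-key order, so one plausible reading of the traceback is that, after the reload, the dict of metric results ends up with a key that is the metric function itself instead of its string name, and sorting those keys fails. That cause is an assumption inferred from the traceback, not something confirmed in this thread. The sketch below only reproduces the Python-level error with plain sorted(); the logs dict is hypothetical:

```python
def cmetrics(y_true, y_pred):
    return 0

# Hypothetical logs dict: one key is the metric *function* instead of its
# string name (assumed to be what goes wrong after reloading in 2.4.0).
logs = {"loss": 0.5, cmetrics: 0.0}

try:
    # Sorting the keys mimics how tf.nest orders dict entries when flattening.
    sorted(logs)
    message = "no error"
except TypeError as err:
    message = str(err)

print(message)
```

Mixing a function and a str as keys is enough to make the comparison fail, which matches the message in the traceback.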

Additional Information:

OS Platform and Distribution: Debian 10
TensorFlow installed from: pip install tensorflow
TensorFlow version: 2.4.0
Python version: 3.7

About this issue

State: closed
Created 4 years ago
Comments: 15 (7 by maintainers)

Most upvoted comments

FYI, my workaround for this issue is to recompile the loaded model with its saved loss and optimizer:

model = load_model("mymodel_best.h5", custom_objects={"mymetric":mymetric}, compile=True)
model.compile(loss=model.loss, optimizer=model.optimizer, metrics=[mymetric])
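The workaround above can be wrapped in a small helper. This is a sketch, not code from the thread: reload_and_recompile is a hypothetical name, and the loader is passed in (for example tf.keras.models.load_model) so the function itself needs no TensorFlow import. It assumes, as in the repro above, that each function metric is registered in custom_objects under its __name__.

```python
from typing import Callable, Sequence


def reload_and_recompile(load_model: Callable, path: str, metrics: Sequence[Callable]):
    """Load a saved Keras model, then compile it again with its own saved
    loss and optimizer plus the given custom metric functions."""
    # Key each custom metric by its function name, e.g. {"mymetric": mymetric}.
    custom_objects = {fn.__name__: fn for fn in metrics}
    model = load_model(path, custom_objects=custom_objects, compile=True)
    # Rebuild only the metrics; the loaded loss and optimizer objects are reused.
    model.compile(loss=model.loss, optimizer=model.optimizer, metrics=list(metrics))
    return model
```

With TensorFlow installed, the equivalent of the two lines above would be reload_and_recompile(tf.keras.models.load_model, "mymodel_best.h5", [mymetric]). Whether reusing model.optimizer this way keeps the optimizer's accumulated state is the open question in the later comments, so treat that part as unverified.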

kazu41 on Mar 23, 2021

Hi all–

Sorry, I forgot to update this!

This issue is now fixed with https://github.com/tensorflow/tensorflow/commit/6bd24c2096fd0f89301b4a6e1f4e8375324e0469, which unfortunately did not make it into 2.5.0 but is now in tf-nightly.

The gist that @amahendrakar provided now works with tf-nightly.

monicadsong on Apr 14, 2021

@alecgunny I think it does. See this colab for example. @byronyi, is there a solution for someone who wants to reuse the optimizer's state?

zaccharieramzi on Feb 15, 2021

Will recompiling the model reset the optimizer state?

alecgunny on Feb 1, 2021

Try compiling your model again with the exact same metric after loading it. See https://github.com/keras-team/keras/issues/14231

byronyi on Jan 10, 2021