tensorflow: Model with custom metrics broken if saved and reloaded
There is a new problem in r2.4 (not present in 2.3.1): after saving and reloading a model with a custom metric, the model is broken and the next training run fails. Here is a minimal script to reproduce it:
import numpy as np
from tensorflow.keras.models import load_model, Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.optimizers import Adam

def cmetrics(y_true, y_pred):
    return 0

model = Sequential()
model.add(Dense(10, activation="relu", input_shape=(331, 331, 3)))
model.add(Flatten())
model.add(Dense(10, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=Adam(),
              metrics=[cmetrics])
model.summary()
xdata = np.random.rand(100, 331, 331, 3)
ydata = np.random.rand(100, 10)
history = model.fit(x=xdata, y=ydata)
model.save('test.h5', save_format='h5')
model = load_model('test.h5', custom_objects={'cmetrics': cmetrics})
history = model.fit(x=xdata, y=ydata)
When running with tensorflow 2.3.1, I get the expected result:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 331, 331, 10)      40
_________________________________________________________________
flatten (Flatten)            (None, 1095610)           0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10956110
=================================================================
Total params: 10,956,150
Trainable params: 10,956,150
Non-trainable params: 0
_________________________________________________________________
4/4 [==============================] - 1s 149ms/step - loss: 52.2964 - cmetrics: 0.0000e+00
4/4 [==============================] - 1s 142ms/step - loss: 46.6724 - cmetrics: 0.0000e+00
When running with tensorflow 2.4.0, I get this:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 331, 331, 10)      40
_________________________________________________________________
flatten (Flatten)            (None, 1095610)           0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10956110
=================================================================
Total params: 10,956,150
Trainable params: 10,956,150
Non-trainable params: 0
_________________________________________________________________
2020-12-21 13:30:33.688730: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-12-21 13:30:33.708412: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2593735000 Hz
4/4 [==============================] - 1s 190ms/step - loss: 18.9856 - cmetrics: 0.0000e+00
Traceback (most recent call last):
  File "modeltest.py", line 22, in <module>
    history = model.fit(x=xdata,y=ydata)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:805 train_function *
        return step_function(self, iterator)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:795 step_function **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
        return fn(*args, **kwargs)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:790 run_step **
        with ops.control_dependencies(_minimum_control_deps(outputs)):
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:2793 _minimum_control_deps
        outputs = nest.flatten(outputs, expand_composites=True)
    /home/ludger/safe/sources/python/test/env/lib/python3.7/site-packages/tensorflow/python/util/nest.py:341 flatten
        return _pywrap_utils.Flatten(structure, expand_composites)
    TypeError: '<' not supported between instances of 'function' and 'str'
Additional Information:
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 10
- TensorFlow installed from (source or binary): pip install tensorflow
- TensorFlow version (use command below): 2.4.0
- Python version: 3.7
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (7 by maintainers)
FYI, I worked around this issue by recompiling the loaded model with its saved loss and optimizer.
Hi all–
Sorry, I forgot to update this!
This issue is now fixed with https://github.com/tensorflow/tensorflow/commit/6bd24c2096fd0f89301b4a6e1f4e8375324e0469, which unfortunately did not make it into 2.5.0 but is now in tf-nightly.
The gist that @amahendrakar provided now works with tf-nightly.
@alecgunny I think it does. See this colab for example. @byronyi is there a solution for someone who wants to reuse the optimizer’s state?
Will recompiling the model reset the optimizer state?
Try compiling your model with the exact same metric again after loading it. See https://github.com/keras-team/keras/issues/14231
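The recompile workaround mentioned in the comments could look roughly like the sketch below. It is only an illustration, not the fix from the linked commit: `recompile_loaded_model` is a hypothetical helper name, and it assumes the loaded model exposes its restored loss and optimizer as `model.loss` and `model.optimizer`. Whether the optimizer's state survives the recompile is the open question raised in this thread, and this sketch does not settle it.

```python
def cmetrics(y_true, y_pred):
    # Placeholder custom metric from the reproduction script; always returns 0.
    return 0


def recompile_loaded_model(model, metrics):
    """Recompile `model` with its own loss and optimizer plus `metrics`.

    Hypothetical helper (not part of TensorFlow). `model` is expected to be
    a compiled Keras model as returned by `load_model`; only its `loss`,
    `optimizer` and `compile` members are used.
    """
    # Reuse the loss and optimizer objects already attached to the loaded
    # model, and pass the custom metric functions in again explicitly.
    model.compile(loss=model.loss, optimizer=model.optimizer, metrics=metrics)
    return model
```

In the reproduction script this would be called right after loading, for example `model = recompile_loaded_model(load_model('test.h5', custom_objects={'cmetrics': cmetrics}), [cmetrics])`, before the second `model.fit`.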