tensorflow: ValueError: Arguments and signature arguments do not match -- when using dataset api, keras functional api and checkpoints callback (tf2.0)
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.13.6
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): GIT_VERSION='v1.12.1-4759-g9856697d8b' TF_VERSION='2.0.0-dev20190622'
- Python version: 3.6.4
- Bazel version (if compiling from source): NA
- GCC/Compiler version (if compiling from source): NA
- CUDA/cuDNN version: NA
- GPU model and memory: NA
Describe the current behavior
Calling fit on a Keras model with a tf.data.Dataset as input and a ModelCheckpoint callback crashes after the first epoch with this error:
ValueError: Arguments and signature arguments do not match.
The error happens only when both a training Dataset and a validation Dataset are specified, and it is triggered by the ModelCheckpoint callback.
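For reference, here is an untested sketch of the two configurations that, per the description above, do not trigger the crash; it reuses the names (model, train_dataset, val_dataset, checkpoint_callback) defined in the repro script below:
# Reported to work: no validation dataset (the checkpoint callback is still attached)
model.fit(train_dataset, epochs=5, callbacks=[checkpoint_callback])
# Reported to work: validation dataset present, but no ModelCheckpoint callback
model.fit(train_dataset, validation_data=val_dataset, epochs=5)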
Describe the expected behavior
The model should not crash; training should continue and the checkpoints should be saved successfully.
Code to reproduce the issue
import tensorflow as tf
# model architecture
inputs = tf.keras.Input(shape=(784,), name='flattened_image')
x = tf.keras.layers.Dense(64, activation='relu')(inputs)
x = tf.keras.layers.Dense(64, activation='relu')(x)
outputs = tf.keras.layers.Dense(10, activation='softmax', name='predictions')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='error_showcase')
# loading mnist data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
# create the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# shuffle, batch and prefetch
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64).prefetch(1024)
# create the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
# batch and prefetch
val_dataset = val_dataset.batch(64).prefetch(1024)
# compile the model
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)
# defining checkpoint callback
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    './checkpoints/',
    monitor='val_accuracy',
    verbose=1,
    save_best_only=True,
    mode='max'
)
# fit the model
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=5,
    callbacks=[checkpoint_callback],
)
print('\nhistory dict:', history.history)
Other info / logs
2019-06-22 18:30:17.760500: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-22 18:30:17.777440: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f8d2bb0de30 executing computations on platform Host. Devices:
2019-06-22 18:30:17.777463: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
Epoch 1/5
WARNING: Logging before flag parsing goes to stderr.
W0622 18:30:18.333158 140736272085888 deprecation.py:323] From /temp/v36/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1251: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0622 18:30:18.374418 140736272085888 deprecation.py:323] From /temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:460: BaseResourceVariable.constraint (from tensorflow.python.ops.resource_variable_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Apply a constraint manually following the optimizer update step.
2019-06-22 18:30:18.583083: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1541] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
911/938 [============================>.] - ETA: 0s - loss: 0.3021 - accuracy: 0.9133
Epoch 00001: val_accuracy improved from -inf to 0.94660, saving model to ./checkpoints/
2019-06-22 18:30:20.643797: W tensorflow/python/util/util.cc:268] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
W0622 18:30:20.653870 140736272085888 deprecation.py:506] From /temp/v36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1775: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
938/938 [==============================] - 2s 3ms/step - loss: 0.2972 - accuracy: 0.9147 - val_loss: 0.1739 - val_accuracy: 0.9466
Epoch 2/5
Traceback (most recent call last):
File "error_showcase_ckpt.py", line 46, in <module>
callbacks=[checkpoint_callback],
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 669, in fit
use_multiprocessing=use_multiprocessing)
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 695, in fit
steps_name='steps_per_epoch')
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 939, in train_on_batch
outputs = self.train_function(ins) # pylint: disable=not-callable
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3483, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 583, in __call__
return self._call_flat(args, self.captured_inputs)
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 685, in _call_flat
outputs = self._inference_function.call(ctx, args)
File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 436, in call
(len(args), len(list(self.signature.input_arg))))
ValueError: Arguments and signature arguments do not match: 19 20
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 8
- Comments: 19 (4 by maintainers)
Hi @mahzoon, I got the same error as yours, but it only happens when I set save_weights_only=False in the checkpoint callback. If I set save_weights_only=True, it works as usual. Also, I'm confused by this warning: "W tensorflow/python/util/util.cc:280] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them." It didn't show up after I set save_weights_only=True.
@mahzoon thanks for reporting the issue - we were able to repro and found that the mismatched argument is keras_learning_phase. We're actively working on training loop refactoring which will resolve the issue. Will post back here once we get more updates.
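A minimal sketch of the save_weights_only workaround reported above, assuming the rest of the repro script is unchanged; the weights file path is illustrative:
# Workaround reported above: save only the model weights instead of exporting the full model
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    './checkpoints/weights.{epoch:02d}.ckpt',  # illustrative path; the epoch number is formatted into the filename
    monitor='val_accuracy',
    verbose=1,
    save_best_only=True,
    save_weights_only=True,  # reported to avoid the error
    mode='max'
)
Weights saved this way can later be restored with model.load_weights on a model built with the same architecture.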
I got a similar issue. I was using K.function(input, output, updates) and got the same error, without specifying a ModelCheckpoint callback:
Traceback (most recent call last):
  File "/home/deeplearning/.vscode/extensions/ms-python.python-2019.6.22090/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/deeplearning/.vscode/extensions/ms-python.python-2019.6.22090/pythonFiles/lib/python/ptvsd/__main__.py", line 434, in main
    run()
  File "/home/deeplearning/.vscode/extensions/ms-python.python-2019.6.22090/pythonFiles/lib/python/ptvsd/__main__.py", line 312, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/deeplearning/work/Deeplearning/TensorFlow/DeepWritingID/DeepHWS_online_2/run.py", line 62, in <module>
    sys.exit(main())
  File "/home/deeplearning/work/Deeplearning/TensorFlow/DeepWritingID/DeepHWS_online_2/run.py", line 56, in main
    train_model.fit(next(it), batch_size=params.batch_size, epochs=3)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 383, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 427, in call
    (len(args), len(list(self.signature.input_arg))))
ValueError: Arguments and signature arguments do not match: 97 98
I also hit this issue in my personal project 😃