tensorflow: ValueError: Arguments and signature arguments do not match -- when using dataset api, keras functional api and checkpoints callback (tf2.0)

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 10.13.6
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): GIT_VERSION='v1.12.1-4759-g9856697d8b' TF_VERSION='2.0.0-dev20190622'
  • Python version: 3.6.4
  • Bazel version (if compiling from source): NA
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: NA
  • GPU model and memory: NA

Describe the current behavior

Calling fit on a Keras model with a Dataset and a ModelCheckpoint callback crashes after the first epoch with this error: ValueError: Arguments and signature arguments do not match. The error occurs only when both a training Dataset and a validation Dataset are specified, and it is triggered by the checkpoint callback.

Describe the expected behavior

The model should not crash; it should continue training and save the checkpoints successfully.

Code to reproduce the issue


import tensorflow as tf

# model architecture
inputs = tf.keras.Input(shape=(784,), name='flattened_image')
x = tf.keras.layers.Dense(64, activation='relu')(inputs)
x = tf.keras.layers.Dense(64, activation='relu')(x)
outputs = tf.keras.layers.Dense(10, activation='softmax', name='predictions')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='error_showcase')

# loading mnist data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

# create the training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# shuffle, batch and prefetch
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64).prefetch(1024)
# create the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
# batch and prefetch (the validation data is not shuffled)
val_dataset = val_dataset.batch(64).prefetch(1024)

# compile the model
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)

# defining checkpoint callback
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    './checkpoints/',
    monitor='val_accuracy',
    verbose=1,
    save_best_only=True,
    mode='max'
)

# fit the model
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=5,
    callbacks=[checkpoint_callback],
)

print('\nhistory dict:', history.history)

Other info / logs

2019-06-22 18:30:17.760500: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-22 18:30:17.777440: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f8d2bb0de30 executing computations on platform Host. Devices:
2019-06-22 18:30:17.777463: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
Epoch 1/5
WARNING: Logging before flag parsing goes to stderr.
W0622 18:30:18.333158 140736272085888 deprecation.py:323] From /temp/v36/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1251: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0622 18:30:18.374418 140736272085888 deprecation.py:323] From /temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:460: BaseResourceVariable.constraint (from tensorflow.python.ops.resource_variable_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Apply a constraint manually following the optimizer update step.
2019-06-22 18:30:18.583083: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1541] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
911/938 [============================>.] - ETA: 0s - loss: 0.3021 - accuracy: 0.9133  
Epoch 00001: val_accuracy improved from -inf to 0.94660, saving model to ./checkpoints/
2019-06-22 18:30:20.643797: W tensorflow/python/util/util.cc:268] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
W0622 18:30:20.653870 140736272085888 deprecation.py:506] From /temp/v36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1775: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
938/938 [==============================] - 2s 3ms/step - loss: 0.2972 - accuracy: 0.9147 - val_loss: 0.1739 - val_accuracy: 0.9466
Epoch 2/5
Traceback (most recent call last):
  File "error_showcase_ckpt.py", line 46, in <module>
    callbacks=[checkpoint_callback],
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 669, in fit
    use_multiprocessing=use_multiprocessing)
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 695, in fit
    steps_name='steps_per_epoch')
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 939, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3483, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 583, in __call__
    return self._call_flat(args, self.captured_inputs)
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 685, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/temp/v36/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 436, in call
    (len(args), len(list(self.signature.input_arg))))
ValueError: Arguments and signature arguments do not match: 19 20 

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 8
  • Comments: 19 (4 by maintainers)

Most upvoted comments

Hi @mahzoon, I got the same error as you, but it only happens when save_weights_only=False is set in the checkpoint callback. With save_weights_only=True, training works as usual.

Also, I’m confused by this warning “W tensorflow/python/util/util.cc:280] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.” It didn’t show up after I set save_weights_only=True.
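The workaround described in this comment can be sketched as follows. This is a minimal sketch, not a confirmed fix: it reuses the callback settings from the report and changes only save_weights_only. The helper name make_checkpoint_callback and the checkpoint_kwargs dict are hypothetical names introduced here for illustration.

```python
# Settings copied from the reproduction script; only save_weights_only is new.
checkpoint_kwargs = dict(
    filepath='./checkpoints/',
    monitor='val_accuracy',
    verbose=1,
    save_best_only=True,
    save_weights_only=True,  # workaround reported in the comment above
    mode='max',
)


def make_checkpoint_callback():
    """Build the ModelCheckpoint callback (hypothetical helper).

    TensorFlow is imported inside the function so that the settings above
    can be inspected without importing it.
    """
    import tensorflow as tf
    return tf.keras.callbacks.ModelCheckpoint(**checkpoint_kwargs)
```

To use it, pass callbacks=[make_checkpoint_callback()] to model.fit in place of the original callback. A weights-only checkpoint does not store the model architecture, so restoring it means rebuilding the model in code and calling model.load_weights. Newer Keras releases may place additional requirements on the checkpoint file name.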

@mahzoon thanks for reporting the issue - we were able to repro and found that the mismatched argument is keras_learning_phase. We’re actively working on training loop refactoring which will resolve the issue. Will post back here once we get more updates.

I tried the tf-nightly build ‘2.1.0-dev20200109’ and the latest TensorFlow-GPU, also with Keras 2.3.1.

The problem is not resolved when calling predict or predict_on_batch on a model loaded from a file.

ValueError: Arguments and signature arguments do not match. got: 92, expected: 93

Can someone confirm this? Or could my problem be unrelated?

I got a similar issue; I was using K.function(input, output, updates).

I get the same issue without specifying a ModelCheckpoint callback:

Traceback (most recent call last):
  File "/home/deeplearning/.vscode/extensions/ms-python.python-2019.6.22090/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/deeplearning/.vscode/extensions/ms-python.python-2019.6.22090/pythonFiles/lib/python/ptvsd/__main__.py", line 434, in main
    run()
  File "/home/deeplearning/.vscode/extensions/ms-python.python-2019.6.22090/pythonFiles/lib/python/ptvsd/__main__.py", line 312, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/deeplearning/work/Deeplearning/TensorFlow/DeepWritingID/DeepHWS_online_2/run.py", line 62, in <module>
    sys.exit(main())
  File "/home/deeplearning/work/Deeplearning/TensorFlow/DeepWritingID/DeepHWS_online_2/run.py", line 56, in main
    train_model.fit(next(it), batch_size=params.batch_size, epochs=3)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 643, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 664, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 383, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 427, in call
    (len(args), len(list(self.signature.input_arg))))
ValueError: Arguments and signature arguments do not match: 97 98

I also hit this issue in my personal project 😃