tensorflow: KeyError: 'Failed to format this callback filepath: "checkpoint_5000/checkpoint_{epoch:02d}_{batch:04d}". Reason: \'batch\''


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): Installed using Pip
  • TensorFlow version (use command below): 2.2.0
  • Python version: 3.7
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: CUDA 10.1 (nvcc release 10.1, V10.1.105)
  • GPU model and memory: NVIDIA MX110 2GB

TensorFlow version (from python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"):

v2.2.0-rc4-8-g2b96f3662b 2.2.0

Describe the current behavior

I am following a tutorial on saving model weights in TensorFlow, where the weights are saved every 5000 training points. My code is identical to the instructor's, but the tutorial uses TensorFlow 2.0 and I am on 2.2.0, and the error above is raised, so I suspect a regression in this version.

Describe the expected behavior

The model weights should be saved every 5000 training points.

Standalone code to reproduce the issue

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

from tensorflow.keras.callbacks import ModelCheckpoint



(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

def get_new_model():
    model = Sequential([
        Conv2D(filters=16, input_shape=(32, 32, 3), kernel_size=(3, 3), 
               activation='relu', name='conv_1'),
        tf.keras.layers.BatchNormalization(),
        Conv2D(filters=8, kernel_size=(3, 3), activation='relu', name='conv_2'),
        MaxPooling2D(pool_size=(4, 4), name='pool_1'),
        tf.keras.layers.BatchNormalization(),
        Conv2D(filters=8, kernel_size=(3, 3), activation='relu', name='conv_3'),
        MaxPooling2D(pool_size=(4, 4), name='pool_2'),
        Flatten(name='flatten'),
        Dense(units=32, activation='relu', name='dense_1'),
        tf.keras.layers.Dropout(0.5),
        Dense(units=10, activation='softmax', name='dense_2')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


checkpoint_5000_path = 'checkpoint_5000/checkpoint_{epoch:02d}_{batch:04d}'

model = get_new_model()
checkpoint_5000 = ModelCheckpoint(filepath=checkpoint_5000_path, verbose=True, save_weights_only=True,
                                  save_freq=5000)
model.fit(x_train, y_train, batch_size=10, validation_data=(x_test, y_test), epochs=3, verbose=True, callbacks=[checkpoint_5000])
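
Worth noting (an assumption based on the _batches_seen_since_last_saving counter visible in the traceback below): in TensorFlow 2.2 an integer save_freq appears to count batches rather than samples, so with batch_size=10 the value for "every 5000 training points" would be 500, not 5000. A minimal sketch of that conversion:

# Hedged sketch, not the tutorial's code: converting a sample-based saving
# interval into the batch-based save_freq that TF 2.2 appears to use.
samples_per_save = 5000                                  # save every 5000 training points
batch_size = 10
save_freq_in_batches = samples_per_save // batch_size    # 500 batches

This does not by itself fix the KeyError, which comes from the {batch} placeholder in the filepath.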


Other info / logs

Full traceback:


---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\callbacks.py in _get_file_path(self, epoch, logs)
   1243         # placeholders can cause formatting to fail.
-> 1244         return self.filepath.format(epoch=epoch + 1, **logs)
   1245       except KeyError as e:

KeyError: 'batch'


During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)

<ipython-input-11-cc68dad1ac2c> in <module>
      7 checkpoint_5000 = ModelCheckpoint(filepath=checkpoint_5000_path, verbose=True, save_weights_only=True,
      8                                   save_freq=5000)
----> 9 model.fit(x_train, y_train, batch_size=10, validation_data=(x_test,y_test), epochs=3, verbose= True, callbacks=[checkpoint_5000])
     10 
     11 

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
     64   def _method_wrapper(self, *args, **kwargs):
     65     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
---> 66       return method(self, *args, **kwargs)
     67 
     68     # Running inside `run_distribute_coordinator` already.

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
    853                 context.async_wait()
    854               logs = tmp_logs  # No error, now safe to assign to logs.
--> 855               callbacks.on_train_batch_end(step, logs)
    856         epoch_logs = copy.copy(logs)
    857 

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\callbacks.py in on_train_batch_end(self, batch, logs)
    388     if self._should_call_train_batch_hooks:
    389       logs = self._process_logs(logs)
--> 390       self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
    391 
    392   def on_test_batch_begin(self, batch, logs=None):

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
    296     for callback in self.callbacks:
    297       batch_hook = getattr(callback, hook_name)
--> 298       batch_hook(batch, logs)
    299     self._delta_ts[hook_name].append(time.time() - t_before_callbacks)
    300 

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\callbacks.py in on_train_batch_end(self, batch, logs)
    613     """
    614     # For backwards compatibility.
--> 615     self.on_batch_end(batch, logs=logs)
    616 
    617   @doc_controls.for_subclass_implementers

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\callbacks.py in on_batch_end(self, batch, logs)
   1160       self._batches_seen_since_last_saving += 1
   1161       if self._batches_seen_since_last_saving >= self.save_freq:
-> 1162         self._save_model(epoch=self._current_epoch, logs=logs)
   1163         self._batches_seen_since_last_saving = 0
   1164 

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\callbacks.py in _save_model(self, epoch, logs)
   1194                   int) or self.epochs_since_last_save >= self.period:
   1195       self.epochs_since_last_save = 0
-> 1196       filepath = self._get_file_path(epoch, logs)
   1197 
   1198       try:

C:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\keras\callbacks.py in _get_file_path(self, epoch, logs)
   1245       except KeyError as e:
   1246         raise KeyError('Failed to format this callback filepath: "{}". '
-> 1247                        'Reason: {}'.format(self.filepath, e))
   1248     else:
   1249       # If this is multi-worker training, and this worker should not

KeyError: 'Failed to format this callback filepath: "checkpoint_5000/checkpoint_{epoch:02d}_{batch:04d}". Reason: \'batch\''
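
For context on the traceback: the failure happens in the filepath formatting step, where ModelCheckpoint calls self.filepath.format(epoch=epoch + 1, **logs). The batch-level logs dict only carries metric values (loss and accuracy in this setup), so the {batch:04d} placeholder has nothing to bind to. A minimal sketch of the same failure, with made-up metric values purely for illustration:

# Minimal sketch of the formatting step from _get_file_path; the metric
# values here are illustrative, not taken from a real run.
filepath = 'checkpoint_5000/checkpoint_{epoch:02d}_{batch:04d}'
logs = {'loss': 0.5, 'accuracy': 0.8}   # no 'batch' key in TF 2.2 batch logs
filepath.format(epoch=1, **logs)        # raises KeyError: 'batch'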

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 15 (4 by maintainers)

Most upvoted comments

Here is the solution: just replace save_freq=5000 with save_freq='epoch'.
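
A sketch of how that change might look; as an assumption based on the traceback above, the {batch:04d} placeholder would likely also need to come out of the filepath, since epoch-end logs do not appear to contain a batch key either:

# Hedged sketch of the suggested change, with the {batch} placeholder also
# dropped from the filepath so the epoch-end formatting can succeed.
checkpoint_path = 'checkpoint_5000/checkpoint_{epoch:02d}'
checkpoint = ModelCheckpoint(filepath=checkpoint_path,
                             verbose=True,
                             save_weights_only=True,
                             save_freq='epoch')   # save once per epoch instead of every N batches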

welcome 👍