tensorflow: WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.112000). Check your callbacks.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Pro x64
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): master branch, commit 99e754b3a189eefab15fdbf326115d312e44fc7b
  • Python version: 3.7
  • Bazel version (if compiling from source): the default downloaded by Bazelisk
  • GCC/Compiler version (if compiling from source): MSVS2019 v16.5.0
  • CUDA/cuDNN version: CUDA 10.2, cuDNN 7.6.5
  • GPU model and memory: Geforce GTX 1080, 8GB or Geforce RTX 2080, 8GB (same problem)

Describe the current behavior About 2/3 of the training time is spent in the method on_train_batch_end even though no user callbacks are specified (i.e. the time appears to be spent in TensorFlow internals).

Describe the expected behavior on_train_batch_end does not take the majority of the training time when no user callbacks with heavy operations are specified.

Standalone code to reproduce the issue You need at least 64GB of RAM to execute this:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_inputs = 1440 * 3
num_outputs = 2

N = 1000 * 1000
train_X = np.random.randn(N, num_inputs)
train_y = np.random.randn(N, num_outputs)

tf.keras.backend.set_floatx('float32')
model = keras.Sequential()
n_hidden = 2 * num_inputs

activation1 = tf.nn.relu
model.add(layers.Dense(n_hidden, activation=activation1, input_shape=(num_inputs,)))
for i in range(4):
    model.add(layers.Dense(n_hidden, activation=activation1))
model.add(layers.Dense(num_outputs))

model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-3),
              loss=tf.keras.losses.mse,
              metrics=(tf.keras.metrics.mse,))

model.fit(train_X, train_y, epochs=1000, batch_size=512, verbose=2)

Other info / logs

D:\Programs\Anaconda\python.exe "C:\Program Files\JetBrains\PyCharm 2019.3.3\plugins\python\helpers\pydev\pydevconsole.py" --mode=client --port=1305
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['D:\\Dev\\Views\\Trading-git\\CudaNN\\src\\Anaconda', 'D:/Dev/Views/Trading-git/CudaNN/src/Anaconda'])
Python 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.8.0
Python 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)] on win32
runfile('D:/Dev/Views/Trading-git/CudaNN/src/Anaconda/Reproduction01.py', wdir='D:/Dev/Views/Trading-git/CudaNN/src/Anaconda')
2020-03-24 22:40:00.449504: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_102.dll
2020-03-24 22:41:58.323992: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-03-24 22:41:58.347918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1544] Found device 0 with properties: 
pciBusID: 0000:0b:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8855GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-03-24 22:41:58.348257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_102.dll
2020-03-24 22:41:58.352623: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-24 22:41:58.356230: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-03-24 22:41:58.357648: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-03-24 22:41:58.361747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-03-24 22:41:58.364018: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-03-24 22:41:58.372291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-24 22:41:58.373318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1686] Adding visible gpu devices: 0
2020-03-24 22:41:58.373720: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-03-24 22:41:58.382851: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1d1a0667d60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-24 22:41:58.383116: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-03-24 22:41:58.383724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1544] Found device 0 with properties: 
pciBusID: 0000:0b:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8855GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-03-24 22:41:58.384041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_102.dll
2020-03-24 22:41:58.384254: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-24 22:41:58.384441: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-03-24 22:41:58.384612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-03-24 22:41:58.384788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-03-24 22:41:58.384962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-03-24 22:41:58.385136: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-24 22:41:58.385759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1686] Adding visible gpu devices: 0
2020-03-24 22:41:58.938950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1085] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-24 22:41:58.939152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1091]      0 
2020-03-24 22:41:58.939263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1104] 0:   N 
2020-03-24 22:41:58.939991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1230] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6278 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0b:00.0, compute capability: 6.1)
2020-03-24 22:41:58.943050: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1d1c48bade0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-24 22:41:58.943286: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
Epoch 1/1000
2020-03-24 22:42:13.278214: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.188999). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.198997). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.199499). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200001). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.199999). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 44 (4 by maintainers)

Most upvoted comments

This happens when operations that run at the end of each batch take longer than the batch itself. One possible cause is a very small batch size: with tiny batches, almost any end-of-batch operation is slow compared to the batch computation.

Increasing batch size actually solved the problem for me.
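A minimal sketch of that change, reusing the names from the reproduction script at the top (train_X, train_y, model); the exact batch size is only an illustration:

# Larger batches amortize the fixed per-batch callback/bookkeeping cost over
# more samples, so the relative overhead reported by the warning drops.
model.fit(train_X, train_y, epochs=1000, batch_size=4096, verbose=2)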

See @bstriner's comment in the linked thread.

Setting verbose=2 gets rid of the warning.

I was getting the same warning:

Loading model
Epoch 1/3
   2/1688 [..............................] - ETA: 6:36 - loss: 2.2862 - accuracy: 0.1562WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0027s vs `on_train_batch_end` time: 0.4677s). Check your callbacks.
1688/1688 [==============================] - 5s 3ms/step - loss: 0.4661 - accuracy: 0.8339 - val_loss: 0.3578 - val_accuracy: 0.8735
Epoch 2/3
1688/1688 [==============================] - 4s 2ms/step - loss: 0.3157 - accuracy: 0.8861 - val_loss: 0.2872 - val_accuracy: 0.8923
Epoch 3/3
1688/1688 [==============================] - 4s 2ms/step - loss: 0.2734 - accuracy: 0.8991 - val_loss: 0.2711 - val_accuracy: 0.9012

I could remove the warning by adding profile_batch=0 (to disable profiling) to the TensorBoard callback. Now my callback looks like this:

from tensorflow.keras.callbacks import TensorBoard

tbCallBack = TensorBoard(log_dir='logs', update_freq='epoch', write_graph=True, profile_batch=0)
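As a usage sketch (model, x_train and y_train are placeholders for your own model and data, not names from this thread), the callback is then passed to fit:

# Hypothetical usage: add the configured TensorBoard callback to model.fit().
model.fit(x_train, y_train, epochs=3, validation_split=0.1, callbacks=[tbCallBack])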

I am using TensorFlow 2.3.0.

Loading model
Epoch 1/3
1688/1688 [==============================] - 4s 2ms/step - loss: 0.4849 - accuracy: 0.8240 - val_loss: 0.3673 - val_accuracy: 0.8658
Epoch 2/3
1688/1688 [==============================] - 4s 2ms/step - loss: 0.3282 - accuracy: 0.8810 - val_loss: 0.3016 - val_accuracy: 0.8902
Epoch 3/3
1688/1688 [==============================] - 4s 2ms/step - loss: 0.2792 - accuracy: 0.8978 - val_loss: 0.2671 - val_accuracy: 0.9022

The lines below remove the other warnings:

import os
# To disable all logging output from TensorFlow's C++ backend;
# note that this must be set before TensorFlow is imported to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

According to https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit :

verbose: 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (eg, in a production environment).

So verbose=2 is apparently not verbose enough to trigger the warning: it only prints at the end of each epoch, which is far rarer than the end of each batch. This matches the common advice online to switch from verbose=1 to verbose=2 to avoid the warning, presumably because verbose=1 updates the progress bar on every batch.

This may be because you passed verbose=2 to model.fit(). You can try verbose=1.

@amahendrakar, thanks for your suggestions about suppressing the logs. However, what this approach misses is that the excessive output from TensorFlow seems to indicate a bug in it. As you can see in the diagram I attached, a lot of time appears to be spent on a side activity.

Based on this identical discussion in keras-team/keras, this can be solved by setting steps_per_execution=N (where N is a large positive int) in your model.compile() call. However, this will also require you to specify steps_per_epoch in model.fit(); see the sketch below.
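A hedged sketch of that suggestion (steps_per_execution is only available in the compile() signature of newer TF releases; the numbers are illustrative, and train_X/train_y/model refer to the reproduction script at the top):

# Execute many batches per tf.function call, so Python-side per-batch
# bookkeeping (including callback hooks) runs once every N batches
# instead of once per batch.
model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-3),
              loss='mse',
              steps_per_execution=50)
# Per the comment above, an explicit steps_per_epoch may then also be needed.
model.fit(train_X, train_y, epochs=10, batch_size=512,
          steps_per_epoch=len(train_X) // 512, verbose=2)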

Hi Everyone,

After using it for a while, I found that this is really not a bug. For each training batch, Keras runs some callbacks, and the warning shows up because those callbacks take longer than the training step itself. It might be that you are doing some expensive ops in them. Note that besides the callbacks you define yourself, Keras adds some default callbacks, as shown here: https://github.com/tensorflow/tensorflow/blob/582c8d236cb079023657287c318ff26adb239002/tensorflow/python/keras/callbacks.py#L254-L271
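If you want to check whether it is your own callbacks that are slow (rather than the built-in ones), a minimal timing sketch along these lines may help; this is an illustration added here, not code from this issue:

import time
import numpy as np
import tensorflow as tf

class BatchTimeLogger(tf.keras.callbacks.Callback):
    """Records wall-clock time from on_train_batch_begin to on_train_batch_end,
    for comparison with the numbers quoted in the warning."""

    def __init__(self, every=100):
        super().__init__()
        self.every = every
        self.durations = []

    def on_train_batch_begin(self, batch, logs=None):
        self._t0 = time.time()

    def on_train_batch_end(self, batch, logs=None):
        self.durations.append(time.time() - self._t0)
        if batch > 0 and batch % self.every == 0:
            window = self.durations[-self.every:]
            print(f"median time over last {self.every} batches: {np.median(window):.4f}s")

# Add it alongside your existing callbacks, e.g.:
# model.fit(train_X, train_y, callbacks=[BatchTimeLogger(), ...])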

I saw exactly the same warning. In my case, it was caused by the TensorBoard callback. I used a batch size of 32, and TensorBoard's callback probably took longer to save and write log files than an iteration over 32 examples. I removed TensorBoard from my callbacks list and the warning disappeared.

For me the issue also has nothing to do with the callbacks (it persists after removing all of them). The warning appears after I either increase my batch size from 32 to 64 or increase my image width from 32 to 64 pixels.

Hi. This problem occurred for me when I used custom callbacks that took a long time, on TF <= 2.2. I worked around it by patching a small part of CallbackList. The warning message comes from here:

class CallbackList(object):
..
..
    def _call_batch_hook(self, mode, hook, batch, logs=None):
        """Helper function for all batch_{begin | end} methods."""
        if not self.callbacks:
            return
        hook_name = 'on_{mode}_batch_{hook}'.format(mode=mode, hook=hook)
        if hook == 'begin':
            self._t_enter_batch = time.time()
        if hook == 'end':
            # Batch is ending, calculate batch time.
            self._delta_t_batch = time.time() - self._t_enter_batch

        logs = logs or {}
        t_before_callbacks = time.time()
        for callback in self.callbacks:
            batch_hook = getattr(callback, hook_name)
            batch_hook(batch, logs)
        self._delta_ts[hook_name].append(time.time() - t_before_callbacks)

        delta_t_median = np.median(self._delta_ts[hook_name])
        if (self._delta_t_batch > 0. and
              delta_t_median > 0.95 * self._delta_t_batch and delta_t_median > 0.1):
            # here is the warning message!
            logging.warning(
                'Method (%s) is slow compared '
                'to the batch update (%f). Check your callbacks.', hook_name, 
                delta_t_median)
..
..

If you don't need to check the timing of your callbacks, you can drop that check by assigning a custom CallbackList over the original one, as below.

from tensorflow.python.keras.callbacks import CallbackList
class CustomCallbackList(CallbackList):
    def _call_batch_hook(self, mode, hook, batch, logs=None):
        """Helper function for all batch_{begin | end} methods."""
        if not self.callbacks:
            return
        hook_name = 'on_{mode}_batch_{hook}'.format(mode=mode, hook=hook)

        logs = logs or {}
        for callback in self.callbacks:
            batch_hook = getattr(callback, hook_name)
            batch_hook(batch, logs)

tf.python.keras.callbacks.CallbackList = CustomCallbackList

Did this work for you? It's giving me an error: AttributeError: module 'tensorflow' has no attribute 'python'. Besides, I don't think this really solves the problem: it may eliminate the warning message, but on_train_batch_end will still take longer than the learning itself. I have no callbacks and I want to fix this. Nice idea to look at the source code, though.

If tensorflow is in the environment you are using, the error you mentioned does not appear. You can check it in my example. Of course, it just removes the message, as you said. If you already know that on_train_batch_end has heavy operations, it is quite useful. Otherwise, check your train_step loop for any problems.

Pardon, but what do you mean by ‘If tensorflow is in the environment used’?

About the original problem of this thread… I'm getting the warning even without any heavy operations in the callbacks. Can a problem inside the train_step loop cause this? I looked at the source code and I believe the call order is as below. However, the code suggests it is just due to the callbacks' on_train_batch_end.

https://github.com/keras-team/keras/blob/master/keras/engine/training.py#L1118 https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/callbacks.py#L381 https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/callbacks.py#L304

To be more specific about my issue: I don't get the warning if I don't use any callbacks, but the total elapsed time per epoch is the same. Also, the warning only appears at first and doesn't appear anymore afterward. Thank you.

The reason I mentioned the train_step loop is that operations in train_step may not be working properly. As you said, this message, due to delta_t_median > 0.95 * self._delta_t_batch and delta_t_median > 0.1, is only printed when you have callbacks. If the message came out only once, it was just triggered under that condition. I think the warning was issued because some initialization took time, such as in on_epoch_begin or on_train_batch_begin. There is no other problem.
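To make that condition concrete with the numbers from the log quoted earlier in this thread (batch time 0.0027 s, on_train_batch_end time 0.4677 s), the check evaluates roughly as:

delta_t_median > 0.95 * delta_t_batch  ->  0.4677 > 0.95 * 0.0027 ≈ 0.0026  (True)
delta_t_median > 0.1                   ->  0.4677 > 0.1                     (True)

so both parts of the condition hold and the warning is printed.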


Running tf2.3rc2 on TPU and I am facing the same issue as well.

@srogatch, the verbose argument lets you control what output you see for each epoch. For more information, please check this StackOverflow comment.

Regarding the warning logs, could you please try changing the logging level and check if it works?

import os
# Filter out INFO and WARNING messages from TensorFlow's C++ backend;
# the variable must be set before TensorFlow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import tensorflow as tf

Thanks!
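One additional note (an assumption on my side, not part of the reply above): TF_CPP_MIN_LOG_LEVEL only filters the C++ backend logs, while the on_train_batch_end warning comes from the Python-side TensorFlow logger, so silencing it would look something like this:

import tensorflow as tf

# Raise the Python-side TensorFlow logger to ERROR so that WARNING messages,
# including the on_train_batch_end one, are not printed.
tf.get_logger().setLevel('ERROR')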