tensorflow: WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.112000). Check your callbacks.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Pro x64
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
- TensorFlow installed from (source or binary): source
- TensorFlow version (use command below): master branch, commit 99e754b3a189eefab15fdbf326115d312e44fc7b
- Python version: 3.7
- Bazel version (if compiling from source): the default downloaded by Bazelisk
- GCC/Compiler version (if compiling from source): MSVS2019 v16.5.0
- CUDA/cuDNN version: CUDA 10.2, cuDNN 7.6.5
- GPU model and memory: Geforce GTX 1080, 8GB or Geforce RTX 2080, 8GB (same problem)
Describe the current behavior
2/3 of the time is spent in the method `on_train_batch_end` without any user callbacks (i.e. the time seems to be spent in TensorFlow internals).
Describe the expected behavior
`on_train_batch_end` does not take the majority of the training time when no user callbacks that do heavy ops in it are specified.
Standalone code to reproduce the issue
You need at least 64 GB of RAM to execute this:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
num_inputs=1440*3
num_outputs=2
N = 1000*1000
train_X = np.random.randn(N, num_inputs)
train_y = np.random.randn(N, num_outputs)
tf.keras.backend.set_floatx('float32')
model = keras.Sequential()
n_hidden = 2*num_inputs
activation1 = tf.nn.relu
model.add(layers.Dense(n_hidden, activation=activation1, input_shape=(num_inputs,)))
for i in range(4):
    model.add(layers.Dense(n_hidden, activation=activation1))
model.add(layers.Dense(num_outputs))
model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-3),
              loss=tf.keras.losses.mse,
              metrics=(tf.keras.metrics.mse,))
model.fit(train_X, train_y, epochs=1000, batch_size=512, verbose=2)
Other info / logs
D:\Programs\Anaconda\python.exe "C:\Program Files\JetBrains\PyCharm 2019.3.3\plugins\python\helpers\pydev\pydevconsole.py" --mode=client --port=1305
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['D:\\Dev\\Views\\Trading-git\\CudaNN\\src\\Anaconda', 'D:/Dev/Views/Trading-git/CudaNN/src/Anaconda'])
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.8.0
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)] on win32
runfile('D:/Dev/Views/Trading-git/CudaNN/src/Anaconda/Reproduction01.py', wdir='D:/Dev/Views/Trading-git/CudaNN/src/Anaconda')
2020-03-24 22:40:00.449504: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_102.dll
2020-03-24 22:41:58.323992: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-03-24 22:41:58.347918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1544] Found device 0 with properties:
pciBusID: 0000:0b:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8855GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-03-24 22:41:58.348257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_102.dll
2020-03-24 22:41:58.352623: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-24 22:41:58.356230: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-03-24 22:41:58.357648: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-03-24 22:41:58.361747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-03-24 22:41:58.364018: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-03-24 22:41:58.372291: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-24 22:41:58.373318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1686] Adding visible gpu devices: 0
2020-03-24 22:41:58.373720: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-03-24 22:41:58.382851: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1d1a0667d60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-24 22:41:58.383116: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-03-24 22:41:58.383724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1544] Found device 0 with properties:
pciBusID: 0000:0b:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8855GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-03-24 22:41:58.384041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_102.dll
2020-03-24 22:41:58.384254: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-24 22:41:58.384441: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-03-24 22:41:58.384612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-03-24 22:41:58.384788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-03-24 22:41:58.384962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-03-24 22:41:58.385136: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-24 22:41:58.385759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1686] Adding visible gpu devices: 0
2020-03-24 22:41:58.938950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1085] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-24 22:41:58.939152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1091] 0
2020-03-24 22:41:58.939263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1104] 0: N
2020-03-24 22:41:58.939991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1230] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6278 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0b:00.0, compute capability: 6.1)
2020-03-24 22:41:58.943050: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1d1c48bade0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-24 22:41:58.943286: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
Epoch 1/1000
2020-03-24 22:42:13.278214: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.188999). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.198997). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.199499). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200001). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.199999). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.200000). Check your callbacks.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 6
- Comments: 44 (4 by maintainers)
This is caused when other operations that run at the end of each batch take longer than the batch itself. One possible cause is that the batches are really small, so any operation is slow compared to your original batches.
Increasing the batch size actually solved the problem for me.
See @bstriner's comment in Link.
Setting `verbose=2` gets rid of the warning.
I was having the same warning. I could remove it by adding `profile_batch=0` (to disable profiling) to the TensorBoard callback. Now my callback looks like this (I am using TensorFlow 2.3.0), and the lines below remove other warnings.
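The commenter's actual snippet is not preserved in this thread; below is a minimal sketch of what a TensorBoard callback with profiling disabled might look like, reusing the model from the reproduction script above (the log directory and fit arguments are placeholders, not the commenter's values):

```python
import tensorflow as tf

# Hypothetical log directory; adjust to your own setup.
tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir="./logs",
    profile_batch=0,  # disable per-batch profiling, as suggested above
)

model.fit(train_X, train_y,
          epochs=10, batch_size=512,
          callbacks=[tensorboard_cb],
          verbose=2)
```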
According to https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit :
So `verbose=2` does not seem verbose enough to trigger such a warning: it prints something at epoch end, which is rarer than batch end. In contrast, the common suggestion on the internet is to change `verbose=1` to `verbose=2` in order to avoid this warning, because I guess `verbose=1` prints something on every batch.
This may be due to the fact that you've given `verbose=2` in your `model.fit()`. You can try `verbose=1`.
@amahendrakar, thanks for your suggestions about suppressing the logs. However, the point this approach misses is that the excessive output from TensorFlow seems to indicate a bug in it. As you can see in the diagram I attached, a lot of time appears to be spent on a side activity.
Based on this identical discussion in keras-team/keras, this can be solved by setting `steps_per_execution=N` (where `N` is a high, positive int) in your `model.compile()` call. This will cause you to need to specify `steps_per_epoch` in `model.fit()`, however.
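A minimal sketch of this suggestion applied to the reproduction script above, assuming a TF/Keras version that supports `steps_per_execution` (TF 2.4+); the value 50 is arbitrary:

```python
model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-3),
              loss=tf.keras.losses.mse,
              metrics=(tf.keras.metrics.mse,),
              steps_per_execution=50)  # run 50 batches per tf.function call

# steps_per_epoch must now be given explicitly, per the comment above.
model.fit(train_X, train_y,
          epochs=1000, batch_size=512,
          steps_per_epoch=len(train_X) // 512,
          verbose=2)
```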
Hi everyone, after using it for a while, I found this is really not a bug. For each training batch, Keras runs some callbacks; the warning shows up because the callbacks take longer than the training step itself. It might be that you are doing some expensive ops. Note that other than the callbacks you define yourself, Keras has some default callbacks, as shown here: https://github.com/tensorflow/tensorflow/blob/582c8d236cb079023657287c318ff26adb239002/tensorflow/python/keras/callbacks.py#L254-L271
I saw exactly the same warning. In my case, the problem was the TensorBoard callback. I used a batch size of 32, and the TensorBoard callback probably took longer to save and write log files than an iteration over 32 examples. I removed TensorBoard from my callbacks list and the warning disappeared.
For me, the issue also has nothing to do with the callbacks (it persists after removing all of them). The warning appears after I either increase my batch size from 32 to 64 or increase my image width from 32 to 64 pixels.
The reason I mentioned the train_step loop is that operations in train_step may not work properly. As you said, the message triggered by `delta_t_median > 0.95 * self._delta_t_batch and delta_t_median > 0.1` is only printed when you have callbacks. If this message came out only once, it was just a message emitted under that condition. I think the warning was issued because it took some time to initialize, such as in `on_epoch_begin` or `on_train_batch_begin`. There are no other problems.
Hi. This problem occurred when I used custom callbacks that took a long time on TF <= 2.2. I solved it by hacking a small part of `CallbackList`.
The warning message is due to this. If you don't need to check the time of your callbacks, you can delete the message and assign a new `CallbackList` in place of the original `CallbackList`, as below.
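The commenter's `CallbackList` hack is not preserved in this thread. As a rough alternative (a different technique that only suppresses this particular message rather than replacing `CallbackList`), a logging filter can be attached to the TensorFlow logger:

```python
import logging
import tensorflow as tf

class SlowCallbackWarningFilter(logging.Filter):
    def filter(self, record):
        # Drop only the "is slow compared to the batch update" warnings,
        # leaving all other TensorFlow log messages untouched.
        return "is slow compared to the batch" not in record.getMessage()

tf.get_logger().addFilter(SlowCallbackWarningFilter())
```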
Running TF 2.3rc2 on TPU, I am facing the same issue as well.
@srogatch, the `verbose` argument lets you control what output you see on each epoch. For more information please check this StackOverflow comment.
Regarding the warning logs, could you please try changing the logging level and check if it works?
Thanks!
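For reference, one way to change the logging level as suggested above is via `tf.get_logger()`; note this is a blunt instrument that hides all TensorFlow Python-side warnings, not just this one:

```python
import logging
import tensorflow as tf

# Only emit messages at ERROR level or above from TensorFlow's Python logger.
tf.get_logger().setLevel(logging.ERROR)
```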