tensorflow: Understanding warning "5 out of the last 5 calls to triggered tf.function retracing"
System information
- Have I written custom code: Yes, below.
- Linux Ubuntu 16.04
- TensorFlow 2.1.0-dev20191103, binary install.
- Python 3.6
- CUDA 10.0 / cuDNN 7.6.4
- 4 * NVIDIA TITAN X (12GB)
I defined a very simple training script with a custom loss function and .fit(), as shown below. The loss_fn is very simple and, as far as I can tell, it receives tensors of the same shape and dtype on every call. Still, I'm getting the warning message below, and interestingly only when training with multiple GPUs. Is it a bug? Is it harmful? Does it actually affect the computational cost?
Warning message:
WARNING:tensorflow:5 out of the last 5 calls to <function loss_fn at 0x7f0070ef0e18> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Code:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import (Conv2D, Conv3D, Dense)


@tf.function
def loss_fn(y_pred, y_true):
    return tf.reduce_mean(tf.math.square(y_pred - y_true))


if __name__ == "__main__":
    BATCH_SIZE_PER_SYNC = 4
    strategy = tf.distribute.MirroredStrategy()
    num_gpus = strategy.num_replicas_in_sync
    global_batch_size = BATCH_SIZE_PER_SYNC * num_gpus
    print('num GPUs: {}, global batch size: {}'.format(num_gpus, global_batch_size))

    # fake data ------------------------------------------------------
    fakea = np.random.rand(global_batch_size, 10, 200, 200, 128).astype(np.float32)
    targets = np.random.rand(global_batch_size, 200, 200, 14).astype(np.float32)
    fakea = tf.constant(fakea)
    targets = tf.constant(targets)

    # tf.Dataset ------------------------------------------------------
    def gen():
        while True:
            yield (fakea, targets)

    dataset = tf.data.Dataset.from_generator(
        gen,
        (tf.float32, tf.float32),
        (tf.TensorShape(fakea.shape), tf.TensorShape(targets.shape)))
    dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

    # training ------------------------------------------------------
    callbacks = [tf.keras.callbacks.TensorBoard(log_dir='./logs')]
    training = True

    with strategy.scope():
        va = keras.Input(shape=(10, 200, 200, 128), dtype=tf.float32, name='va')
        x = Conv3D(64, kernel_size=3, strides=1, padding='same')(va)
        x = Conv3D(64, kernel_size=3, strides=1, padding='same')(x)
        x = Conv3D(64, kernel_size=3, strides=1, padding='same')(x)
        x = tf.reduce_max(x, axis=1, name='maxpool')
        b = Conv2D(14, kernel_size=3, padding='same')(x)
        model = keras.Model(inputs=va, outputs=b, name='net')
        optimizer = keras.optimizers.RMSprop()
        model.compile(optimizer=optimizer, loss=loss_fn)

    model.fit(x=dataset, epochs=10, steps_per_epoch=100, callbacks=callbacks)
When calling a simple RNN with input sequences of various lengths, it seems that the model gets traced once for each sequence length, as you can see in this Colab. Here’s the code:
I get the dreaded warning:
Perhaps there’s a way to force the model to use a single graph instead of tracing a new one for each input shape?
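The Colab itself isn't reproduced above, but judging from the SimpleRNN layer quoted later in this thread, the pattern being described looks roughly like the following minimal sketch (the layer sizes, the Dense head, and the loop over sequence lengths are illustrative assumptions, not the original notebook):

import numpy as np
import tensorflow as tf
from tensorflow import keras

# A small RNN whose time dimension is left unspecified, so it accepts
# sequences of any length.
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 4]),
    keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="sgd")

# Each new sequence length is a new input shape, so predict() traces a
# fresh graph; after a few distinct lengths the retracing warning appears.
for seq_len in (3, 5, 7, 11, 13):
    x = np.random.rand(1, seq_len, 4).astype(np.float32)
    model.predict(x)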
I used model(x) instead of model.predict(x) and it worked for me.

Me too… how do I solve this problem?
A tf.function can only handle a predefined input shape; if the shape changes, or if different Python objects get passed, TensorFlow automagically rebuilds (retraces) the function.
So I would suggest you check and print all the input parameters that get passed to
Model.make_predict_function.<locals>.predict_function
If an input is not a tensor, or if its shape changes between calls, you have found the problem.
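To see that behaviour in isolation, here is a minimal sketch (the function and the values passed to it are purely illustrative):

import tensorflow as tf

@tf.function
def square(x):
    # This print runs at trace time, not at call time, so it shows exactly
    # when a new concrete function is built.
    print("tracing with", x)
    return x * x

square(tf.constant(2.0))       # traced once for scalar float32 tensors
square(tf.constant(3.0))       # same signature -> reuses the existing graph
square(tf.constant([1., 2.]))  # new shape -> retraced
square(2)                      # Python int instead of a tensor -> retraced
square(3)                      # a different Python value -> retraced again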
I was getting this same warning message, using an LSTM model with variable sequence lengths. Upgrading from TF 2.2 to TF 2.3 seems to have fixed it.
I get the same error when trying to predict in a for loop (due to variable-length inputs). I have to call model.predict(np.expand_dims(xi, axis=0)) on each sample individually, or TensorFlow will attempt to concatenate the predictions and fail. This is probably calling something several times when it shouldn't.
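As a minimal sketch of that pattern (the model and data here are made up just to make it runnable), the per-sample loop looks like this; calling the model directly, as suggested earlier in the thread, is one way to avoid predict()'s per-call wrapping:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical model that accepts variable-length sequences of 4 features.
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 4]),
    keras.layers.Dense(1),
])

samples = [np.random.rand(n, 4).astype(np.float32) for n in (3, 7, 5)]

# Per-sample predict: every new sequence length is a new input shape, so the
# predict function may be retraced for each of them.
preds = [model.predict(np.expand_dims(xi, axis=0)) for xi in samples]

# Calling the model directly skips the predict() machinery and is often
# enough to silence the warning for small, eager-friendly workloads.
preds = [model(np.expand_dims(xi, axis=0), training=False).numpy() for xi in samples]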
I am also getting this error on a classical functional model without tf.function and with variable-length inputs in a for loop using predict. This happens only in tf 2.2.0rc2, and as soon as I switch back to tf 2.1 the problem disappears.

UPDATE: is it possible it's only a problem with keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 4])? I'm not sure, but if I'm correct, return_sequences=True will return the whole RNN sequence, not just the last element, so for different-sized inputs you will get different-sized outputs. Dense needs a fixed-size input, so it will be retraced every time. And if you want to use the same dense layer for every single output, you can't do this with Sequential, because it would be a parallel operation. If this is the case, I may have a solution for you…

OLD: yes, I see @ageron, I had the same problem with the Keras functional API. It seems the functional Keras API has problems with variable-length inputs other than the batch dimension.
A workaround is to use the layer class API and subclass Keras Layer. There you can decorate the call method with a tf.function annotation with relaxed shapes:
@tf.function(experimental_relax_shapes=True)
or explicitly:
self.call = tf.function(self.call, input_signature=[(tf.TensorSpec([None, None, None, 3], dtype=tf.float32), (tf.TensorSpec([None], dtype=tf.float32), tf.TensorSpec([None, 15, 3], dtype=tf.float32)), tf.TensorSpec([], dtype=tf.float32), tf.TensorSpec([], dtype=tf.float32))])
in __init__ or build.
But I agree the functional API should also support this.
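A minimal sketch of that subclassing workaround might look like the following (the layer itself is hypothetical; only the decorator and the input_signature idea come from the comment above):

import tensorflow as tf
from tensorflow import keras

class RelaxedDense(keras.layers.Layer):
    """Toy layer whose call() is traced with relaxed shapes, so new input
    shapes generalise the existing signature instead of forcing a retrace."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=(int(input_shape[-1]), self.units))

    @tf.function(experimental_relax_shapes=True)
    def call(self, inputs):
        return tf.matmul(inputs, self.kernel)

    # Alternatively, as in the comment above, wrap call explicitly with an
    # input signature in __init__ or build, e.g.:
    #   self.call = tf.function(
    #       self.call,
    #       input_signature=[tf.TensorSpec([None, None], tf.float32)])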
@zaccharieramzi - yes please file a separate ticket as this is a Keras issue without distribution strategy. The error message might be the same but the root causes are different.
@zubaidah93 , Python 2 is officially dead, it is not supported anymore, there are no updates anymore, including security updates, so it is even unsafe to use it. You should really upgrade to Python 3. And TensorFlow 1.1.0 is several years old. If you want to use TensorFlow 1, you should install TensorFlow 1.15 instead. But TensorFlow 2 is much better.
I'm using only 1 GPU and still facing this retracing issue. After trying out various fixes found by searching the internet, nothing worked, including tf-nightly-gpu, which in the first place isn't detecting my GPU. So I have just downgraded from TF 2.3.0 to TF 2.2.1 and that has fixed the issue for now.
I am also facing the same issue with TensorFlow 2.2.0rc2 while training with variable-length inputs in a loop.