tensorflow: model.predict is much slower on TF 2.1+

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 and Ubuntu 18.04
  • TensorFlow installed from (source or binary): Binary with pip3
  • TensorFlow version (use command below): 2.1+ vs. 2.0
  • Python version: 3.7
  • CUDA/cuDNN version: Used with CPU
  • CPU model: Intel i7 5930

Describe the current behavior Starting from tensorflow-cpu 2.1, my program spends several times as long in model.predict() as it does with tensorflow 2.0. TF 2.2 gives about the same result as 2.1. My original program is fairly complicated, so I wrote the simplest example code below. With TF 2.0 it takes 0.13 seconds to run; with TF 2.2 it takes about 3 seconds.

Describe the expected behavior TF 2.1+ should have an execution time similar to TF 2.0.

Standalone code to reproduce the issue

from tensorflow.keras import Input, Model
import time
import numpy as np

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

t = time.time()
i = 0
while i<100:
    model.predict(np.zeros((1, 1, 1)))
    i += 1
print(time.time() - t)

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 24 (9 by maintainers)

Most upvoted comments

Hi @lihanchen, judging by your filed bug and your example code, I’m assuming you’re running model.predict inside of a loop?

Model.predict is a top-level API designed for batch prediction outside of any loops, with the full feature set of the Keras APIs. This means it manages things like converting your input to a tf.data.Dataset and batching it, putting your computation into a tf.function, handling Keras callbacks, etc.

If you’re looking for a quick, low-overhead model call to put inside of a loop / inside your own tf.function, we suggest directly calling the model on your data instead (with training=False to put the model in inference mode).

For example, running the following in Colab with the TF nightly builds:

import time
import numpy as np
from tensorflow.keras import Input, Model

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

t = time.time()
i = 0
while i < 100:
    # model.predict in a loop pays the full predict overhead on every call
    model.predict(np.zeros((1, 1, 1)))
    i += 1
print(time.time() - t)

prints 3.521230459213257

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

t = time.time()
i = 0
while i < 100:
    # calling the model directly avoids the per-call predict overhead
    model(np.zeros((1, 1, 1)), training=False)
    i += 1
print(time.time() - t)

prints 0.01329183578491211

As you can see, there’s roughly a 300x difference in the constant overhead.


All that being said:

  1. If I’m mistaken about how you’re actually using predict (and you’re using it outside of any loops), please let us know!

If you can point us to an example (or input data) where a single predict call that’s actually processing multiple batches of data (and possibly using various Keras predict functionality) is much slower than in 2.0, that would be super helpful.

  2. 35 milliseconds per predict call seems like a shockingly high overhead; I’ll look into this. A fix is unlikely to make it into 2.3 unless we cherry-pick, but I’ll see what can be done.

What exactly does your workload look like? Is it a batch predict situation? Are you serving a lot of very small predictions interactively?

If you have a batch-prediction situation where there is a lot of data you’re trying to predict on, I suggest loading it with a tf.data.Dataset and passing that directly to .predict instead of calling .predict in a loop. This is the setting model.predict is optimized for, performance-wise.
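For instance, a minimal sketch of that batch-predict pattern, reusing the toy identity model from earlier in the thread (the sample count and batch size here are arbitrary placeholders):

import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, Model

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

# Placeholder data: 10,000 samples of shape (1, 1).
samples = np.zeros((10000, 1, 1), dtype=np.float32)

# Build the dataset once and let predict handle batching and iteration,
# instead of calling predict once per sample inside a Python loop.
dataset = tf.data.Dataset.from_tensor_slices(samples).batch(256)
predictions = model.predict(dataset)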

model.predict automatically wraps your model in a tf.function, which will generally improve performance for all but the smallest models. If you want the advantage of a tf.function without the other overheads of predict for an environment where you’re interactively predicting, then I would suggest defining a tf.function that calls your model, e.g.

@tf.function
def serve(x):
  return model(x, training=False)
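Used roughly like this (a sketch assuming the same toy model and input shape as the snippets above; serve is the tf.function defined just before):

import numpy as np

# The traced function is reused across calls as long as the input shape
# stays the same, so the loop avoids predict's per-call overhead.
for _ in range(100):
    out = serve(np.zeros((1, 1, 1)))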

@ectg Did you try tf.function-ing your model.call method before calling your model? As such:

model.call = tf.function(model.call, experimental_relax_shapes=True)
model(..., training=False)
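Putting that together, here is a self-contained sketch with the toy identity model used earlier (note that experimental_relax_shapes must be passed as a keyword argument; in newer TF releases it has been superseded by reduce_retracing):

import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, Model

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

# Wrap the model's call method in a tf.function; relaxed shapes reduce
# retracing when the input shape varies between calls.
model.call = tf.function(model.call, experimental_relax_shapes=True)

t = time.time()
for _ in range(100):
    model(np.zeros((1, 1, 1)), training=False)
print(time.time() - t)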

Also, what are the use cases where you all are finding that building a tf.data.Dataset at the start and batch predicting is impractical? Knowing this would help our prioritization.

@owni1337 sorry if this is too late, but you can grab the NumPy value of an eager tensor with tensor.numpy()

@cccat6 As mentioned above, Model.predict is a top-level API designed for batch prediction outside of any loops. Because it is designed for batch prediction and carries extra functionality, it comes with an inherently higher overhead. Much of this overhead is Python or CPU-side overhead and is unrelated to the actual model computation.

You can use tensor.numpy() if you need access to the tensor as a numpy array.
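For example (a small self-contained sketch using the same toy identity model):

import numpy as np
from tensorflow.keras import Input, Model

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

result = model(np.zeros((1, 1, 1)), training=False)  # an eager tf.Tensor
result_np = result.numpy()                           # plain NumPy array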

I’m going to go ahead and close this issue for now, as we have added to the model.predict documentation & I’ve opened the above PR to update the docstring further.

@tomerk Hi tomerk,

I know it might be an old issue, but it still exists today. I am using tensorflow-gpu 2.6.0 with keras 2.6.0. These two lines of code produce the same output, but the predict call takes much more time than calling the model directly.

self.model.predict(state) # slow
self.model(state, training=False) # fast

The output of predict is a NumPy array, while calling the model returns a TensorFlow tensor; I don’t think there is any actual difference in the values. Please forgive me, I have not looked at the code in these parts and am just raising the question. I only want to point out that this still looks like an issue, since the time difference is huge but the results are the same.

There is also a strange part, though it might be my problem: the predict function does not seem to use my GPU, since GPU usage is 0% while it is being called. Training does use the GPU.

Thanks a lot.