tensorflow: model.predict is much slower on TF 2.1+
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 and Ubuntu 18.04
- TensorFlow installed from (source or binary): Binary with pip3
- TensorFlow version (use command below): 2.1+ vs. 2.0
- Python version: 3.7
- CUDA/cuDNN version: Used with CPU
- CPU model: Intel i7 5930
Describe the current behavior
Starting from tensorflow-cpu 2.1, my program spends several times longer in model.predict() than it did on TensorFlow 2.0; TF 2.2 gives about the same result as 2.1. My original program is fairly complicated, so I wrote the simplest example code below. With TF 2.0 it takes 0.13 seconds to run; with TF 2.2 it takes about 3 seconds.
Describe the expected behavior
TF 2.1+ should have an execution time similar to TF 2.0.
Standalone code to reproduce the issue
```python
from tensorflow.keras import Input, Model
import time
import numpy as np

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

t = time.time()
i = 0
while i < 100:
    model.predict(np.zeros((1, 1, 1)))
    i += 1
print(time.time() - t)
```
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 24 (9 by maintainers)
Links to this issue
Commits related to this issue
- Slightly expand on the comments about performance in model.predict e.g. see: https://github.com/tensorflow/tensorflow/issues/40261 For where this keeps coming up PiperOrigin-RevId: 400818061 — committed to keras-team/keras by deleted user 3 years ago
- Slightly expand on the comments about performance in model.predict e.g. see: https://github.com/tensorflow/tensorflow/issues/40261 For where this keeps coming up PiperOrigin-RevId: 400835355 — committed to keras-team/keras by deleted user 3 years ago
Hi @lihanchen, judging by your filed bug and your example code, I’m assuming you’re running `model.predict` inside of a loop? `Model.predict` is a top-level API designed for batch-predicting outside of any loops, with the full feature set of the Keras APIs. This means it manages things like converting your input to a `tf.data` Dataset and batching it, putting your computation into a `tf.function`, handling Keras callbacks, etc.
If you’re looking for a quick low-overhead model call to put inside of a loop / inside your own `tf.function`, we suggest directly calling the model on your data instead (w/ `training=False` to put the model in inference mode). For example, running the following in colab with the tf-nightlies (see the sketch below): the `predict`-in-a-loop version prints
3.521230459213257
while the direct-call version prints
0.01329183578491211
As you can see, there’s a ~300x difference in the constant overheads.
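The two timed Colab snippets were not preserved in this archive; a minimal sketch of what was presumably being compared, reusing the repro model from above (the ~3.5 s and ~0.013 s figures are from the maintainer's run, not from this sketch):

```python
import time
import numpy as np
from tensorflow.keras import Input, Model

x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)
data = np.zeros((1, 1, 1))

# Variant 1: model.predict() in a loop pays predict's full per-call setup
# (dataset wrapping, callback handling, numpy conversion) every iteration.
t = time.time()
for _ in range(100):
    model.predict(data)
print(time.time() - t)  # ~3.5 s in the maintainer's run

# Variant 2: calling the model directly skips that machinery.
t = time.time()
for _ in range(100):
    model(data, training=False)
print(time.time() - t)  # ~0.013 s in the maintainer's run
```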
All that being said:
If you can point us to an example (or input data) where a single `predict` call that’s actually processing multiple batches of data (and possibly using various Keras `predict` functionality) is much slower than in 2.0, that would be super helpful. That said, this seems like a shockingly high overhead for `predict`; I’ll look into it. It’s unlikely to make it into 2.3 unless we cherry-pick, but I’ll see what can be done. What exactly does your workload look like? Is it a batch-predict situation? Are you serving a lot of very small predictions interactively?
If you have a batch-predicting situation where you have a lot of data you’re trying to predict on, I suggest loading it with `tf.data` Datasets and passing them directly to `.predict` instead of calling `.predict` in a loop; this is the setting `model.predict` is optimized for, performance-wise. `model.predict` automatically wraps your model in a `tf.function`, which will generally improve performance for all but the smallest models. If you want the advantage of a `tf.function` without the other overheads of `predict` in an environment where you’re interactively predicting, then I would suggest defining a `tf.function` that calls your model, e.g. the sketch below.
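A minimal sketch of that pattern, reusing the `model` from the repro code above (`predict_fn` is an illustrative name, not from the original comment):

```python
import tensorflow as tf

@tf.function
def predict_fn(inputs):
    # A direct call in inference mode keeps the compiled-graph benefit of
    # tf.function without predict()'s dataset and callback machinery.
    # `model` is the Keras model defined in the repro code above.
    return model(inputs, training=False)

result = predict_fn(tf.zeros((1, 1, 1)))  # returns a tf.Tensor
```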
@ectg Did you try `tf.function`-ing your `model.call` method before calling your model? As such:
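The snippet referenced here was likewise not preserved; presumably something as simple as wrapping the call method once and reusing it (a hedged reconstruction, not the original):

```python
import tensorflow as tf

# Wrap the model's call method in a tf.function once, then reuse the
# wrapped callable for repeated low-overhead predictions.
# `model` is the Keras model defined in the repro code above.
fast_call = tf.function(model.call)
result = fast_call(tf.zeros((1, 1, 1)))
```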
Also, what are the use cases where you all are finding that building a `tf.data` Dataset at the start and batch-predicting is impractical? Knowing this would help our prioritization.
@owni1337 Sorry if this is too late, but you can grab the numpy value of an eager tensor with `tensor.numpy()`.
@cccat6 As mentioned above, `Model.predict` is a top-level API designed for batch-predicting outside of any other loops. Because it is designed for batch prediction and comes with other functionality, it carries an inherently higher overhead. Much of this overhead is Python- or CPU-side and is unrelated to the actual model computation. You can use `tensor.numpy()` if you need access to the tensor as a NumPy array.
I’m going to go ahead and close this issue for now, as we have added to the model.predict documentation & I’ve opened the above PR to update the docstring further.
@tomerk Hi tomerk,
I know it might be an old issue, but it still exists today. I am using `tensorflow-gpu 2.6.0` with `keras 2.6.0`. Two lines of code (sketched below) have the same output, but the `predict` function spends much, much more time than calling the model directly. The output type of `predict` is a NumPy array, while the direct model call returns a TensorFlow tensor; I don’t think there is any actual difference. Please forgive me, I did not look into the code in these parts and am just raising the question, but it should be an issue, since the time difference is huge while the results are the same.
There is also a strange part, though it might be my problem: the `predict` function does not seem to use my GPU, since GPU usage is 0% when it is called. Training does use the GPU.
Thanks a lot.
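The two compared lines were not preserved in this archive; presumably something along these lines (an illustrative sketch, with `model` standing in for the commenter's trained Keras model):

```python
import numpy as np

sample = np.zeros((1, 1, 1), dtype=np.float32)

out_predict = model.predict(sample)         # slow path: returns a NumPy array
out_direct = model(sample, training=False)  # fast path: returns a tf.Tensor
# Per the comment above, the values match: out_direct.numpy() == out_predict
```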