tensorflow: tflite runs much slower than tfmobile ...

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu14.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Xiaomi 8
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.10
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 9.0 / 7.1
  • GPU model and memory:
  • Exact command to reproduce:

Describe the problem

I tested the performance of tf-mobile, tf-lite, tf-mobile-int8, and tf-lite-int8 on Android, and I found that tf-lite is much slower than tf-mobile.

  1. I use freeze_graph to generate an A.pb file from the checkpoint, for testing tf-mobile performance.

  2. I use toco_convert to convert the A.pb file to an A.tflite file, for testing tf-lite performance.

  3. I use transform_graph to get a quantized AQ.pb file from the A.pb file, for testing tf-mobile int8 performance.

  4. I train a model with the same architecture after adding the line tf.contrib.quantize.create_training_graph(), and get a checkpoint file. Then I replace that line with tf.contrib.quantize.create_eval_graph() to generate the A.pbtxt file, and use the checkpoint file and the A.pbtxt file to produce A8.pb with fake-quantization nodes. Finally, I use toco_convert to get the A8.tflite file.

  5. I test the performance of these 4 files on Android; each runs inference several times on the same image. The results are listed below:

| Configuration   | Time per image |
| --------------- | -------------- |
| tf-mobile       | 357 ms         |
| tf-mobile int8  | 356 ms         |
| tf-lite         | 844 ms         |
| tf-lite int8    | 571 ms         |
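For reference, per-image numbers like these are typically produced by averaging a timed loop around the inference call. A minimal sketch of such a harness, where `run_inference` is a hypothetical stand-in for the actual TFMobile/TFLite call and the warm-up and iteration counts are illustrative assumptions:

```python
import time

def benchmark(run_inference, image, warmup=3, iterations=20):
    """Return average wall-clock latency of run_inference in ms per image."""
    # Warm-up runs let caches, threads, and JIT paths settle before timing.
    for _ in range(warmup):
        run_inference(image)
    start = time.perf_counter()
    for _ in range(iterations):
        run_inference(image)
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1000.0  # ms per image

# Example with a dummy "model" standing in for the real interpreter:
ms = benchmark(lambda img: sum(img), list(range(1000)))
```

Averaging over repeated runs on the same image, as done above, is what makes the four configurations comparable.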

I wonder why tf-lite is much slower than tf-mobile.

PS: the model architecture only contains CONV+BN+RELU, RESHAPE, and FULLY-CONNECTED ops.

The feature map from CONV+BN+RELU has shape [B,T,C]; I reshape it to [-1,C] and feed it to the FC layer, then reshape the output of shape [B*T,K] back to [B,T,K], which is the final result I expect.
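The reshape→FC→reshape pattern described above can be sketched in NumPy as follows (all sizes, weights, and inputs are illustrative assumptions, not the actual model's):

```python
import numpy as np

B, T, C, K = 2, 5, 16, 4  # illustrative sizes, not the real model's

# Stand-ins for the CONV+BN+RELU output and the FC parameters.
features = np.ones((B, T, C), dtype=np.float32)
W = np.full((C, K), 0.5, dtype=np.float32)
b = np.zeros(K, dtype=np.float32)

flat = features.reshape(-1, C)   # [B*T, C]
out = flat @ W + b               # FC layer: [B*T, K]
result = out.reshape(B, T, K)    # final [B, T, K]
```

In floating point a reshape is only a metadata change, so it should be cheap in itself; whether the surrounding kernels handle the flattened layout efficiently is a separate question.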

I wonder whether the reshape ops are what causes the worse performance?

Thank you very much …

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 23 (11 by maintainers)

Most upvoted comments

We’re in the process of upstreaming the fix to Eigen, stay tuned.

As for TRANSPOSE_CONV ops, TFLite is unfortunately still much slower than TFMobile. You can check my benchmark report in #26736.

@jdduke Are there any updates on the fix? Thank you!

It’s not quite there, expect an update in the next week or two. Thanks for your patience.