tensorflow: tflite runs much slower than tfmobile ...

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu14.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Xiaomi 8
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.10
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 9.0 / 7.1
  • GPU model and memory:
  • Exact command to reproduce:

Describe the problem

I tested the performance of tf-mobile, tf-lite, tf-mobile-int8, and tf-lite-int8 on Android, and I found that tf-lite is much slower than tf-mobile.

  1. I use freeze_graph to generate an A.pb file from the checkpoint, for testing tf-mobile performance.

  2. I use toco_convert to convert the A.pb file to an A.tflite file, for testing tf-lite performance.

  3. I use transform_graph to get a quantized AQ.pb file from the A.pb file, for testing tf-mobile int8 performance.

  4. I train a model with the same architecture after adding the line tf.contrib.quantize.create_training_graph(), and get a checkpoint file. Then I replace that line with tf.contrib.quantize.create_eval_graph() to generate the A.pbtxt file, and use the checkpoint file and the A.pbtxt file to produce A8.pb with fake-quantization nodes. Finally, I use toco_convert to get the A8.tflite file.

  5. I test the performance of these 4 files on Android; each runs inference several times on the same image. The results are listed below:

| Configuration   | Time per image |
| --------------- | -------------- |
| tf-mobile       | 357 ms         |
| tf-mobile int8  | 356 ms         |
| tf-lite         | 844 ms         |
| tf-lite int8    | 571 ms         |
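For reference, per-image numbers like these are typically produced by averaging a timed loop around the inference call. A minimal sketch of such a harness, where `run_inference` is a hypothetical stand-in for the actual TFMobile/TFLite call and the warm-up and iteration counts are illustrative assumptions:

```python
import time

def benchmark(run_inference, image, warmup=3, iterations=20):
    """Return average wall-clock latency of run_inference in ms per image."""
    # Warm-up runs let caches, threads, and JIT paths settle before timing.
    for _ in range(warmup):
        run_inference(image)
    start = time.perf_counter()
    for _ in range(iterations):
        run_inference(image)
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1000.0  # ms per image

# Example with a dummy "model" standing in for the real interpreter:
ms = benchmark(lambda img: sum(img), list(range(1000)))
```

Averaging over repeated runs on the same image, as done above, is what makes the four configurations comparable.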

I wonder why tf-lite is much slower than tf-mobile.

PS: the model architecture only contains CONV+BN+RELU, RESHAPE, and FULLY-CONNECTED ops.

The feature map from CONV+BN+RELU has shape [B,T,C]; I reshape it to [-1,C] and feed it to the FC layer, then reshape the output of shape [B*T,K] back to [B,T,K], which is the final result I expect.
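The reshape→FC→reshape pattern described above can be sketched in NumPy as follows (all sizes, weights, and inputs are illustrative assumptions, not the actual model's):

```python
import numpy as np

B, T, C, K = 2, 5, 16, 4  # illustrative sizes, not the real model's

# Stand-ins for the CONV+BN+RELU output and the FC parameters.
features = np.ones((B, T, C), dtype=np.float32)
W = np.full((C, K), 0.5, dtype=np.float32)
b = np.zeros(K, dtype=np.float32)

flat = features.reshape(-1, C)   # [B*T, C]
out = flat @ W + b               # FC layer: [B*T, K]
result = out.reshape(B, T, K)    # final [B, T, K]
```

In floating point a reshape is only a metadata change, so it should be cheap in itself; whether the surrounding kernels handle the flattened layout efficiently is a separate question.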

I wonder whether the reshape ops are what causes the worse performance?

Thank you very much …

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 23 (11 by maintainers)

Most upvoted comments

We’re in the process of upstreaming the fix to Eigen, stay tuned.

As for TRANSPOSE_CONV ops, TFLite is unfortunately still much slower than TFMobile. You can check my benchmark report in #26736.

@jdduke Are there any updates on the fix? Thank you!

It’s not quite there, expect an update in the next week or two. Thanks for your patience.