tensorflow: TensorFlow Lite Python API does not work

System information

  • TensorFlow version: 1.9.0
  • Python version: 3.5

Describe the problem

I am trying to run a TFLite model file with the Python API (as in the example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/python_api.md), but I get an error: ImportError: /home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/_tensorflow_wrap_interpreter_wrapper.so: undefined symbol: _ZN6tflite12tensor_utils39NeonMatrixBatchVectorMultiplyAccumulateEPKaiiS2_PKfiPfi

Source code / logs

My code:

import tensorflow as tf

if __name__ == "__main__":

    # Load TFLite model and allocate tensors.
    interpreter = tf.contrib.lite.Interpreter(model_path="./mobilenet_v1_0.25_128_quant.tflite")

    interpreter.allocate_tensors()

    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    print(input_details)
    print(output_details)

Log output:

Traceback (most recent call last):
  File "tflite_test.py", line 12, in <module>
    interpreter = tf.contrib.lite.Interpreter(model_path="/home/pi/test/mobilenet_v1_0.25_128_quant/mobilenet_v1_0.25_128_quant.tflite")
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter.py", line 50, in __init__
    _interpreter_wrapper.InterpreterWrapper_CreateWrapperCPPFromFile(
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/python/util/lazy_loader.py", line 53, in __getattr__
    module = self._load()
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/python/util/lazy_loader.py", line 42, in _load
    module = importlib.import_module(self.__name__)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 673, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 28, in <module>
    _tensorflow_wrap_interpreter_wrapper = swig_import_helper()
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 24, in swig_import_helper
    _mod = imp.load_module('_tensorflow_wrap_interpreter_wrapper', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 693, in _load
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 577, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 914, in create_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
ImportError: /home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/_tensorflow_wrap_interpreter_wrapper.so: undefined symbol: _ZN6tflite12tensor_utils39NeonMatrixBatchVectorMultiplyAccumulateEPKaiiS2_PKfiPfi
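Not part of the original report, but a quick sanity check worth running before digging into the symbol itself is to print which TensorFlow build is actually being imported and on what architecture it is running; a wheel that does not match the device, or an incomplete armv7l build of 1.9.0, can produce exactly this kind of undefined-symbol error.

import platform
import tensorflow as tf

# Generic diagnostic (not from the thread): confirm which installation
# is imported and what hardware it runs on.
print(tf.__version__)        # e.g. 1.9.0
print(tf.__file__)           # the site-packages path actually in use
print(platform.machine())    # e.g. armv7l on a Raspberry Pi 3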

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 54 (25 by maintainers)

Most upvoted comments

SOLUTION FOR THIS ERROR!

Source code:

interpreter = tf.contrib.lite.Interpreter(model_path="optimized_graph.tflite")
interpreter.allocate_tensors()

ImportError: /home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/_tensorflow_wrap_interpreter_wrapper.so: undefined symbol: _ZN6tflite12tensor_utils39NeonMatrixBatchVectorMultiplyAccumulateEPKaiiS2_PKfiPfi

Just install TensorFlow 1.11.0 by following these steps:

$ sudo apt-get install python-pip python3-pip
$ sudo pip3 uninstall tensorflow
$ git clone https://github.com/PINTO0309/Tensorflow-bin.git
$ cd Tensorflow-bin
$ sudo pip3 install tensorflow-1.11.0-cp35-cp35m-linux_armv7l.whl

If it doesn't work, try reformatting the SD card and doing it again.
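A minimal sketch (not part of the original comment) to confirm that the reinstalled 1.11.0 wheel can now load a TFLite model; the model path is just an example.

import tensorflow as tf

print(tf.__version__)  # should report 1.11.0 after the steps above

# Loading a model confirms the interpreter wrapper now imports cleanly.
interpreter = tf.contrib.lite.Interpreter(
    model_path="./mobilenet_v1_0.25_128_quant.tflite")  # example path
interpreter.allocate_tensors()
print(interpreter.get_input_details())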

I tried implementing multithreading with TensorFlow Lite v1.11.0. It gained 2.5 times the performance.

https://github.com/PINTO0309/Tensorflow-bin/blob/master/tensorflow-1.11.0-cp35-cp35m-linux_armv7l_jemalloc_multithread.whl

$ sudo apt-get install -y libhdf5-dev libc-ares-dev libeigen3-dev
$ sudo pip3 install keras_applications==1.0.7 --no-deps
$ sudo pip3 install keras_preprocessing==1.0.9 --no-deps
$ sudo pip3 install h5py==2.9.0
$ sudo apt-get install -y openmpi-bin libopenmpi-dev
$ sudo pip3 uninstall tensorflow
$ wget -O tensorflow-1.11.0-cp35-cp35m-linux_armv7l.whl https://github.com/PINTO0309/Tensorflow-bin/raw/master/tensorflow-1.11.0-cp35-cp35m-linux_armv7l_jemalloc_multithread.whl
$ sudo pip3 install tensorflow-1.11.0-cp35-cp35m-linux_armv7l.whl

【Required】 Restart the terminal.

Customize "tensorflow/contrib/lite/examples/python/label_image.py".

import argparse
import numpy as np
import time

from PIL import Image

from tensorflow.contrib.lite.python import interpreter as interpreter_wrapper
def load_labels(filename):
  my_labels = []
  input_file = open(filename, 'r')
  for l in input_file:
    my_labels.append(l.strip())
  return my_labels
if __name__ == "__main__":
  floating_model = False
  parser = argparse.ArgumentParser()
  parser.add_argument("-i", "--image", default="/tmp/grace_hopper.bmp", \
    help="image to be classified")
  parser.add_argument("-m", "--model_file", \
    default="/tmp/mobilenet_v1_1.0_224_quant.tflite", \
    help=".tflite model to be executed")
  parser.add_argument("-l", "--label_file", default="/tmp/labels.txt", \
    help="name of file containing labels")
  parser.add_argument("--input_mean", default=127.5, help="input_mean")
  parser.add_argument("--input_std", default=127.5, \
    help="input standard deviation")
  parser.add_argument("--num_threads", default=1, help="number of threads")
  args = parser.parse_args()

  interpreter = interpreter_wrapper.Interpreter(model_path=args.model_file)
  interpreter.allocate_tensors()
  input_details = interpreter.get_input_details()
  output_details = interpreter.get_output_details()
  # check the type of the input tensor
  if input_details[0]['dtype'] == np.float32:
    floating_model = True
  # NxHxWxC, H:1, W:2
  height = input_details[0]['shape'][1]
  width = input_details[0]['shape'][2]
  img = Image.open(args.image)
  img = img.resize((width, height))
  # add N dim
  input_data = np.expand_dims(img, axis=0)
  if floating_model:
    input_data = (np.float32(input_data) - args.input_mean) / args.input_std

  interpreter.set_num_threads(int(args.num_threads))
  interpreter.set_tensor(input_details[0]['index'], input_data)

  start_time = time.time()
  interpreter.invoke()
  stop_time = time.time()

  output_data = interpreter.get_tensor(output_details[0]['index'])
  results = np.squeeze(output_data)
  top_k = results.argsort()[-5:][::-1]
  labels = load_labels(args.label_file)
  for i in top_k:
    if floating_model:
      print('{0:08.6f}'.format(float(results[i]))+":", labels[i])
    else:
      print('{0:08.6f}'.format(float(results[i]/255.0))+":", labels[i])

  print("time: ", stop_time - start_time)

Environment Preparation for MobileNet v1.

$ cd ~;mkdir test
$ curl https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp > ~/test/grace_hopper.bmp
$ curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz | tar xzv -C ~/test mobilenet_v1_1.0_224/labels.txt
$ mv ~/test/mobilenet_v1_1.0_224/labels.txt ~/test/
$ curl http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224_quant.tgz | tar xzv -C ~/test
$ cp tensorflow/tensorflow/contrib/lite/examples/python/label_image.py ~/test

Result of x1 Thread.

$ cd ~/test
$ python3 label_image.py \
--num_threads 1 \
--image grace_hopper.bmp \
--model_file mobilenet_v1_1.0_224_quant.tflite \
--label_file labels.txt

0.415686: 653:military uniform
0.352941: 907:Windsor tie
0.058824: 668:mortarboard
0.035294: 458:bow tie, bow-tie, bowtie
0.035294: 835:suit, suit of clothes
time:  0.4152982234954834

Result of x4 Thread.

$ cd ~/test
$ python3 label_image.py \
--num_threads 4 \
--image grace_hopper.bmp \
--model_file mobilenet_v1_1.0_224_quant.tflite \
--label_file labels.txt

0.415686: 653:military uniform
0.352941: 907:Windsor tie
0.058824: 668:mortarboard
0.035294: 458:bow tie, bow-tie, bowtie
0.035294: 835:suit, suit of clothes
time:  0.1647195816040039
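For reference, the two timings above work out to roughly the 2.5x improvement quoted earlier:

# Rough arithmetic from the measured timings above.
t1, t4 = 0.4152982234954834, 0.1647195816040039
print("speedup: {:.2f}x".format(t1 / t4))  # ~2.52x with 4 threads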

I am following @freedomtan's suggestion completely. Thank you, freedomtan. I found that enabling MPI could be meaningful for performance improvement, so I have now started recompiling. It will take about 3 days.

for multi-threading stuff, I sent a PR https://github.com/tensorflow/tensorflow/pull/25748
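A note for readers on newer releases: once multithreading support landed upstream, later TensorFlow versions (2.x) expose a num_threads argument directly on the Python interpreter, so the custom set_num_threads() call from the patched label_image.py should no longer be needed. This is an assumption about later APIs, not something stated in this thread; check your version's documentation.

import tensorflow as tf

# Assumes a TensorFlow 2.x release where tf.lite.Interpreter accepts
# num_threads; the model path is an example.
interpreter = tf.lite.Interpreter(
    model_path="mobilenet_v1_1.0_224_quant.tflite",
    num_threads=4)
interpreter.allocate_tensors()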

@gasparka

I tried installing the rebuilt binary with "jemalloc" and "MPI" enabled. Unfortunately, it did not get faster as I expected. MPI seems to be a mechanism for speeding up training through distributed processing.

【My ENet】 Pure TensorFlow v1.11.0: 10.2 sec —> 9.5 sec
【My UNet】 TensorFlow Lite v1.11.0: 11.5 sec —> 12.1 sec

https://github.com/PINTO0309/Tensorflow-bin.git tensorflow-1.11.0-cp35-cp35m-linux_armv7l_jemalloc_mpi.whl

Next I will try "XLA JIT" and verify whether it speeds things up. I hope it will work…

@rky0930 , @EmilioMezaE see my previous comments for the reason and build instructions

@PINTO0309 and @masterchop as far as I can remember only the convolution kernel is multithreaded, so you hit Amdahl’s law
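A quick back-of-the-envelope illustration of that point: under Amdahl's law, if only a fraction p of the inference time is spent in the multithreaded convolution kernels, the speedup on n cores is capped at 1 / ((1 - p) + p / n). The parallel fraction below is a made-up value, only to show the shape of the bound.

def amdahl_speedup(p, n):
    """Upper bound on speedup when only a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# Hypothetical parallel fraction; not a measured number.
p = 0.8
for n in (1, 2, 4):
    print("{} threads: at most {:.2f}x".format(n, amdahl_speedup(p, n)))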

@masterchop

Did you run this on the Raspberry Pi 3B?

Yes. The above performance measurement results are based on the Raspberry Pi 3.

What about the Pi resources? Was it taking over 90% or still at 25%-30%?

25%-30%

It seems that you are misunderstanding something. freedomtan's implementation and mine are multithreaded, not multiprocess. Performance will never improve by more than 4 times, and the 4 cores are never fully used. http://www.dabeaz.com/python/UnderstandingGIL.pdf https://qiita.com/pumbaacave/items/942f86269b2c56313c15

If you need an implementation that uses all 4 cores, you will have to implement it yourself. I am sorry, but I do not have the skills to implement it with C++ and multiprocessing.
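For what it's worth, a pure-Python alternative that can keep all four cores busy is one interpreter per worker process rather than per thread. This is only a sketch under the assumption that each process loads its own copy of the model; the model path and dummy inputs are placeholders.

import multiprocessing as mp
import numpy as np

MODEL_PATH = "./mobilenet_v1_1.0_224_quant.tflite"  # placeholder path
_interpreter = None

def _init_worker():
    # Each worker process loads its own interpreter, so the GIL is no
    # longer the bottleneck; interpreters are never shared across processes.
    global _interpreter
    import tensorflow as tf
    _interpreter = tf.contrib.lite.Interpreter(model_path=MODEL_PATH)
    _interpreter.allocate_tensors()

def _classify(image):
    inp = _interpreter.get_input_details()[0]
    out = _interpreter.get_output_details()[0]
    _interpreter.set_tensor(inp['index'], image)
    _interpreter.invoke()
    return np.squeeze(_interpreter.get_tensor(out['index']))

if __name__ == "__main__":
    # Dummy uint8 inputs shaped like the quantized MobileNet input.
    images = [np.zeros((1, 224, 224, 3), dtype=np.uint8) for _ in range(8)]
    with mp.Pool(processes=4, initializer=_init_worker) as pool:
        results = pool.map(_classify, images)
    print(len(results), "images classified")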

Thank you for always doing great work, @freedomtan. I succeeded in building Tensorflow Lite, incorporating your suggestion. https://github.com/tensorflow/tensorflow/issues/25120#issuecomment-464401990

We got ours to work by updating interpreter.py to include contrib in the path as follows:

_interpreter_wrapper = LazyLoader(
    "_interpreter_wrapper", globals(),
    "tensorflow.contrib.lite.python.interpreter_wrapper."
    "tensorflow_wrap_interpreter_wrapper")
# pylint: enable=g-inconsistent-quotes

@gasparka I tried rebuilding with multithreading enabled. However, it seems that the Python wrapper does not respect the thread count, and the processing speed has not changed. A C++ program does use 4 threads. Since I do not have the skills to write C++ programs, can you try?

https://github.com/PINTO0309/Tensorflow-bin.git
tensorflow-1.11.0-cp35-cp35m-linux_armv7l_jemalloc_mpi_multithread.whl

Results of the Python program:
【My ENet】 Pure TensorFlow v1.11.0: 9.5 sec —> 9.5 sec
【My UNet】 TensorFlow Lite v1.11.0: 12.1 sec —> 12.5 sec

Next I will try "XLA JIT" and verify whether it speeds things up.

@PINTO0309 Have you experimented with the thread count? I see that Lite is stuck on one thread; there is a C++ API for this but nothing in Python.

You could try hardcoding the thread count to 4: https://github.com/tensorflow/tensorflow/blob/1084594657a5d139102ac794f84d1427a710e39a/tensorflow/contrib/lite/interpreter.cc#L127

You're welcome, @rky0930! I'm sorry, but I don't know the reason for this problem; I just saw this page https://github.com/PINTO0309/Tensorflow-bin and followed the process.

@gasparka My solution is to disable "jemalloc". https://github.com/PINTO0309/Tensorflow-bin.git Although I have not tried it yet, enabling "jemalloc" may improve performance.

@sahilparekh for 1.9.x to 1.11.x, what I posted in Aug,

bazel build --config opt --local_resources 1024.0,0.5,0.5 \
--copt=-mfpu=neon-vfpv4 \
--copt=-ftree-vectorize \
--copt=-funsafe-math-optimizations \
--copt=-ftree-loop-vectorize \
--copt=-fomit-frame-pointer \
--copt=-DRASPBERRY_PI \
--host_copt=-DRASPBERRY_PI \
//tensorflow/tools/pip_package:build_pip_package

should work. For the master branch, some modifications for building the AWS SDK may be needed; the AWS SDK problem may need something like https://github.com/tensorflow/tensorflow/pull/22856