tensorflow: TensorFlow Lite Python API does not work
System information
- TensorFlow version: 1.9.0
- Python version: 3.5
Describe the problem
I am trying to run a TFLite model file with the Python API (as in this example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/python_api.md), but I get an error:
ImportError: /home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/_tensorflow_wrap_interpreter_wrapper.so: undefined symbol: _ZN6tflite12tensor_utils39NeonMatrixBatchVectorMultiplyAccumulateEPKaiiS2_PKfiPfi
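For reference, the mangled name demangles to tflite::tensor_utils::NeonMatrixBatchVectorMultiplyAccumulate, i.e. the wrapper .so references a NEON kernel that was not linked into the wheel. A minimal diagnostic sketch (using the .so path from the error message) that reproduces the failure without going through TensorFlow's lazy loader:

import ctypes

# Loading the SWIG extension directly raises the same
# "undefined symbol" OSError that the TensorFlow import reports.
so_path = ("/home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/"
           "lite/python/interpreter_wrapper/"
           "_tensorflow_wrap_interpreter_wrapper.so")
ctypes.CDLL(so_path)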
Source code / logs
My code:
import tensorflow as tf

if __name__ == "__main__":
    # Load TFLite model and allocate tensors.
    interpreter = tf.contrib.lite.Interpreter(model_path="./mobilenet_v1_0.25_128_quant.tflite")
    interpreter.allocate_tensors()

    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    print(input_details)
    print(output_details)
Log output:
Traceback (most recent call last):
  File "tflite_test.py", line 12, in <module>
    interpreter = tf.contrib.lite.Interpreter(model_path="/home/pi/test/mobilenet_v1_0.25_128_quant/mobilenet_v1_0.25_128_quant.tflite")
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter.py", line 50, in __init__
    _interpreter_wrapper.InterpreterWrapper_CreateWrapperCPPFromFile(
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/python/util/lazy_loader.py", line 53, in __getattr__
    module = self._load()
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/python/util/lazy_loader.py", line 42, in _load
    module = importlib.import_module(self.__name__)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 673, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 28, in <module>
    _tensorflow_wrap_interpreter_wrapper = swig_import_helper()
  File "/home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 24, in swig_import_helper
    _mod = imp.load_module('_tensorflow_wrap_interpreter_wrapper', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 693, in _load
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 577, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 914, in create_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
ImportError: /home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/_tensorflow_wrap_interpreter_wrapper.so: undefined symbol: _ZN6tflite12tensor_utils39NeonMatrixBatchVectorMultiplyAccumulateEPKaiiS2_PKfiPfi
SOLUTION FOR THIS ERROR!
Source code:
interpreter = tf.contrib.lite.Interpreter(model_path="optimized_graph.tflite")
interpreter.allocate_tensors()
ImportError: /home/pi/.local/lib/python3.5/site-packages/tensorflow/contrib/lite/python/interpreter_wrapper/_tensorflow_wrap_interpreter_wrapper.so: undefined symbol: _ZN6tflite12tensor_utils39NeonMatrixBatchVectorMultiplyAccumulateEPKaiiS2_PKfiPfi
Just install TensorFlow 1.11.0 by following these steps:
$ sudo apt-get install python-pip python3-pip
$ sudo pip3 uninstall tensorflow
$ git clone https://github.com/PINTO0309/Tensorflow-bin.git
$ cd Tensorflow-bin
$ sudo pip3 install tensorflow-1.11.0-cp35-cp35m-linux_armv7l.whl
If it doesn't work, try reformatting the SD card and going through the steps again.
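Once the new wheel is installed, a quick way to confirm the fix (a sketch reusing the model from the original report; substitute your own .tflite path):

import tensorflow as tf

# If these lines run without an ImportError, the broken
# interpreter wrapper has been replaced by a working build.
print(tf.__version__)  # should print 1.11.0
interpreter = tf.contrib.lite.Interpreter(model_path="./mobilenet_v1_0.25_128_quant.tflite")
interpreter.allocate_tensors()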
I tried implementing multithreading in TensorFlow Lite v1.11.0. It gained 2.5 times the performance.
https://github.com/PINTO0309/Tensorflow-bin/blob/master/tensorflow-1.11.0-cp35-cp35m-linux_armv7l_jemalloc_multithread.whl
Customize "tensorflow/contrib/lite/examples/python/label_image.py".
Environment Preparation for MobileNet v1.
Result of x1 Thread.
Result of x4 Thread.
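The x1 and x4 thread measurements above are not reproduced here; a minimal timing sketch along the same lines (assuming the quantized MobileNet v1 model from the original report and a wheel built with multithreading enabled):

import time
import numpy as np
import tensorflow as tf

interpreter = tf.contrib.lite.Interpreter(model_path="./mobilenet_v1_0.25_128_quant.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# Feed a dummy input of the right shape and dtype, then average several runs.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)

runs = 10
start = time.time()
for _ in range(runs):
    interpreter.invoke()
print("%.3f sec per invoke" % ((time.time() - start) / runs))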
I am following @freedomtan's suggestion exactly. Thank you, freedomtan. It looked like enabling MPI could be meaningful for performance improvement, so I have now started recompiling. It will take about 3 days.
For the multi-threading stuff, I sent a PR: https://github.com/tensorflow/tensorflow/pull/25748
@gasparka
I tried installing the rebuilt binary with "jemalloc" and "MPI" enabled. Unfortunately, it did not get faster as I expected. "MPI" seems to be a mechanism for speeding up training via distributed processing.
【My ENet】 Pure TensorFlow v1.11.0: 10.2 sec -> 9.5 sec
【My UNet】 TensorFlow Lite v1.11.0: 11.5 sec -> 12.1 sec
https://github.com/PINTO0309/Tensorflow-bin.git
tensorflow-1.11.0-cp35-cp35m-linux_armv7l_jemalloc_mpi.whl
Next I will try "XLA JIT" and verify whether it speeds things up. I hope it will work…
@rky0930 , @EmilioMezaE see my previous comments for the reason and build instructions
@PINTO0309 and @masterchop, as far as I can remember only the convolution kernel is multithreaded, so you hit Amdahl's law.
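To make the Amdahl's law point concrete (an illustrative sketch; the 60% parallel fraction below is made up, not a measurement):

# Amdahl's law: if only a fraction p of the work is parallelized,
# n threads give at most 1 / ((1 - p) + p / n) overall speedup.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Example: if the convolution kernels were 60% of the runtime,
# 4 threads could never exceed ~1.8x end to end.
print(amdahl_speedup(0.6, 4))  # -> 1.818...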
@masterchop
Yes. The above performance measurements were taken on a Raspberry Pi 3.
25%-30%
It seems that you are misunderstanding something. freedomtan's implementation and mine are multithreaded, not multiprocess. Performance will never improve more than 4 times, and the 4 cores are never fully used. http://www.dabeaz.com/python/UnderstandingGIL.pdf https://qiita.com/pumbaacave/items/942f86269b2c56313c15
If you need an implementation that uses all 4 cores, implement it yourself. I am sorry, but I do not have the skills to implement it with C++ and multiprocessing.
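A toy demonstration of the thread-vs-process difference for CPU-bound pure-Python work, in the spirit of the GIL slides linked above (TFLite's own kernels run in C++, so this only illustrates the general point):

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_bound(n):
    # Pure-Python CPU-bound loop: threads serialize on the GIL,
    # while separate processes each get their own interpreter.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.time()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(cpu_bound, [2000000] * 4))
        print(pool_cls.__name__, "%.2f sec" % (time.time() - start))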
Thank you for always doing great work, @freedomtan. I succeeded in building TensorFlow Lite incorporating your suggestion. https://github.com/tensorflow/tensorflow/issues/25120#issuecomment-464401990
We got ours to work by updating interpreter.py to include contrib in the path as follows:

# pylint: disable=g-inconsistent-quotes
_interpreter_wrapper = LazyLoader(
    "_interpreter_wrapper", globals(),
    "tensorflow.contrib.lite.python.interpreter_wrapper."
    "tensorflow_wrap_interpreter_wrapper")
# pylint: enable=g-inconsistent-quotes
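With that path fixed, a quick sanity check (a sketch; any valid .tflite file works for model_path):

from tensorflow.contrib.lite.python import interpreter as interpreter_lib

# If the LazyLoader path is correct, constructing the Interpreter no
# longer fails inside the SWIG wrapper import.
interpreter = interpreter_lib.Interpreter(model_path="./mobilenet_v1_0.25_128_quant.tflite")
interpreter.allocate_tensors()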
@gasparka I tried rebuilding with multithreading enabled. However, it seems that the Python wrapper does not refer to the thread count, and the processing speed has not changed. With a C++ program, 4 threads are used. Since I do not have the skills to write C++ programs, can you try?
https://github.com/PINTO0309/Tensorflow-bin.git
tensorflow-1.11.0-cp35-cp35m-linux_armv7l_jemalloc_mpi_multithread.whl
Results of the Python program:
【My ENet】 Pure TensorFlow v1.11.0: 9.5 sec -> 9.5 sec
【My UNet】 TensorFlow Lite v1.11.0: 12.1 sec -> 12.5 sec
Next I will try "XLA JIT" and verify whether it speeds things up.
@PINTO0309 Have you experimented with the thread count? I see that Lite is stuck on one thread; there is a C++ API for this, but nothing in Python.
You could try hardcoding the thread count to 4: https://github.com/tensorflow/tensorflow/blob/1084594657a5d139102ac794f84d1427a710e39a/tensorflow/contrib/lite/interpreter.cc#L127
You're welcome, @rky0930! I'm sorry, but I don't know the cause of this problem; I just saw this page https://github.com/PINTO0309/Tensorflow-bin and followed the process.
@gasparka My solution is to disable "jemalloc". https://github.com/PINTO0309/Tensorflow-bin.git Although I have not tried it yet, enabling "jemalloc" may improve performance.
@sahilparekh For 1.9.x to 1.11.x, what I posted in August should work. For the master branch, some modification for building the AWS SDK may be needed; the AWS SDK problem may need something like https://github.com/tensorflow/tensorflow/pull/22856