tensorflow: TFLite GPU Delegate blocks the thread that calls interpreter.run()
System information
- OS Platform and Distribution: Linux Ubuntu 16.04
- Mobile device: OnePlus 3, OnePlus 5 and Pixel 2 XL
- TensorFlow Lite version on Android: 0.0.0-gpu-experimental
- Have I written custom code: yes, a GitHub repo contains the code to reproduce the issue: https://github.com/dailystudio/ml/tree/master/deeplab
- DeepLab v3 TFLite model: DeepLab segmentation (257x257)
Describe the current behavior
We create an Interpreter with the GPU delegate using the following code snippet:
Interpreter.Options options = new Interpreter.Options();
GpuDelegate delegate = new GpuDelegate();
options.addDelegate(delegate);
Interpreter interpreter = new Interpreter(mModelBuffer, options);
We then call run() on the Interpreter with the following line of code:
interpreter.run(mImageData, mOutputs);
If these two code snippets are executed on two different threads, the thread that calls interpreter.run() blocks: interpreter.run() never returns. If the two snippets are executed on the same thread, interpreter.run() completes properly and outputs correct results.
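To illustrate, here is a minimal sketch of the failing pattern (field names follow the snippets above; this is a hypothetical repro, not code from the repository):

// Thread A: create the Interpreter with the GPU delegate attached.
Interpreter.Options options = new Interpreter.Options();
options.addDelegate(new GpuDelegate());
final Interpreter interpreter = new Interpreter(mModelBuffer, options);

// Thread B: run inference on a different thread. In our tests this
// call never returns while the GPU delegate is attached.
new Thread(() -> interpreter.run(mImageData, mOutputs)).start();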
Describe the expected behavior
Developers shouldn't need to care about which threads these APIs are called from. Even if these APIs are called on different threads, interpreter.run() should return correctly, without blocking.
Code to reproduce the issue
The full code can be found here: https://github.com/dailystudio/ml/blob/master/deeplab/app/src/main/java/com/dailystudio/deeplab/ml/DeepLabLite.java
Currently, the code in the repository works fine because new Interpreter() and interpreter.run() are called on the same thread. The DeepLabLite class has two important functions: initialize() and segment(). In initialize(), we read the TFLite model from the assets/ directory into a MappedByteBuffer:
@Override
public boolean initialize(Context context) {
    if (context == null) {
        return false;
    }

    mModelBuffer = loadModelFile(context, MODEL_PATH);
    if (mModelBuffer == null) {
        return false;
    }
    ...
}
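For context, loadModelFile() follows the usual TFLite pattern of memory-mapping a model file from assets. A sketch (the helper name comes from the snippet above; the body is standard AssetManager/FileChannel code and may differ from the repository):

// imports: android.content.Context, android.content.res.AssetFileDescriptor,
// java.io.FileInputStream, java.io.IOException,
// java.nio.MappedByteBuffer, java.nio.channels.FileChannel
private MappedByteBuffer loadModelFile(Context context, String modelPath) {
    try (AssetFileDescriptor fd = context.getAssets().openFd(modelPath);
         FileInputStream stream = new FileInputStream(fd.getFileDescriptor());
         FileChannel channel = stream.getChannel()) {
        // Map only the model's region of the packed asset file.
        return channel.map(FileChannel.MapMode.READ_ONLY,
                fd.getStartOffset(), fd.getDeclaredLength());
    } catch (IOException e) {
        return null; // initialize() treats null as a failure
    }
}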
In segment(), we use that MappedByteBuffer to create an Interpreter and call run() for inference:
...
Interpreter.Options options = new Interpreter.Options();
if (USE_GPU) {
    GpuDelegate delegate = new GpuDelegate();
    options.addDelegate(delegate);
}
Interpreter interpreter = new Interpreter(mModelBuffer, options);
...
final long start = System.currentTimeMillis();
interpreter.run(mImageData, mOutputs);
final long end = System.currentTimeMillis();
...
DeepLabLite.initialize() is called from an AsyncTask after the application launches, while DeepLabLite.segment() is called from a Loader after the user picks an image for segmentation. This code works without problems.
But if we keep the call sites of these two methods unchanged and move the following line from segment() to initialize():
Interpreter interpreter = new Interpreter(mModelBuffer, options);
P.S.: Of course, we need to declare a class member to hold this Interpreter for later use in segment().
Then the call to interpreter.run() blocks forever.
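In other words, the hanging variant looks roughly like this (mInterpreter is the class member mentioned in the P.S. above; surrounding code abbreviated):

// In initialize(), running on an AsyncTask worker thread:
Interpreter.Options options = new Interpreter.Options();
options.addDelegate(new GpuDelegate());
mInterpreter = new Interpreter(mModelBuffer, options);

// Later, in segment(), running on a Loader thread:
mInterpreter.run(mImageData, mOutputs); // blocks forever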
Other information
Based on my tests, I suspect this problem is device-independent; it would happen on all Android devices. It appears to be related to GpuDelegate: if you do not call options.addDelegate() to add a GpuDelegate, interpreter.run() also runs fine in the cross-thread scenario.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 18 (9 by maintainers)
Hmm…, but I am not using OpenGL in my demo application. I understand your points, but from my point of view, as a developer using TFLite, I only care about how to use your APIs to achieve my own goals. There is no OpenGL-related code in my app, so asking me to learn and understand the concepts of OpenGL contexts or multi-threaded architectures feels like an imposition.

I think my case is quite typical. First, you load a model from assets and create an Interpreter for later use. Then, when you really need it, you call run() to infer the results. Calling both steps on the main thread is absolutely not acceptable, especially in a real product. That means in most cases these two steps will be called off the main thread, and probably on two different threads.

You can't assume that every developer who uses TFLite knows about everything. If I were using OpenGL to write the application, yes, I might be aware that there would be some multi-threading issues. But I am just a developer writing a standard application that uses TF to segment images.

To be honest, I spent an entire afternoon finding the root cause of this issue, just because I am quite interested in and enthusiastic about TensorFlow. I am not challenging the work you have already done; I just want it to be better and to be accepted by more people. My suggestions are:
Thanks for your patience in looking into my issue, and I hope my advice can help make TFLite better in the future.
Not officially announced yet, but FYI: GPU code is now visible at:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu
if you need the code for better insight into what is happening.
@dailystudio
That’s actually pretty good feedback, and one of the reasons why we put out a developer “preview” was to gather feedback like yours. We really appreciate it.
The fact that you have to be mindful of the GL context is inevitable, especially when you work in multithreaded settings. We actually tried our best to hide that away from the users (and that’s why you don’t see the GL context in the API), but maybe hiding it was a bad thing. If the API required you to provide the GL context (or maybe the thread that owns the GL context), that might have been better.
That’s a great idea. We’ll see how we can add that check without losing performance. Doing a GL context check before every runInference is probably not the right way to go 😉 Will do.
For now, I guess the easiest trick you can employ (if you want to go down the path of multithreading) is to have a dedicated thread that does both initialization and inference, and to send a signal to that thread to run the inference.
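A sketch of that workaround, assuming a single-threaded executor as the dedicated thread (names are illustrative, not a TFLite API):

// imports: java.util.concurrent.ExecutorService,
// java.util.concurrent.Executors, java.util.concurrent.Future
private final ExecutorService mInferenceThread = Executors.newSingleThreadExecutor();
private Interpreter mInterpreter;

public void initialize() {
    // Create the interpreter on the dedicated thread...
    mInferenceThread.execute(() -> {
        Interpreter.Options options = new Interpreter.Options();
        options.addDelegate(new GpuDelegate());
        mInterpreter = new Interpreter(mModelBuffer, options);
    });
}

public void segment() throws Exception {
    // ...and signal the same thread to run inference, so creation and
    // run() always happen on one thread.
    Future<?> done = mInferenceThread.submit(() -> mInterpreter.run(mImageData, mOutputs));
    done.get(); // wait for the inference to finish
}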