tensorflow: TFLite GPU Delegate blocks the thread that calls interpreter.run()
System information
- OS Platform and Distribution: Linux Ubuntu 16.04
- Mobile device: OnePlus 3, OnePlus 5 and Pixel 2 XL
- TensorFlow Lite version on Android: 0.0.0-gpu-experimental
- Have I written custom code: yes, a GitHub repo contains the code to reproduce the issue: https://github.com/dailystudio/ml/tree/master/deeplab
- DeepLab v3 TFLite model: DeepLab segmentation (257x257)
Describe the current behavior
We create an Interpreter with the GPU delegate using the following code snippet:
Interpreter.Options options = new Interpreter.Options();
GpuDelegate delegate = new GpuDelegate();
options.addDelegate(delegate);
Interpreter interpreter = new Interpreter(mModelBuffer, options);
We then call run() on the Interpreter with the following line of code:
interpreter.run(mImageData, mOutputs);
If these two code snippets are executed on two different threads, the thread that calls interpreter.run() blocks: interpreter.run() never returns. If the two snippets are executed on the same thread, interpreter.run() completes properly and outputs correct results.
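To illustrate, here is a minimal sketch of the failing pattern (field names follow the snippets above; this is a hypothetical repro, not code from the repository):

// Thread A: create the Interpreter with the GPU delegate attached.
Interpreter.Options options = new Interpreter.Options();
options.addDelegate(new GpuDelegate());
final Interpreter interpreter = new Interpreter(mModelBuffer, options);

// Thread B: run inference on a different thread. In our tests this
// call never returns while the GPU delegate is attached.
new Thread(() -> interpreter.run(mImageData, mOutputs)).start();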
Describe the expected behavior
Developers shouldn't need to care about which threads these APIs are called from. Even if these APIs are called on different threads, interpreter.run() should return correctly, without blocking.
Code to reproduce the issue
The full code can be found here: https://github.com/dailystudio/ml/blob/master/deeplab/app/src/main/java/com/dailystudio/deeplab/ml/DeepLabLite.java
Currently, the code in the repository works fine because new Interpreter() and interpreter.run() are called on the same thread. The DeepLabLite class has two important functions: initialize() and segment(). In initialize(), we read the TFLite model from the assets/ directory into a MappedByteBuffer:
@Override
public boolean initialize(Context context) {
    if (context == null) {
        return false;
    }

    mModelBuffer = loadModelFile(context, MODEL_PATH);
    if (mModelBuffer == null) {
        return false;
    }
    ...
}
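For context, loadModelFile() follows the usual TFLite pattern of memory-mapping a model file from assets. A sketch (the helper name comes from the snippet above; the body is standard AssetManager/FileChannel code and may differ from the repository):

// imports: android.content.Context, android.content.res.AssetFileDescriptor,
// java.io.FileInputStream, java.io.IOException,
// java.nio.MappedByteBuffer, java.nio.channels.FileChannel
private MappedByteBuffer loadModelFile(Context context, String modelPath) {
    try (AssetFileDescriptor fd = context.getAssets().openFd(modelPath);
         FileInputStream stream = new FileInputStream(fd.getFileDescriptor());
         FileChannel channel = stream.getChannel()) {
        // Map only the model's region of the packed asset file.
        return channel.map(FileChannel.MapMode.READ_ONLY,
                fd.getStartOffset(), fd.getDeclaredLength());
    } catch (IOException e) {
        return null; // initialize() treats null as a failure
    }
}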
In segment(), we use that MappedByteBuffer to create an Interpreter and call run() for inference:
...
Interpreter.Options options = new Interpreter.Options();
if (USE_GPU) {
    GpuDelegate delegate = new GpuDelegate();
    options.addDelegate(delegate);
}
Interpreter interpreter = new Interpreter(mModelBuffer, options);
...
final long start = System.currentTimeMillis();
interpreter.run(mImageData, mOutputs);
final long end = System.currentTimeMillis();
...
DeepLabLite.initialize() is called from an AsyncTask after the application launches, while DeepLabLite.segment() is called from a Loader after the user picks an image for segmentation. This code works without problems.
But if we keep the call sites of these two methods unchanged and move the following line from segment() to initialize():
Interpreter interpreter = new Interpreter(mModelBuffer, options);
P.S.: Of course, we need to declare a class member to hold this Interpreter for later use in segment().
Then the call to interpreter.run() blocks forever.
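In other words, the hanging variant looks roughly like this (mInterpreter is the class member mentioned in the P.S. above; surrounding code abbreviated):

// In initialize(), running on an AsyncTask worker thread:
Interpreter.Options options = new Interpreter.Options();
options.addDelegate(new GpuDelegate());
mInterpreter = new Interpreter(mModelBuffer, options);

// Later, in segment(), running on a Loader thread:
mInterpreter.run(mImageData, mOutputs); // blocks forever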
Other information
Based on my tests, I suspect this problem is device-independent; it would happen on all Android devices. It appears to be related to GpuDelegate: if you do not call options.addDelegate() to add a GpuDelegate, interpreter.run() also runs fine in the cross-thread scenario.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 18 (9 by maintainers)
Hmm…, but I am not using OpenGL in my demo application. I understand your points, but from my point of view, as a developer using TFLite, I only care about how to use your APIs to achieve my own goals. There is no OpenGL-related code in my app, so asking me to learn and understand the concepts of OpenGL contexts or multi-threaded architectures feels like an imposition.

I think my case is quite typical. First, you load a model from assets and create an Interpreter for later use. Then, when you really need it, you call run() to infer the results. Calling both steps on the main thread is absolutely not acceptable, especially in a real product. That means in most cases these two steps will be called off the main thread, and probably on two different threads.

You can't assume that every developer who uses TFLite knows about everything. If I were using OpenGL to write the application, yes, I might be aware that there would be some multi-threading issues. But I am just a developer writing a standard application that uses TF to segment images.

To be honest, I spent an entire afternoon finding the root cause of this issue, just because I am quite interested in and enthusiastic about TensorFlow. I am not challenging the work you have already done; I just want it to be better and to be accepted by more people. My suggestions are:
Thanks for your patience in looking into my issue, and I hope my advice can help make TFLite better in the future.
Not officially announced yet, but FYI: GPU code is now visible at:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu
if you need the code for better insight into what is happening.
@dailystudio
That’s actually pretty good feedback, and one of the reasons why we put out a developer “preview” was to gather feedback like yours. We really appreciate it.
The fact that you have to be mindful of the GL context is inevitable, especially when you work in multithreaded settings. We actually tried our best to hide that away from the users (and that’s why you don’t see the GL context in the API), but maybe hiding it was a bad thing. If the API required you to provide the GL context (or maybe the thread that owns the GL context), that might have been better.
That’s a great idea. We’ll see how we can add that check without losing performance. Doing a GL context check before every runInference is probably not the right way to go 😉 Will do.
For now, I guess the easiest trick you can employ (if you want to go down the path of multithreading) is to have a dedicated thread that does both initialization and inference, and to send a signal to that thread to run the inference.
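A sketch of that workaround, assuming a single-threaded executor as the dedicated thread (names are illustrative, not a TFLite API):

// imports: java.util.concurrent.ExecutorService,
// java.util.concurrent.Executors, java.util.concurrent.Future
private final ExecutorService mInferenceThread = Executors.newSingleThreadExecutor();
private Interpreter mInterpreter;

public void initialize() {
    // Create the interpreter on the dedicated thread...
    mInferenceThread.execute(() -> {
        Interpreter.Options options = new Interpreter.Options();
        options.addDelegate(new GpuDelegate());
        mInterpreter = new Interpreter(mModelBuffer, options);
    });
}

public void segment() throws Exception {
    // ...and signal the same thread to run inference, so creation and
    // run() always happen on one thread.
    Future<?> done = mInferenceThread.submit(() -> mInterpreter.run(mImageData, mOutputs));
    done.get(); // wait for the inference to finish
}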