tensorflow: Missing Operations for TfLite GPU Delegate
System information
- TensorFlow version (you are using): on Android: ‘org.tensorflow:tensorflow-lite:0.0.0-nightly’ and ‘org.tensorflow:tensorflow-lite-gpu:0.0.0-nightly’
- Are you willing to contribute it (Yes/No): Yes, to the best of my abilities
Describe the feature and the current behavior/state. I’m trying to use the GPU delegate for my custom tflite model. Creating the interpreter with the GPU delegate using this code:
val options = Options()
options.setUseNNAPI(false)
options.setAllowFp16PrecisionForFp32(true)
options.setNumThreads(NUM_THREADS)
val gpuDelegate = GpuDelegate()
options.addDelegate(gpuDelegate)
d.tfLite = Interpreter(loadModelFile(assetManager)!!, options)
results in my model running normally and delivering the correct outputs, but it does not appear to be accelerated by the GPU, since the execution time is exactly the same as without the delegate. Adding the line d.tfLite!!.modifyGraphWithDelegate(gpuDelegate) (I don’t know whether this is necessary; it would be nice to know) results in the following error:
java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Next operations are not supported by GPU delegate:
CONV_2D: Max version supported: 1. Requested version 2.
LOCAL_RESPONSE_NORMALIZATION: Operation is not supported.
SPLIT: Operation is not supported.
First 0 operations will run on the GPU, and the remaining 12 on the CPU.ModifyGraphWithDelegate is disallowed when graph is immutable.
java.lang.RuntimeException: java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Next operations are not supported by GPU delegate:
CONV_2D: Max version supported: 1. Requested version 2.
LOCAL_RESPONSE_NORMALIZATION: Operation is not supported.
SPLIT: Operation is not supported.
First 0 operations will run on the GPU, and the remaining 12 on the CPU.ModifyGraphWithDelegate is disallowed when graph is immutable.
So first of all: it would be nice to have GPU delegate support for these currently unsupported operations, i.e.
- CONV_2D v2
- LOCAL_RESPONSE_NORMALIZATION
- SPLIT
Second of all: what does “ModifyGraphWithDelegate is disallowed when graph is immutable” mean? Do I have to make any changes to my tflite model?
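For anyone triaging a similar failure, here is a minimal sketch that pulls the unsupported ops and the GPU/CPU split out of an exception message shaped like the one above. The names DelegateReport and parseDelegateError are hypothetical helpers, not part of TensorFlow Lite, and the regular expressions assume the exact wording shown in this log, which may change between versions.

```kotlin
// Hypothetical helper types; nothing here is a TensorFlow Lite API.
data class DelegateReport(
    val unsupportedOps: Map<String, String>, // op name -> reason reported by the delegate
    val gpuOps: Int?,                        // ops placed on the GPU, if the summary line was found
    val cpuOps: Int?                         // ops left on the CPU, if the summary line was found
)

fun parseDelegateError(message: String): DelegateReport {
    // Matches lines such as "SPLIT: Operation is not supported."
    val opLine = Regex("""^([A-Z][A-Z0-9_]*): (.+)$""")
    // Matches "First 0 operations will run on the GPU, and the remaining 12 on the CPU"
    val summary = Regex(
        """First (\d+) operations will run on the GPU, and the remaining (\d+) on the CPU"""
    )

    val ops = linkedMapOf<String, String>()
    for (line in message.lines()) {
        val match = opLine.find(line.trim()) ?: continue
        ops[match.groupValues[1]] = match.groupValues[2]
    }

    val s = summary.find(message)
    return DelegateReport(
        unsupportedOps = ops,
        gpuOps = s?.groupValues?.get(1)?.toInt(),
        cpuOps = s?.groupValues?.get(2)?.toInt()
    )
}
```

Passing the message of the caught IllegalArgumentException to parseDelegateError would give the three op names listed above plus the 0/12 split, which is a quick way to confirm that nothing is being placed on the GPU.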
About this issue
- State: closed
- Created 5 years ago
- Comments: 18 (11 by maintainers)
According to the documentation at https://www.tensorflow.org/lite/performance/gpu_advanced, none of the operations I listed above are supported yet, so this is an ongoing issue for me.
@Noltibus
Sure thing. This is an open source project, and contributions are welcome 😃
Before you dig into the shader code, I would advise you to understand the DHWC4 format (or PHWC4… we renamed the format at one point) first. It’s essentially slicing an HWC tensor into 4-channel slices, and it’s used throughout the OpenGL delegate. I’m not so sure about the OpenCL delegate; it has more complicated formats.
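To make the 4-channel slicing concrete, here is a rough sketch that repacks a dense HWC float array into slices of 4 channels. The function name hwcToPhwc4 is made up for illustration, and the output order [slice][height][width][4] with zero padding is an assumption based on the description above; check the delegate source for the layout it actually uses.

```kotlin
// Sketch only: repack a dense HWC tensor into 4-channel slices.
// Assumed output order: [slice][height][width][4], zero-padded when
// `channels` is not a multiple of 4. The real delegate layout may differ.
fun hwcToPhwc4(src: FloatArray, height: Int, width: Int, channels: Int): FloatArray {
    require(src.size == height * width * channels) { "src size must equal H*W*C" }
    val slices = (channels + 3) / 4                    // ceil(channels / 4)
    val dst = FloatArray(slices * height * width * 4)  // zero-initialised, so padding stays 0
    for (h in 0 until height) {
        for (w in 0 until width) {
            for (c in 0 until channels) {
                val srcIndex = (h * width + w) * channels + c
                // Channel c lands in slice c / 4, at lane c % 4 of that pixel
                val dstIndex = (((c / 4) * height + h) * width + w) * 4 + (c % 4)
                dst[dstIndex] = src[srcIndex]
            }
        }
    }
    return dst
}
```

With 5 channels, for example, each pixel ends up split across two slices: channels 0 to 3 in the first, and channel 4 followed by three zeros in the second.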
Then, when it comes to the shader code, the OpenGL delegate uses shader code generation to avoid code duplication, but unfortunately that makes the code hard to read and understand. The OpenCL delegate doesn’t use the code generation logic, so it might be easier to author an op there. After that, you will also want to write some unit tests to make sure things work as intended.
Of course, the easiest approach is to find the closest op implementation and modify that 😃
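For the unit tests, one option is to compare the GPU output against a plain CPU reference. Below is a minimal sketch of such a reference for LOCAL_RESPONSE_NORMALIZATION over the channel axis of a single pixel, following the formula documented for tf.nn.local_response_normalization. The names lrnReference and allClose are hypothetical, and the tolerance you need will depend on whether FP16 precision is enabled.

```kotlin
import kotlin.math.abs
import kotlin.math.pow

// Reference LRN over the channels of one pixel:
// out[c] = in[c] / (bias + alpha * sum(in[j]^2 for j in c-radius..c+radius))^beta
// The window is clipped at the first and last channel.
fun lrnReference(
    input: FloatArray, radius: Int, bias: Float, alpha: Float, beta: Float
): FloatArray {
    val out = FloatArray(input.size)
    for (c in input.indices) {
        val lo = maxOf(0, c - radius)
        val hi = minOf(input.size - 1, c + radius)
        var sqrSum = 0.0
        for (j in lo..hi) sqrSum += input[j].toDouble() * input[j]
        out[c] = (input[c] / (bias + alpha * sqrSum).pow(beta.toDouble())).toFloat()
    }
    return out
}

// Hypothetical test helper: element-wise comparison with an absolute tolerance.
fun allClose(a: FloatArray, b: FloatArray, tol: Float): Boolean =
    a.size == b.size && a.indices.all { abs(a[it] - b[it]) <= tol }
```

A test would then run the same input through the new GPU op and assert allClose(gpuOutput, lrnReference(...), tol), with a looser tol when the delegate runs in reduced precision.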