tensorflow: tflite GPU delegate create and load model with the V2 API is very slow compared with the V1 API (10x). Why?
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): android
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary):
- TensorFlow version (use command below): 2.2 rc2
- Python version:
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
I compiled tflite.a (2.2 rc2) from source and use the NDK C++ API to run the tflite model as follows:

```cpp
#ifdef V2
  // V2 GPU delegate
  TfLiteGpuDelegateOptionsV2 tOptions = TfLiteGpuDelegateOptionsV2Default();
  if (m_bGPUAllowFP16) {
    tOptions.is_precision_loss_allowed = 1;  // allow FP16
  }
  // 1 == TFLITE_GPU_INFERENCE_PREFERENCE_SUSTAINED_SPEED
  tOptions.inference_preference = 1;
  m_pGPUDelegate = TfLiteGpuDelegateV2Create(&tOptions);
#else
  // V1 (OpenGL) GPU delegate
  TfLiteGpuDelegateOptions tOptions = {
      .metadata = nullptr,
      .compile_options = {
          .precision_loss_allowed = 0,
          .preferred_gl_object_type = TFLITE_GL_OBJECT_TYPE_FASTEST,
          .dynamic_batch_enabled = 0,
      },
  };
  if (m_bGPUAllowFP16) {
    tOptions.compile_options.precision_loss_allowed = 1;  // allow FP16
  }
  m_pGPUDelegate = TfLiteGpuDelegateCreate(&tOptions);
#endif

  auto iRetCode = m_pInterp->ModifyGraphWithDelegate(m_pGPUDelegate);
  if (iRetCode != kTfLiteOk)
  {
    return -1;
  }
```
However, the timing is very different: the V1 load time is only about 10% of the V2 load time. The model contains a Conv2DTranspose op, and with the V1 API the inference time is about 4x that of the V2 API. Why is there such a performance difference? (A small timing sketch is shown below.)
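Not part of the original report, but a minimal sketch of how one could time delegate creation and `ModifyGraphWithDelegate` separately, to see which step accounts for the 10x gap. It assumes the same members (`m_pInterp`, `m_pGPUDelegate`) as above; `TimedCall` is a hypothetical helper, not a TFLite API:

```cpp
#include <chrono>
#include <android/log.h>

// Hypothetical helper: runs a callable and logs its wall-clock duration.
template <typename Fn>
auto TimedCall(const char* tag, Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  auto result = fn();
  const auto end = std::chrono::steady_clock::now();
  const auto ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
  __android_log_print(ANDROID_LOG_INFO, "tflite-timing", "%s took %lld ms",
                      tag, static_cast<long long>(ms));
  return result;
}

// Usage inside the init path shown above:
//   m_pGPUDelegate = TimedCall("TfLiteGpuDelegateV2Create", [&] {
//     return TfLiteGpuDelegateV2Create(&tOptions);
//   });
//   auto iRetCode = TimedCall("ModifyGraphWithDelegate", [&] {
//     return m_pInterp->ModifyGraphWithDelegate(m_pGPUDelegate);
//   });
```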
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 34 (14 by maintainers)
Is there a way to precompile and cache the OpenCL kernels? Maybe as part of the tflite export? Or maybe the GPU delegate can cache the kernels locally on the device for subsequent runs?
+1
@impjdi hi impjdi, could you give an example? It would be very useful for saving init time!! My code is as follows, but I don't know how to use the Encode API in serialization.h.
Thanks!
While this functionality is present (it was added only about a month ago), it doesn't follow the usual paths of the GPU delegate, and there is a lot more plumbing involved in wiring it up. You can take a look at
tensorflow/lite/delegates/gpu/cl/serialization.h
and experiment with it (sorry, no official documentation or support). Note that the generated cache binary is not universal, i.e. it differs by mobile vendor, device, OS version, and GPU driver. So for each new model, you would have to run and generate it once on that particular user device and store it. You also need to know when to invalidate the cache after a new OS version, a new GPU driver, or a change to your ML model, so you'll need quite a lot of logic around this.
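As a side note (not part of this thread's TF 2.2 setup), later TensorFlow Lite releases exposed on-device kernel caching through the V2 delegate options themselves, which avoids calling serialization.h directly. A minimal sketch, assuming a TFLite version that ships the serialization fields; the cache directory and model token values are placeholders, and the cache is still specific to the device, driver, and model, as described above:

```cpp
TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
// Enable on-device caching of compiled GPU programs (later TFLite versions only).
options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_SERIALIZATION;
// App-writable directory where the delegate stores the compiled program cache.
options.serialization_dir = "/data/data/com.example.app/cache";  // placeholder path
// Token identifying the model; change it when the model changes so a stale
// cache is not reused.
options.model_token = "my_model_v1";  // placeholder token
TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);
```

With this enabled, the first run still pays the full compilation cost, but subsequent runs on the same device can reuse the cached programs and initialize much faster.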