mace: CL_INVALID_KERNEL_ARGS

Before you open an issue, please make sure you have tried the following steps:

  1. Make sure your environment matches (https://mace.readthedocs.io/en/latest/installation/env_requirement.html).
  2. Have you read the documentation for your use case?
  3. Check if your issue appears in HOW-TO-DEBUG or FAQ.
  4. Fill in the form below.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • NDK version (e.g., 15c): 18b
  • GCC version (if compiling for host, e.g., 5.4.0): 5.4.0
  • MACE version (Use the command: git describe --long --tags): 0.13.0
  • Python version (2.7): 3.6
  • Bazel version (e.g., 0.13.0): 0.16.0
  • CMake version: 3.16.0

Model deploy file (*.yml)

# The name of library
library_name: test
target_abis: [arm64-v8a]
model_graph_format: code
model_data_format: code
models:
  FE: # model tag, which will be used in model loading and must be unique.
    platform: caffe
    # path to your model file. Local paths, http:// and https:// are supported.
    model_file_path: /models/FE.prototxt
    weight_file_path: /models/FE.caffemodel
    # sha256_checksum of your model file.
    # use this command to get the sha256_checksum --> sha256sum path/to/your/model/file
    model_sha256_checksum: 98f9b69a085e7d8f40704ac6b2fedae0fda876fff4658509dde3d74d883a9684
    weight_sha256_checksum: a9f5d4dfe944315511c6070e8556790409ae0f0bd9005c5db66b4fdd5c38b716
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,112,112
        input_data_formats:
          - NCHW
        output_tensors:
          - fc1
        output_shapes:
          - 1,1,1,512
    obfuscate: 0
    limit_opencl_kernel_time: 1
    runtime: cpu+gpu
    winograd: 4

  FD:
    platform: caffe
    model_file_path: /models/FD.prototxt
    weight_file_path: /models/FD.caffemodel
    model_sha256_checksum: 213d764bd605d02b1630740969ab7110a2ee0111e3f8200ce02304cf72fbd42a
    weight_sha256_checksum: c83d575645daf8541867a63197de6bfd44a7fb3bf9bf4c876cde8165c23fac0c
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,160,160
        input_data_formats:
          - NCHW
        output_tensors:
          - face_rpn_cls_prob_reshape_stride32
          - face_rpn_bbox_pred_stride32
          - face_rpn_landmark_pred_stride32
          - face_rpn_cls_prob_reshape_stride16
          - face_rpn_bbox_pred_stride16
          - face_rpn_landmark_pred_stride16
          - face_rpn_cls_prob_reshape_stride8
          - face_rpn_bbox_pred_stride8
          - face_rpn_landmark_pred_stride8
        output_shapes:
          - 1,4,5,5
          - 1,8,5,5
          - 1,20,5,5
          - 1,4,10,10
          - 1,8,10,10
          - 1,20,10,10
          - 1,4,20,20
          - 1,8,20,20
          - 1,20,20,20
        output_data_formats:
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
    obfuscate: 0
    runtime: cpu+gpu
    winograd: 0

Describe the problem

  • When integrating the compiled MACE library and model into an Android app, I get a CL_INVALID_KERNEL_ARGS error at runtime, followed by some Out of resources errors.

Any clue about what can cause this kind of error?

To Reproduce

Steps to reproduce the problem:

1. cd /path/to/mace
2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
3. python tools/converter.py run --validate --disable_tuning --config_file=/path/to/your/model_deployment_file
4. run the Android app

Error information / logs

E/MACE: helper.cc:201 error: CL_INVALID_KERNEL_ARGS
E/MACE: helper.cc:246 error: CL_INVALID_KERNEL_ARGS
I/MACE: activation.cc:113 TuningOrRun3DKernel(runtime, kernel_, tuning_key, gws, lws, context->future()) failed with error: Out of resources
I/MACE: net.cc:152 op->Run(&context) failed with error: Out of resources
I/MACE: mace.cc:890 net_->Run(run_metadata) failed with error: Out of resources

Additional context

A model to reproduce the issue can be found here.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

@lu229 That’s not my case. I’m using multiple threads, one thread per model, and each model has a separate engine, so it should be fine.

What I’ve tried so far is to mutex-lock the method that creates the GPU context and runs warm-up, so that only one model can access it at a time; models on other threads have to wait for the mutex to be released before they are initialized. This fix seems to work, but I want to test it more.
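
A minimal C++ sketch of that locking pattern, assuming a hypothetical ModelEngine wrapper in place of the real MACE engine creation, warm-up, and inference calls (which are omitted here); only the serialization of init/warm-up across model threads is shown:

#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the object that owns a MACE engine and its GPU
// context in the real app; the actual MACE calls are omitted, only the
// locking pattern from the comment above is shown.
struct ModelEngine {
  explicit ModelEngine(std::string tag) : tag_(std::move(tag)) {
    // Real code would create the MACE engine here (which sets up the OpenCL
    // context) and run one warm-up inference.
  }
  void runInference() { /* per-model inference would run here */ }
  std::string tag_;
};

// One mutex shared by all model threads: only one thread at a time may be
// inside engine creation + warm-up.
std::mutex g_init_mutex;

std::unique_ptr<ModelEngine> initModelSerialized(const std::string& tag) {
  std::lock_guard<std::mutex> lock(g_init_mutex);  // other threads wait here
  return std::make_unique<ModelEngine>(tag);
}

void modelThread(const std::string& tag) {
  auto engine = initModelSerialized(tag);  // serialized init + warm-up
  engine->runInference();                  // inference still runs per thread
}

int main() {
  std::vector<std::thread> threads;
  threads.emplace_back(modelThread, "FE");  // one thread per model, as in the issue
  threads.emplace_back(modelThread, "FD");
  for (auto& t : threads) t.join();
  return 0;
}

The lock only covers initialization; once both engines are created, each thread keeps running its own model without contention.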

OK, Thanks! I will analyze the code and try to find the problem.