mace: Caffe model validation fails on MACE v0.12.0 due to low similarity
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu16.04 (MACE Docker image)
- NDK version(e.g., 15c): 18b
- GCC version(if compiling for host, e.g., 5.4.0): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
- MACE version (Use the command: git describe --long --tags): 0.12.0
- Python version(2.7): 3.6
- Bazel version (e.g., 0.13.0): [0.16.0]
- CMake version: 3.16.0
Model deploy file (*.yml)
# The name of library
library_name: libretinanet
target_abis: [arm64-v8a]
model_graph_format: code
model_data_format: code
models:
  retinanet:
    platform: caffe
    model_file_path: /models/retinanet/retinanet3.prototxt
    weight_file_path: /models/retinanet/retinanet3.caffemodel
    model_sha256_checksum: 638e05fc466737c3b8fc36261adaaff40cbd4de5a8c72a46b37f2b00f01180e1
    weight_sha256_checksum: 6222910a773c693c23b4765baba4ed8427e9f3c11781060918e6282a297a7437
    subgraphs:
      - input_tensors:
          - data
        input_shapes:
          - 1,3,320,320
        input_data_formats:
          - NCHW
        output_tensors:
          - face_rpn_cls_prob_reshape_stride32
          - face_rpn_bbox_pred_stride32
          - face_rpn_landmark_pred_stride32
          - face_rpn_cls_prob_reshape_stride16
          - face_rpn_bbox_pred_stride16
          - face_rpn_landmark_pred_stride16
          - face_rpn_cls_prob_reshape_stride8
          - face_rpn_bbox_pred_stride8
          - face_rpn_landmark_pred_stride8
        output_shapes:
          - 1,4,10,10
          - 1,8,10,10
          - 1,20,10,10
          - 1,4,20,20
          - 1,8,20,20
          - 1,20,20,20
          - 1,4,40,40
          - 1,8,40,40
          - 1,20,40,40
        output_data_formats:
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
          - NCHW
    obfuscate: 0
    runtime: gpu
    winograd: 4
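The output shapes above follow a simple pattern. With a 320x320 input, each output's spatial size is 320 divided by the stride in its name (10, 20 and 40 for strides 32, 16 and 8). A hypothetical sanity check like the one below can catch copy-paste mistakes in these lists before conversion. It is not part of MACE's tooling, and check_outputs is an illustrative name. It assumes every output name ends in strideN and that output shapes are NCHW, as declared in output_data_formats.

```python
import re

# Values copied from the deploy file above.
INPUT_HW = (320, 320)  # H, W of input_shapes 1,3,320,320 (NCHW)
OUTPUT_TENSORS = [
    "face_rpn_cls_prob_reshape_stride32",
    "face_rpn_bbox_pred_stride32",
    "face_rpn_landmark_pred_stride32",
    "face_rpn_cls_prob_reshape_stride16",
    "face_rpn_bbox_pred_stride16",
    "face_rpn_landmark_pred_stride16",
    "face_rpn_cls_prob_reshape_stride8",
    "face_rpn_bbox_pred_stride8",
    "face_rpn_landmark_pred_stride8",
]
OUTPUT_SHAPES = [
    "1,4,10,10", "1,8,10,10", "1,20,10,10",
    "1,4,20,20", "1,8,20,20", "1,20,20,20",
    "1,4,40,40", "1,8,40,40", "1,20,40,40",
]
OUTPUT_DATA_FORMATS = ["NCHW"] * 9


def check_outputs(input_hw, tensors, shapes, formats):
    """Return a list of problems found; an empty list means the entries agree.

    Hypothetical check (not part of MACE): every output name is assumed to
    end in strideN, with an NCHW spatial size equal to input size // N.
    """
    problems = []
    if not len(tensors) == len(shapes) == len(formats):
        problems.append("output_tensors/shapes/formats have different lengths")
    for name, shape_str in zip(tensors, shapes):
        _, _, h, w = (int(v) for v in shape_str.split(","))
        match = re.search(r"stride(\d+)$", name)
        if match is None:
            problems.append(f"{name}: no stride suffix")
            continue
        stride = int(match.group(1))
        expected = (input_hw[0] // stride, input_hw[1] // stride)
        if (h, w) != expected:
            problems.append(f"{name}: got {(h, w)}, expected {expected}")
    return problems


print(check_outputs(INPUT_HW, OUTPUT_TENSORS, OUTPUT_SHAPES, OUTPUT_DATA_FORMATS))  # []
```

All nine entries in this deploy file pass. This means the shape declarations are consistent and are not the cause of the low similarity.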
Describe the problem
With MACE v0.12.0, the output of the converted model differs significantly from Caffe's output, so the model fails validation. I also ran the unit tests in Android Studio and observed the same output difference there.
However, with MACE v0.11.0-rc1, the output is fine and the validation runs successfully.
========================================================
      capability(CPU)  init     warmup    run_avg
========================================================
time  7.484            877.331  1323.142  16.371
I mace/libmace/mace.cc:636] Destroying MaceEngine
Running finished!
* Validate with caffe
Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride32 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride32 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride32 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride16 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride16 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride16 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_cls_prob_reshape_stride8 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_bbox_pred_stride8 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Pull /data/local/tmp/mace_run/model_out_face_rpn_landmark_pred_stride8 to build/libretinanet/_tmp/retinanet/14a062bf9f488e2a38c1fb60b18a80de/MI9_msmnile/arm64-v8a
Traceback (most recent call last):
  File "/mace/validate.py", line 459, in <module>
face_rpn_cls_prob_reshape_stride32 MACE VS CAFFE similarity: 0.7051022200187764 , sqnr: 1.988659806883121 , pixel_accuracy: 0.4
    FLAGS.log_file)
  File "/mace/validate.py", line 371, in validate
    validation_threshold, log_file)
  File "/mace/validate.py", line 262, in validate_caffe_model
    value, validation_threshold, log_file)
  File "/mace/validate.py", line 113, in compare_output
    "", common.StringFormatter.block("Similarity Test Failed"))
TypeError: summary() takes exactly 1 argument (2 given)
Traceback (most recent call last):
  File "tools/converter.py", line 1151, in <module>
    flags.func(flags)
  File "tools/converter.py", line 938, in run_mace
    device.run_specify_abi(flags, configs, target_abi)
  File "/mace/tools/device.py", line 782, in run_specify_abi
    log_file=log_file,
  File "/mace/tools/sh_commands.py", line 756, in validate_model
    _fg=True)
  File "/root/.pyenv/versions/3.6.3/lib/python3.6/site-packages/sh.py", line 1413, in __call__
    raise exc
sh.ErrorReturnCode_1:
RAN: /usr/bin/docker exec mace_caffe_lastest_validator python -u /mace/validate.py --platform=caffe --model_file=/mace/retinanet3.prototxt --weight_file=/mace/retinanet3.caffemodel --input_file=/mace/model_input --mace_out_file=/mace/model_out --device_type=GPU --input_node=data --output_node=face_rpn_cls_prob_reshape_stride32,face_rpn_bbox_pred_stride32,face_rpn_landmark_pred_stride32,face_rpn_cls_prob_reshape_stride16,face_rpn_bbox_pred_stride16,face_rpn_landmark_pred_stride16,face_rpn_cls_prob_reshape_stride8,face_rpn_bbox_pred_stride8,face_rpn_landmark_pred_stride8 --input_shape=1,3,320,320 --output_shape=1,4,10,10:1,8,10,10:1,20,10,10:1,4,20,20:1,8,20,20:1,20,20,20:1,4,40,40:1,8,40,40:1,20,40,40 --input_data_format=NCHW --output_data_format=NCHW,NCHW,NCHW,NCHW,NCHW,NCHW,NCHW,NCHW,NCHW --validation_threshold=0.995000 --input_data_type=float32 --backend=tensorflow --validation_outputs_data= --log_file=
STDOUT:
STDERR:
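The validator fails the run because the reported similarity (0.705) is below --validation_threshold=0.995. The sketch below uses common definitions for the first two metrics: cosine similarity of the flattened outputs, and SQNR as signal power over noise power. It is written from those definitions rather than copied from MACE's validate.py. Details such as the exact comparison operator or any dB scaling are therefore assumptions, and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(expected, actual):
    """Cosine of the angle between two flattened outputs; 1.0 means same direction."""
    dot = sum(e * a for e, a in zip(expected, actual))
    norm_e = math.sqrt(sum(e * e for e in expected))
    norm_a = math.sqrt(sum(a * a for a in actual))
    return dot / (norm_e * norm_a)


def sqnr(expected, actual, eps=1e-15):
    """Signal-to-quantization-noise ratio as a linear power ratio (not dB)."""
    signal = sum(e * e for e in expected)
    noise = sum((e - a) ** 2 for e, a in zip(expected, actual))
    return signal / (noise + eps)


def passes(expected, actual, threshold=0.995):
    """Assumed pass rule: similarity must reach --validation_threshold."""
    return cosine_similarity(expected, actual) >= threshold


reference = [0.1, 0.9, 0.2, 0.8]  # hypothetical Caffe output
scrambled = [0.9, 0.1, 0.8, 0.2]  # same values, elements in the wrong order

print(passes(reference, reference))                       # True
print(round(cosine_similarity(reference, scrambled), 3))  # 0.333
print(round(sqnr(reference, scrambled), 3))               # 0.75
print(passes(reference, scrambled))                       # False
```

The scrambled case shows that an output containing the right values in the wrong order, as a data-layout mix-up would produce, can score far below the threshold.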
To Reproduce
Steps to reproduce the problem:
1. cd /path/to/mace
2. python tools/converter.py convert --config_file=/models/retinanet/retinanet3.yml
3. python tools/converter.py run --validate --config_file=/models/retinanet/retinanet3.yml
Error information / logs
Please refer to this gist for the full conversion and validation log: MACE v0.12.0 Error log - validation failed
Additional context
For MACE v0.12.0, I applied the workaround from the last answer in https://github.com/XiaoMi/mace/issues/560:
diff --git a/tools/python/transform/transformer.py b/tools/python/transform/transformer.py
index bb9154f..bbf14b4 100644
--- a/tools/python/transform/transformer.py
+++ b/tools/python/transform/transformer.py
@@ -1353,7 +1353,7 @@ class Transformer(base_converter.ConverterInterface):
         visited = set()
         sorted_nodes = []
-        output_nodes = self._option.check_nodes.keys()
+        output_nodes = list(self._option.check_nodes.keys())
         if not self._quantize_activation_info:
             output_nodes.extend(self._option.output_nodes)
         for output_node in output_nodes:
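The workaround is needed because, on Python 3, dict.keys() returns a dict_keys view rather than a list. Views have no extend method, so the unpatched code fails whenever the if branch calls extend. The reproduction below uses plain values as hypothetical stand-ins for self._option.check_nodes and self._option.output_nodes. collect_output_nodes is an illustrative helper, not MACE code.

```python
# Hypothetical stand-ins for self._option.check_nodes and self._option.output_nodes.
check_nodes = {"face_rpn_cls_prob_reshape_stride32": None}
extra_output_nodes = ["face_rpn_bbox_pred_stride32"]


def collect_output_nodes(check_nodes, extra_nodes, patched):
    """Reproduce the two lines touched by the diff, before and after the fix."""
    if patched:
        output_nodes = list(check_nodes.keys())  # a real list, so .extend works
    else:
        output_nodes = check_nodes.keys()  # Python 3: a dict_keys view, no .extend
    output_nodes.extend(extra_nodes)
    return output_nodes


try:
    collect_output_nodes(check_nodes, extra_output_nodes, patched=False)
except AttributeError as err:
    print("original:", err)  # 'dict_keys' object has no attribute 'extend'

print("patched:", collect_output_nodes(check_nodes, extra_output_nodes, patched=True))
print("check_nodes unchanged:", list(check_nodes))
```

Because list() makes a copy, extending it also leaves check_nodes itself untouched.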
About this issue
- State: closed
- Created 4 years ago
- Comments: 21 (8 by maintainers)
@gasgallo Sorry, perhaps tomorrow.
We can try to fix this issue at https://github.com/XiaoMi/mace/pull/611
@gasgallo Thank you. Both v0.11.0-rc4 and v0.11.0-rc1 support your model correctly. The file that contains the bug is “/mace/mace/ops/opencl/image/reshape.cc”: when the model comes from Caffe, the data should be transformed from NHWC to NCHW format before the Resize call, and transformed back from NCHW to NHWC format after it.
@mexeniz It seems that MACE has a bug in its support for Caffe's Split layer. I'm busy these days, so you can debug it yourself (reference this) or I'll check it out in a few days.
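The maintainer's reshape.cc explanation comes down to memory layout. A reshape only reinterprets a flat buffer, so applying a Caffe (NCHW) reshape directly to data stored as NHWC puts elements in the wrong positions. The pure-Python sketch below illustrates this with a hypothetical (1, 4, 2, 2) to (1, 2, 4, 2) reshape. It is not MACE's OpenCL code, and the helper names are illustrative.

```python
from itertools import product


def nchw_to_nhwc(buf, n, c, h, w):
    """Reorder a flat NCHW buffer of logical shape (n, c, h, w) into NHWC order."""
    out = [None] * len(buf)
    for ni, ci, hi, wi in product(range(n), range(c), range(h), range(w)):
        src = ((ni * c + ci) * h + hi) * w + wi
        dst = ((ni * h + hi) * w + wi) * c + ci
        out[dst] = buf[src]
    return out


def nhwc_to_nchw(buf, n, c, h, w):
    """Inverse of nchw_to_nhwc for the same logical (n, c, h, w) shape."""
    out = [None] * len(buf)
    for ni, ci, hi, wi in product(range(n), range(c), range(h), range(w)):
        src = ((ni * h + hi) * w + wi) * c + ci
        dst = ((ni * c + ci) * h + hi) * w + wi
        out[dst] = buf[src]
    return out


def caffe_reshape_on_nhwc(buf, in_shape, out_shape):
    """Apply a Caffe (NCHW) reshape to data stored in NHWC order.

    NHWC -> NCHW, reshape (the flat buffer does not move), NCHW -> NHWC.
    """
    nchw = nhwc_to_nchw(buf, *in_shape)
    return nchw_to_nhwc(nchw, *out_shape)


# Hypothetical example: an NCHW tensor (1, 4, 2, 2) reshaped to (1, 2, 4, 2).
in_shape, out_shape = (1, 4, 2, 2), (1, 2, 4, 2)
nchw_input = list(range(16))
nhwc_input = nchw_to_nhwc(nchw_input, *in_shape)

# Reference: reshape in NCHW (no data movement), then store the result as NHWC.
expected = nchw_to_nhwc(nchw_input, *out_shape)
# Naive: reinterpret the NHWC buffer directly with the new shape.
naive = nhwc_input
fixed = caffe_reshape_on_nhwc(nhwc_input, in_shape, out_shape)

print("naive matches:", naive == expected)  # False
print("fixed matches:", fixed == expected)  # True
```

The naive path holds the right values in a different order. This is the kind of error that lowers similarity without producing obviously invalid numbers.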
@mexeniz Sorry for the late reply. Please use v0.12.0 and apply this patch; I have checked your model with v0.12.0 and found no errors.