openvino: Inference 30% slower with 2020.2/3 than 2019 R3 on Intel Xeon/win10

This issue has been moved from https://github.com/opencv/opencv/issues/17283

My own C++ benchmarks on a Xeon E5-1620/Win10 with OpenCV show the same numbers as the official benchmark_app.py. The model is a resnet18 trained with https://github.com/qubvel/segmentation_models.pytorch, exported to ONNX, and then run through the Model Optimizer. I can share the ONNX or bin/xml files by email if needed.

2019.R3: 50 inferences on CPU OpenVino..done in avg: 63.2, min: 60, median: 61, std: 6.07
2020.2 : 50 inferences on CPU OpenVino..done in avg: 87.1, min: 85, median: 86, std: 2.12
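
For anyone who wants to collect the same kind of summary (avg, min, median, std over 50 runs) independently, here is a minimal Python sketch. It is not the C++ benchmark used above: `benchmark` and the `infer` callable are hypothetical names standing in for the real inference call, and the choice of population standard deviation is an assumption, since the original tool's formula is not shown.

```python
import statistics
import time


def benchmark(infer, runs=50):
    """Time `infer` `runs` times and return summary statistics in milliseconds."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()  # hypothetical callable wrapping one inference
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "avg": statistics.mean(timings_ms),
        "min": min(timings_ms),
        "median": statistics.median(timings_ms),
        # Assumption: population standard deviation; the original may use the sample form.
        "std": statistics.pstdev(timings_ms),
    }


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with the real inference call.
    print(benchmark(lambda: sum(range(10000))))
```

Running a few warm-up inferences before timing is usually worthwhile, so that one-time initialization cost does not skew the average.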

python "C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\tools\benchmark_tool\benchmark_app.py" -m models\model_2020_0427_19h13_9.xml -l "C:\Program Files (x86)\IntelSWTools\openvino_2019.3.379\deployment_tools\inference_engine\bin\intel64\Release\cpu_extension_avx2.dll" -api sync

**** 2019.R3 ****
[ INFO ] InferenceEngine:
         API version............. 2.1.32974
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 32974
[ INFO ] Network batch size: 1, precision: FP32
[ INFO ] Network input 'input.1' precision FP32, dimensions (NCHW): 1 1 256 256
[Step 10/11] Measuring performance (Start inference syncronously, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      948 iterations
Duration:   60077.80 ms
Latency:    62.10 ms
Throughput: 16.10 FPS

python "C:\Program Files (x86)\IntelSWTools\openvino_2020.2.117\deployment_tools\tools\benchmark_tool\benchmark_app.py" -m models\model_2020_0427_19h13_9.xml -api sync

**** 2020.2 ****
[ INFO ] InferenceEngine:
         API version............. 2.1.42025
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 42025
[ INFO ] Network batch size: 1
[ INFO ] Network input 'input.1' precision FP32, dimensions (NCHW): 1 1 256 256
[Step 10/11] Measuring performance (Start inference syncronously, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count:      685 iterations
Duration:   60052.85 ms
Latency:    87.23 ms
Throughput: 11.46 FPS
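
To put the two benchmark_app reports side by side, the relative change can be computed directly from the latency and throughput figures printed above. A small sketch (the helper name `percent_change` is only illustrative):

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Figures copied from the two benchmark_app reports above.
latency_2019, latency_2020 = 62.10, 87.23  # ms
fps_2019, fps_2020 = 16.10, 11.46          # FPS

latency_change = percent_change(latency_2019, latency_2020)
fps_change = percent_change(fps_2019, fps_2020)

print(f"Latency:    {latency_change:+.1f}%")
print(f"Throughput: {fps_change:+.1f}%")
```

With these numbers, latency rises by roughly 40% and throughput drops by roughly 29%, which is where the "30% slower" in the title comes from when measured in FPS.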

About this issue

State: closed
Created 4 years ago
Comments: 23 (11 by maintainers)

Commits related to this issue

[ARM CPU Plugin] Fix Broadcast target shape in comparison converter (#501): Update convert_comparison.cpp (committed to redradist/openvino by alvoron 2 years ago)

Most upvoted comments

@JulienMaille Sure, please email them to my email on my profile and I will report back on my findings.

jgespino on Mar 4, 2022

@JulienMaille Thank you. The information you provided is enough. We are trying to reproduce the issue right now. I will update once we have some news.

dmitry-gorokhov on Jun 4, 2020

@JulienMaille Thanks!

nkogteva on May 29, 2020

@JulienMaille Could you please add the -pc parameter to the command line to see per-layer timings for both releases?

nkogteva on May 27, 2020