gluon-cv: Mask R-CNN C++ Deployment Not Working in GPU Mode

Hi,

Thanks for the great work! I'm quite interested in C++ deployment, so I tried the C++ inference demo for object detection and then ran Mask R-CNN through it to see whether it works.

Setup: MXNet 1.3; GPU: 1080 Ti; OS: Ubuntu 16.04; GluonCV: master branch; image: dog.jpg from the object detection tutorial; code: cpp-inference, built binary gluoncv-detect.

Faster R-CNN, SSD and YOLOv3 all work well on both CPU and GPU, at the expected speed. The first frame is slow because the GPU is still warming up, so the timings printed below can be ignored.

./gluoncv-detect ../../export/faster_rcnn_resnet50_v1b_voc dog.jpg --gpu 0
[11:03:33] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:98: Using Pascal VOC names...
[11:03:33] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:120: Using GPU(0)...
[11:03:35] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
[11:03:41] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:155: Elapsed time {Forward->Result}: 5138.46 ms
[11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
[11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.951641
[11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.999761
[11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.999168
[11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: motorbike, scores: 0.301618
[11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: pottedplant, scores: 0.373259
[11:03:44] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: pottedplant, scores: 0.370312

./gluoncv-detect ../../export/ssd_512_resnet50_v1_voc dog.jpg --gpu 0
[10:26:46] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:98: Using Pascal VOC names...
[10:26:46] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:120: Using GPU(0)...
[10:26:49] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
[10:26:54] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 17022.7 ms
[10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
[10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.998562
[10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.990034
[10:27:07] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.984405

./gluoncv-detect ../../export/yolo3_darknet53_voc dog.jpg 
[10:21:31] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:98: Using Pascal VOC names...
[10:21:31] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
[10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 4435.19 ms
[10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
[10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.996979
[10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.995044
[10:21:36] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.991447

When I try Mask R-CNN in CPU mode, it works, though slowly:

./gluoncv-detect ../../export/mask_rcnn_resnet50_v1b_coco dog.jpg 
[10:22:26] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:103: Using COCO names...
[10:22:27] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480

[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 86116.8 ms
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:177: Start Ploting with visualize score threshold: 0.3
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.9996
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: dog, scores: 0.997231
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: cat, scores: 0.780812
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: car, scores: 0.739689
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.678044
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: bicycle, scores: 0.535232
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: truck, scores: 0.41138
[10:23:53] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/common.hpp:210: id: potted plant, scores: 0.360993

In GPU mode, however, it fails:

./gluoncv-detect ../../export/mask_rcnn_resnet50_v1b_coco dog.jpg --gpu 0
[10:24:58] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:103: Using COCO names...
[10:24:58] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:120: Using GPU(0)...
[10:25:02] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:134: Image shape: 640 x 480
[10:25:06] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[10:25:20] /home/chongzhao/gluon-cv/scripts/deployment/cpp-inference/src/detect.cpp:154: Elapsed time {Forward->Result}: 18267.5 ms
terminate called after throwing an instance of 'dmlc::Error'
  what():  [10:25:20] /home/chongzhao/mxnet-1.3/cpp-package/include/mxnet-cpp/ndarray.hpp:236: Check failed: MXNDArrayWaitToRead(blob_ptr_->handle_) == 0 (-1 vs. 0) 

Stack trace returned 8 entries:
[bt] (0) ./gluoncv-detect(dmlc::StackTrace[abi:cxx11]()+0x54) [0x4af2cc]
[bt] (1) ./gluoncv-detect(dmlc::LogMessageFatal::~LogMessageFatal()+0x2a) [0x4af598]
[bt] (2) ./gluoncv-detect(mxnet::cpp::NDArray::WaitToRead() const+0xca) [0x4b3c96]
[bt] (3) ./gluoncv-detect(viz::PlotBbox(cv::Mat, mxnet::cpp::NDArray, mxnet::cpp::NDArray, mxnet::cpp::NDArray, float, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::map<int, cv::Scalar_<double>, std::less<int>, std::allocator<std::pair<int const, cv::Scalar_<double> > > >, bool)+0x124) [0x4bbb38]
[bt] (4) ./gluoncv-detect(RunDemo()+0x693) [0x4ab38b]
[bt] (5) ./gluoncv-detect(main+0x25) [0x4ab82d]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbeeddf6830]
[bt] (7) ./gluoncv-detect(_start+0x29) [0x4a8819]


Aborted (core dumped)

Is there anything I can do to solve this, or do you have any suggestions? Thanks a lot!

About this issue

State: closed
Created 6 years ago
Comments: 20 (12 by maintainers)

Most upvoted comments

Just decrease rpn_test_pre_nms and rpn_test_post_nms before exporting the model.

I set rpn_test_pre_nms = 600 (default 6000) and rpn_test_post_nms = 100 (default 1000), then exported the model for C++. With resnet50_mask_rcnn it only takes about 3 GB of GPU memory.

freealong on Sep 6, 2019

@xxradon Thanks a lot for the pointer. @zhreshold Unfortunately, it still doesn't work when I test it. I even lowered those three parameters to 100, 100, and 10, but I still get the same error message as before.

I agree these parameters may affect performance and hardware requirements. But if the Python wrapper runs smoothly, there is no reason the C++ version should do worse than Python. That's why I suspect the cause lies somewhere in the C++ pipeline.

kuonangzhe on Oct 16, 2018