opencv: GPU not working with DNN_BACKEND_OPENCV

System information (version)
  • OpenCV => 4.1.2
  • Operating System / Platform => Windows 64 Bit
  • Compiler => Visual Studio 2017
  • Cuda => 10.2

Hello !

I use Darknet YOLO for object detection and it works very well, but on the CPU it is very slow. I can get Darknet.exe to run on the GPU, but not the Python code below.

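import cv2
import numpy as np

# Load YOLOv3 from the Darknet weights and config files
net = cv2.dnn.readNet("dark/yolov3.weights", "dark/yolov3.cfg")
# Read the COCO class names, one per line
with open("dark/coco.names", "r") as f:
    classes = [line.strip() for line in f]
# Output layer indices are 1-based; flatten() handles both Nx1 and 1-D results
layer_names = net.getLayerNames()
out_ids = np.asarray(net.getUnconnectedOutLayers()).flatten()
output_layers = [layer_names[i - 1] for i in out_ids]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Request the default OpenCV backend with the OpenCL FP16 target
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)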

Output:

OpenCV(ocl4dnn): consider to specify kernel configuration cache directory via OPENCV_OCL4DNN_CONFIG_PATH parameter.
OpenCL program build log: dnn/dummy
Status -11: CL_BUILD_PROGRAM_FAILURE
-cl-no-subgroup-ifp
Error in processing command line: Don't understand command line argument "-cl-no-subgroup-ifp"!

Execution doesn't crash, but the computation runs on the CPU.

Can you help? Thanks!
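
One possible way around the OpenCL build failure on an NVIDIA GPU is to request the CUDA backend instead of the OpenCL target. This is a minimal sketch and rests on an assumption: OpenCV 4.2 or later, compiled from source with CUDA and cuDNN enabled for the dnn module, which is not the 4.1.2 build reported above. `pick_dnn_backend` and `configure_net` are hypothetical helper names, not OpenCV functions.

```python
def pick_dnn_backend(cuda_device_count, has_cuda_backend):
    """Return the names of the cv2.dnn backend/target constants to request.

    cuda_device_count: number of CUDA devices visible to this OpenCV build.
    has_cuda_backend:  True if cv2.dnn defines DNN_BACKEND_CUDA (OpenCV >= 4.2).
    """
    if has_cuda_backend and cuda_device_count > 0:
        return "DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"
    # Fall back to the default backend on the CPU
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def configure_net(net):
    """Apply the preferred backend/target to a cv2.dnn network (hypothetical helper)."""
    import cv2  # imported here so pick_dnn_backend stays usable without OpenCV

    has_cuda_backend = hasattr(cv2.dnn, "DNN_BACKEND_CUDA")
    cuda_module = getattr(cv2, "cuda", None)
    try:
        # Reports 0 when OpenCV was built without CUDA support
        count = cuda_module.getCudaEnabledDeviceCount() if cuda_module else 0
    except Exception:
        # Assumption: treat any driver/runtime error as "no usable GPU"
        count = 0

    backend, target = pick_dnn_backend(count, has_cuda_backend)
    net.setPreferableBackend(getattr(cv2.dnn, backend))
    net.setPreferableTarget(getattr(cv2.dnn, target))
    return backend, target
```

Calling `configure_net(net)` on the network loaded in the snippet above returns the pair it selected. If it reports the CPU fallback, the installed build most likely has no CUDA support. Even when the CUDA pair is selected, a build whose dnn module was compiled without CUDA may still run inference on the CPU.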

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 34 (12 by maintainers)

Most upvoted comments

I ran tests after adding nms_threshold=0, using the latest build of OpenCV 4.4.

This was fixed in OpenCV 4.4.0. Setting nms_threshold=0 is only required for OpenCV 4.3 and below.
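
The version cutoff in this reply can be checked in code. A minimal sketch: `needs_nms_workaround` is a hypothetical helper that only encodes the "4.3 and below" rule stated here. It does not apply the setting, and this thread excerpt does not say where `nms_threshold=0` has to be added.

```python
def needs_nms_workaround(opencv_version):
    """Return True for OpenCV 4.3.x and older, False for 4.4.0 and newer.

    opencv_version: a version string such as cv2.__version__, e.g. "4.1.2".
    """
    # Only major and minor matter; suffixes like "-dev" sit in later parts
    major, minor = (int(part) for part in opencv_version.split(".")[:2])
    return (major, minor) < (4, 4)
```

For example, `needs_nms_workaround(cv2.__version__)` would be true on the 4.1.2 build from the original report.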

@QBarbeAusy

I compiled OpenCV with CUDA 10.1 and cuDNN 8.0.2. I also use the YOLOv3 model with Darknet, and I get around 13 fps with a 400x400 image.

There is a performance regression in cuDNN 8 that affects OpenCV and Darknet. I am working on an update that might work around the regression.

Related discussion at NVIDIA’s developer forums: https://forums.developer.nvidia.com/t/cudnn8-regression-in-algorithm-selection-heuristics/153667/3
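
To compare throughput figures such as the 13 fps quoted above before and after changing the cuDNN version, it helps to time inference the same way each time. A minimal sketch: `measure_fps` is a hypothetical helper, and the warm-up runs are an assumption that the first forward passes carry one-time initialization cost that should not count toward the average.

```python
import time


def measure_fps(run_once, runs=20, warmup=3):
    """Return average calls per second of run_once over `runs` timed calls."""
    # Untimed warm-up calls absorb one-time setup cost
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return runs / elapsed
```

With the network from the original post, and after `net.setInput(...)` has been called with a prepared blob, this could be used as `measure_fps(lambda: net.forward(output_layers))`.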

I can clearly see with nvidia-smi that CUDA is using only a small amount of VRAM.

There is no reason why all the GPU memory needs to be consumed. More memory consumption does not imply it’s faster.

If downgrading to an older version of cuDNN does not fix the problem, check whether the comments in https://github.com/opencv/opencv/issues/17422 answer your question.