opencv: Segmentation fault with DNN_TARGET_OPENCL and DNN_TARGET_OPENCL_FP16 in custom OpenCL(VC4CL)
System information (version)
- OpenCV => Latest master(4.0.1-dev)
- Operating System / Platform => Raspbian Stretch
- Compiler => gcc-6.3.0 native
Detailed description
Hi, I’m experimenting with VC4CL, a custom implementation of the OpenCL 1.2 standard for the VideoCore IV GPU found on my Raspberry Pi. I get a segmentation fault when I select DNN_TARGET_OPENCL or DNN_TARGET_OPENCL_FP16 via net.setPreferableTarget (to use the GPU for better performance) with the default prebuilt YOLOv3 model (I need extra swap space to run it) to classify objects in an image. Here is the resulting error (similar error here):
pi@raspberrypi:~/yolo $ sudo python3 yolo3.py
load model time is 2.7082722187042236
OpenCV(ocl4dnn): consider to specify kernel configuration cache directory
via OPENCV_OCL4DNN_CONFIG_PATH parameter.
OpenCL program build log: dnn/dummy
Status -15: CL_COMPILE_PROGRAM_FAILURE
-cl-no-subgroup-ifp
[E] Mon Feb 18 16:02:55 2019: Errors in precompilation:
[E] Mon Feb 18 16:02:55 2019: error: unknown argument: '-cl-no-subgroup-ifp'
OpenCL program build log: dnn/conv_layer_spatial
Status -15: CL_COMPILE_PROGRAM_FAILURE
-D TYPE=1 -D Dtype=float -D Dtype2=float2 -D Dtype4=float4 -D Dtype8=float8 -D Dtype16=float16 -D as_Dtype=as_float -D as_Dtype2=as_float2 -D as_Dtype4=as_float4 -D as_Dtype8=as_float8 -D KERNEL_WIDTH=3 -D KERNEL_HEIGHT=3 -D STRIDE_X=1 -D STRIDE_Y=1 -D DILATION_X=1 -D DILATION_Y=1 -D KERNEL_BASIC -cl-fast-relaxed-math -D ConvolveBasic=BASIC_k3x3_cn3_g1_s1x1_d1x1_b1_in256x256_p1x1_num1_M16_activ1_eltwise0_FP32_4_1_1_1 -D CHANNELS=3 -D APPLY_BIAS=1 -D OUTPUT_Z=16 -D ZPAR=1 -D FUSED_CONV_RELU=1
[E] Mon Feb 18 16:02:55 2019: Errors in precompilation:
[E] Mon Feb 18 16:02:55 2019: error: PCH file uses a newer PCH format that cannot be read
1 error generated.
Failed to compile kernel: BASIC_k3x3_cn3_g1_s1x1_d1x1_b1_in256x256_p1x1_num1_M16_activ1_eltwise0_FP32_4_1_1_1, buildflags: -D TYPE=1 -D Dtype=float -D Dtype2=float2 -D Dtype4=float4 -D Dtype8=float8 -D Dtype16=float16 -D as_Dtype=as_float -D as_Dtype2=as_float2 -D as_Dtype4=as_float4 -D as_Dtype8=as_float8 -D KERNEL_WIDTH=3 -D KERNEL_HEIGHT=3 -D STRIDE_X=1 -D STRIDE_Y=1 -D DILATION_X=1 -D DILATION_Y=1 -D KERNEL_BASIC -cl-fast-relaxed-math -D ConvolveBasic=BASIC_k3x3_cn3_g1_s1x1_d1x1_b1_in256x256_p1x1_num1_M16_activ1_eltwise0_FP32_4_1_1_1 -D CHANNELS=3 -D APPLY_BIAS=1 -D OUTPUT_Z=16 -D ZPAR=1 -D FUSED_CONV_RELU=1, errmsg: [E] Mon Feb 18 16:02:55 2019: Errors in precompilation:
[E] Mon Feb 18 16:02:55 2019: error: PCH file uses a newer PCH format that cannot be read
1 error generated.
Segmentation fault
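The first hint in the log asks for a kernel configuration cache directory via OPENCV_OCL4DNN_CONFIG_PATH. A minimal sketch of setting it from Python follows; the helper name `configure_ocl4dnn_cache` and the example directory are assumptions for illustration, and this only addresses the hint, not the compile failures or the segmentation fault.

```python
import os
from pathlib import Path


def configure_ocl4dnn_cache(cache_dir):
    """Create cache_dir if needed and export it as OPENCV_OCL4DNN_CONFIG_PATH.

    Hypothetical helper: only the environment variable name comes from the
    OpenCV log message above.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["OPENCV_OCL4DNN_CONFIG_PATH"] = str(path)
    return path
```

To be safe, call it before `import cv2`, for example `configure_ocl4dnn_cache("~/.cache/ocl4dnn")` (the directory is an arbitrary choice).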
Here is the clinfo output of VC4CL’s OpenCL implementation on my Raspberry Pi 3B rev 1.2:
pi@raspberrypi:~/yolo $ sudo clinfo
Number of platforms 1
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Platform Vendor doe300
Platform Version OpenCL 1.2 VC4CL 0.4
Platform Profile EMBEDDED_PROFILE
Platform Extensions cl_khr_il_program cl_khr_spir cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
Platform Extensions function suffix VC4CL
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices 1
Device Name VideoCore IV GPU
Device Vendor Broadcom
Device Vendor ID 0xa5c
Device Version OpenCL 1.2 VC4CL 0.4
Driver Version 0.4
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile EMBEDDED_PROFILE
Max compute units 1
Max clock frequency 350MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Max work item dimensions 3
Max work item sizes 12x12x12
Max work group size 12
Preferred work group size multiple <getWGsizes:498: build program : error -15>
Preferred / native vector sizes
char 16 / 16
short 16 / 16
int 16 / 16
long 0 / 0
half 0 / 0 (n/a)
float 16 / 16
double 0 / 0 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero Yes
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 335544320 (320MiB)
Error Correction support No
Max memory allocation 335544320 (320MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 64 bytes
Alignment of base address 512 bits (64 bytes)
Global Memory cache type Read/Write
Global Memory cache size <printDeviceInfo:89: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 64
Max size for 1D images from buffer 2048 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 2048x2048 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 64
Max number of write image args 64
Local memory type Global
Local memory size 335544320 (320MiB)
Max constant buffer size 335544320 (320MiB)
Max number of constant args 64
Max size of kernel argument 256
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
printf() buffer size 0
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available No
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_intel_packed_yuv cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) OpenCL for the Raspberry Pi VideoCore IV GPU
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [VC4CL]
clCreateContext(NULL, ...) [default] Success [VC4CL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Device Name VideoCore IV GPU
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Device Name VideoCore IV GPU
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
Steps to reproduce
I’m using the following example: yolo.txt
By the way, it runs fine with the CPU backend. Any help with this is highly appreciated.
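Since the CPU backend works and the OpenCL target crashes the whole interpreter, one possible workaround is to try the OpenCL path in a child process first and fall back to the CPU target if the child dies. The sketch below is an assumption-laden illustration, not part of the original script: `probe_in_subprocess`, `OPENCL_PROBE`, and the model file names are hypothetical.

```python
import subprocess
import sys

# Hypothetical probe: the model paths are placeholders, not taken from yolo.txt.
OPENCL_PROBE = """
import numpy as np, cv2
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL)
blob = cv2.dnn.blobFromImage(np.zeros((416, 416, 3), np.uint8), 1 / 255.0, (416, 416))
net.setInput(blob)
net.forward()
"""


def probe_in_subprocess(snippet, timeout=600):
    """Run snippet in a fresh interpreter and return (ok, returncode).

    On POSIX a negative return code means the child was killed by a signal
    (a segmentation fault shows up as -11). A timeout yields (False, None).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False, None
    return proc.returncode == 0, proc.returncode
```

In the main script, `ok, rc = probe_in_subprocess(OPENCL_PROBE)` could then decide between `DNN_TARGET_OPENCL` and `DNN_TARGET_CPU`. The probe loads the model twice, which is slow on a Pi that needs swap, so the timeout may need adjusting.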
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 21 (9 by maintainers)
It looks like the OpenCL runtime in use is not functional.
Before experimenting with DNN, it is better to run the regular OpenCV tests. opencv_test_core includes tests for basic OpenCL operations. If something fails in core, that is bad news.
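Alongside opencv_test_core, a quick check from Python can at least confirm whether OpenCV reports the OpenCL runtime as usable before a DNN target is chosen. This is only a sketch: `opencl_target_or_cpu` is a hypothetical helper, and a runtime that reports itself as available (as VC4CL does in this report) can still fail when kernels are compiled.

```python
def opencl_target_or_cpu(cv):
    """Return cv.dnn.DNN_TARGET_OPENCL if OpenCV can enable OpenCL, else the CPU target.

    `cv` is the imported cv2 module, passed in so the helper stays easy to test.
    """
    if cv.ocl.haveOpenCL():
        cv.ocl.setUseOpenCL(True)
        # useOpenCL() reports whether OpenCL was actually enabled.
        if cv.ocl.useOpenCL():
            return cv.dnn.DNN_TARGET_OPENCL
    return cv.dnn.DNN_TARGET_CPU
```

Typical use would be `import cv2` followed by `net.setPreferableTarget(opencl_target_or_cpu(cv2))`.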
The -cl-no-subgroup-ifp build failure is just a check for parameter support (although it is rather noisy).
The program hangs (or keeps running) somewhere else. A debugger should help to locate the problem.
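To separate the noisy parameter check from the failures that matter, the build log can be grouped by program name. The following is a rough sketch that assumes the log layout shown in this report; `failed_programs` is a hypothetical helper, and treating `dnn/dummy` as the probe program is an inference from the log and the comment above.

```python
import re

# Matches header lines such as "OpenCL program build log: dnn/dummy".
HEADER = re.compile(r"^OpenCL program build log: (?P<name>\S+)$")


def failed_programs(log_text):
    """Map each OpenCL program name to the 'error:' messages logged under it."""
    failures = {}
    current = None
    for line in log_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            current = match.group("name")
            failures.setdefault(current, [])
        elif current is not None and "error:" in line:
            # Keep only the text after the first "error:" marker.
            failures[current].append(line.split("error:", 1)[1].strip())
    return failures


def real_failures(log_text, probe_name="dnn/dummy"):
    """Drop the flag-support probe and keep the remaining failed programs."""
    return {k: v for k, v in failed_programs(log_text).items() if k != probe_name}
```

Applied to the log in this report, only `dnn/conv_layer_spatial` would remain, with the PCH format error as the message to investigate.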
@dkurt, I’m sorry, but I can’t find any documentation for the -cl-no-subgroup-ifp argument anywhere online. It appears only a few times in this library, with no description provided.