elastix: OpenCL memory error CL_OUT_OF_RESOURCES

Hello, I’m having difficulty running registration with OpenCL; I get the error message CL_OUT_OF_RESOURCES.

The moving and fixed images are about 4MB in size. Strangely, the memory usage maxes at about 2GB (the GPU on the machine has 8GB of memory, GTX 1080) and the GPU utilization maxes at 5%. Even if I decrease the number of resolutions to make the pyramid smaller, I still get the same error message:

Description: CL_OUT_OF_RESOURCES WARNING: Unable to configure the GPU. The OpenCLFixedGenericImagePyramid is switching back to CPU mode. Error: in function: opencl_context_notify Details: OpenCL error during context creation or runtime: CL_OUT_OF_RESOURCES error executing CL_COMMAND_WRITE_BUFFER on GeForce GTX 1080 (Device 0).

Any insights on the issue?

About this issue

State: open
Created 6 years ago
Comments: 31 (7 by maintainers)

Commits related to this issue

Try fixing OPEN_CL_OUT_OF_RESOURCE Issue #70 — committed to squll1peter/elastix by deleted user 4 years ago
fixing Issue #70 CL_OUT_OF_RESOURCES, Removing trivial comments — committed to squll1peter/elastix by deleted user 4 years ago

Most upvoted comments

Thanks everyone for the comments in this issue, and special thanks to @urlicht for bringing this up and, of course, to @squll1peter for providing the original solution for the OpenCL pyramids! With the help of @N-Dekker and @dpshamonin, we have merged the two recent PRs (#734 & #741) in the main branch that fix the OpenCL pyramids and resampler respectively.

Is it working for everyone now? @urlicht, @chunlc @ZayrX @vzickus @HainBuche @jiangliMED @dennis000-wq @jakob1379 @1989HD

ntatsisk on Oct 12, 2022

Maybe one of the developers could also post all the flags used for ITK and Elastix for the build that worked for them, and which programming environment etc. they used? Thanks in advance!

HainBuche on Mar 7, 2019

Dear all,

Just as @jakob1379 I am wondering if there are currently any (and if so, which) versions of ITK & Elastix that allow the use of OpenCL Pyramids and Resampler?

I’ve been going back to multiple earlier versions and setting multiple different cmake flags, but up to now either I cannot build the solution, or the build is successful but I’m not able to use the pyramids or resampler due to ‘out_of_resources’ or ‘not installed’ issues (even though the cmake flag was definitely ON).

Any help or pointers would be highly appreciated. Thanks in advance!

Hans

1989HD on Feb 19, 2021

I’ve forked a branch here, modified it and recompiled it against a freshly compiled and installed ITK v5.0.1 to make sure that the changes only apply to the elastix source code.

Before going deeper into the problem you encountered, I would like to address one possible solution to the AdvancedMattesMutualInformationMetric problem I described in my post, because I think it might be related. After debugging, the problem appears to be caused by another issue during grafting. The exception "FixedImageRegion does not overlap the fixed image buffered region" is thrown when the buffered region of an image is smaller than the requested region (implemented in itkImageToImageMetric.hxx). It is thrown while initializing the metric for the second resolution level of the pyramid (resolution 1; the first level is resolution 0), not for the first level.

A pyramid filter with n resolutions has n output images, each corresponding to one resolution level of the pyramid. I printed the output images of OpenCLFixedPyramidFilter (by calling this->m_FixedImagePyramid->GetOutput(n)->PrintSelf() in itkMultiResolutionImageRegistrationMethod2::Initialize()) and found that only the first-level output image is buffered (i.e. has a non-zero BufferedRegion).

As mentioned in my last post, the output of a composite filter is grafted from its last child filter. In this case, elxOpenCLFixedGenericPyramid grafts its output images from the child m_GPUPyramid here. But the GraftOutput() that OpenCLFixedGenericPyramid calls is inherited from ImageSource, and it only grafts the first output image when the object has multiple output images (strange, as I would expect a function that takes no index to graft all outputs). The same issue also exists in elxOpenCLMovingGenericPyramid.hxx.

I then replaced this->GraftOutput( this->m_GPUPyramid->GetOutput() ); with a loop that goes through all outputs:

for (unsigned int i = 0; i < this->GetNumberOfLevels(); ++i)
{
  // Graft every pyramid level, not only output 0.
  this->GraftNthOutput(i, this->m_GPUPyramid->GetOutput(i));
}

in elxOpenCLMovingGenericPyramid.hxx and elxOpenCLFixedGenericPyramid, and the registration completed on my machine without throwing an error and with correct output (but I only tested it with very limited data). I’ve added the above workarounds to my branch; you can review them as well.
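
The grafting problem described above can be reproduced in a small toy model. The sketch below does not use ITK: ToyImage, ToyPyramid, GraftFirstOnly and GraftAllLevels are hypothetical names invented for this illustration, and the only behaviour borrowed from the description is that GraftOutput() touches output 0 alone while GraftNthOutput() touches the level it is given.

```cpp
// Toy model only, NOT ITK code. All names are hypothetical.
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an image; 0 buffered voxels models an empty BufferedRegion.
struct ToyImage
{
  std::size_t bufferedVoxels = 0;
};

// Stand-in for a multi-output filter such as an image pyramid.
class ToyPyramid
{
public:
  explicit ToyPyramid(std::size_t levels) : m_Outputs(levels) {}

  std::size_t GetNumberOfLevels() const { return m_Outputs.size(); }

  const ToyImage & GetOutput(std::size_t level = 0) const
  {
    assert(level < m_Outputs.size());
    return m_Outputs[level];
  }

  // Pretend the filter ran and filled every level with some data.
  void Update(std::size_t voxelsPerLevel)
  {
    for (auto & output : m_Outputs) { output.bufferedVoxels = voxelsPerLevel; }
  }

  // Mimics the reported behaviour: only the first output is grafted.
  void GraftOutput(const ToyImage & source) { GraftNthOutput(0, source); }

  void GraftNthOutput(std::size_t level, const ToyImage & source)
  {
    assert(level < m_Outputs.size());
    m_Outputs[level] = source;
  }

private:
  std::vector<ToyImage> m_Outputs;
};

// Number of levels that a metric would find unbuffered.
std::size_t CountUnbufferedLevels(const ToyPyramid & pyramid)
{
  std::size_t count = 0;
  for (std::size_t i = 0; i < pyramid.GetNumberOfLevels(); ++i)
  {
    if (pyramid.GetOutput(i).bufferedVoxels == 0) { ++count; }
  }
  return count;
}

// Wrapper behaviour before the workaround: a single GraftOutput() call.
ToyPyramid GraftFirstOnly(const ToyPyramid & inner)
{
  ToyPyramid outer(inner.GetNumberOfLevels());
  outer.GraftOutput(inner.GetOutput());
  return outer;
}

// Wrapper behaviour with the workaround: one GraftNthOutput() per level.
ToyPyramid GraftAllLevels(const ToyPyramid & inner)
{
  ToyPyramid outer(inner.GetNumberOfLevels());
  for (std::size_t i = 0; i < inner.GetNumberOfLevels(); ++i)
  {
    outer.GraftNthOutput(i, inner.GetOutput(i));
  }
  return outer;
}
```

In this model, a 4-level inner pyramid wrapped with GraftFirstOnly leaves three levels unbuffered, which corresponds to the metric failing at resolution 1; GraftAllLevels leaves none.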

Back to the problem that you encountered, I think one of the following points might be the cause:

You’re using a multi-resolution rigid registration, so after the first resolution level the image is not buffered properly, and an error is thrown when trying to initialize ImageToImageMetric.
You’ve run out of GPU memory. It’s odd, but the performance of the currently working OpenCL-accelerated elastix code is not as good as I originally imagined: A. It is about 30-40% slower than the CPU on my machine (though I have an 18-core CPU and elastix is very good at multithreading). B. It eats up a lot of GPU RAM: peak GPU RAM usage is 6.7 GB when performing a 4-resolution-level BSpline registration between two 64×64×64 isometric CT volumes. I’m not sure whether I broke any memory management mechanism in my workaround.
Different OpenCL behavior on different OS/CUDA/driver versions.
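
For a sense of scale on the memory point, here is a rough back-of-the-envelope sketch. It assumes the volumes are 64×64×64 voxels stored as 32-bit floats and that each pyramid level halves the size along every axis; none of these assumptions is stated in the thread, and PyramidBytes is a hypothetical helper, not an elastix or ITK function.

```cpp
// Rough estimate only; voxel type and shrink schedule are assumptions.
#include <algorithm>
#include <cassert>
#include <cstddef>

// Total bytes of voxel data held by all levels of one image pyramid,
// assuming every level halves the size per axis (never below 1 voxel).
std::size_t PyramidBytes(std::size_t nx, std::size_t ny, std::size_t nz,
                         unsigned int levels, std::size_t bytesPerVoxel)
{
  assert(nx > 0 && ny > 0 && nz > 0);
  std::size_t total = 0;
  for (unsigned int level = 0; level < levels; ++level)
  {
    total += nx * ny * nz * bytesPerVoxel;
    nx = std::max<std::size_t>(1, nx / 2);
    ny = std::max<std::size_t>(1, ny / 2);
    nz = std::max<std::size_t>(1, nz / 2);
  }
  return total;
}
```

Under these assumptions, PyramidBytes(64, 64, 64, 4, 4) is 1,198,080 bytes, a little over 1 MiB per image, so the voxel data of both pyramids together would be on the order of a few MiB. That suggests the pyramid voxel data alone would not account for a multi-gigabyte peak, although this estimate says nothing about where the rest goes.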

I’m looking forward to hearing good news from you!

Environment:
OS: Ubuntu 18.04
Hardware: Intel 9980XE, nvidia 2080Ti with nvidia driver version 440, ocl-icd-opencl-dev 2.2.11, CUDA-10.2
itk: v5.0.1, default build options and install
elastix: latest develop branch as of 2020/4/21, built with USE_OPENCL, OPENCL_C_VERSION_1_2, OPENCL_USE_NVIDIA_SDK and USE_ALL_COMPONENTS ON

squll1peter on Apr 24, 2020

I also have this error (Nvidia GTX 1070) and I dug around a bit more in the code. I think the problem isn’t actually due to writing the buffer to the GPU. I put a clFinish call (which blocks until all commands queued so far have completed) right before and after each clEnqueueWriteBuffer call. The clFinish calls right after writing to the buffer execute successfully, but one of the clFinish calls right before a clEnqueueWriteBuffer call crashes with a different error message: CL_INVALID_COMMAND_QUEUE. It appears likely that the crash is caused by a bug in the OpenCL kernel itself rather than in any of the commands sent to it, though I have not confirmed this yet.
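
The reasoning behind bracketing each write with a synchronisation point can be sketched with a toy asynchronous queue. This is not the OpenCL API: ToyQueue, ToyCommand and FindFailingCommand are hypothetical names, and the model only assumes that enqueuing returns immediately while a failure becomes visible at the next synchronisation point.

```cpp
// Toy model of deferred error reporting, NOT OpenCL code.
#include <cassert>
#include <string>
#include <vector>

struct ToyCommand
{
  std::string name;
  bool        fails;
};

class ToyQueue
{
public:
  // Enqueuing never reports the command's own failure (asynchronous model).
  void Enqueue(const std::string & name, bool fails)
  {
    m_Pending.push_back({ name, fails });
  }

  // Synchronisation point: runs everything queued so far and returns the
  // name of the first command that failed, or an empty string.
  std::string Finish()
  {
    std::string failed;
    for (const auto & command : m_Pending)
    {
      if (command.fails && failed.empty()) { failed = command.name; }
    }
    m_Pending.clear();
    return failed;
  }

private:
  std::vector<ToyCommand> m_Pending;
};

// Synchronises after every command, so a failure is attributed to the
// command that caused it rather than to whatever happens to run later.
std::string FindFailingCommand(const std::vector<ToyCommand> & commands)
{
  ToyQueue queue;
  for (const auto & command : commands)
  {
    queue.Enqueue(command.name, command.fails);
    const std::string failed = queue.Finish();
    if (!failed.empty()) { return failed; }
  }
  return "";
}
```

In this model, a failing kernel followed by a healthy buffer write is reported by the Finish() placed before the write, while the Finish() after the write succeeds. That is the same pattern as in the observation above, and it is why the write itself looks innocent.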

goldenratio1618 on Sep 6, 2019