opencv: `OpenCL error CL_INVALID_CONTEXT` when `~UMat()` and default OpenCL context has changed

When a UMat is created in an opencv opencl context (e.g. Intel gpu), then the opencv opencl context is changed to another device (e.g. NVidia gpu), and then the UMat is destructed… it leads to errors of CL_INVALID_CONTEXT.

I think this is due to OpenCLAllocator::unmap() wrongly using the command queue of the current “default” opencv opencl context for the unmap. It should instead use the command queue for the opencl context that OpenCLAllocator::map() itself.

This issue probably also exists in OpenCLAllocator::deallocate_()

I would appreciate some feedback on this from the core OpenCV team. I’ll investigate myself in parallel.

System information (version)

  • OpenCV 4.5.3
  • Microsoft Windows [Version 10.0.19043.1110]
  • VS2019 v16.10.4
  • Laptop with Intel integrated GPU, and NVidia discrete GPU

Reproduction

  1. Create opencv app that uses opencl and enables the user to change opencl contexts via cv::ocl::Context::create(config), cv::ocl::OpenCLExecutionContext::create, and exec_context.bind()
  2. Run app
  3. Set to Intel GPU
  4. Create UMats
  5. Change context to NVidia GPU, but don’t proactively ~UMat(). Instead let some UMats exists from both contexts.
  6. Now destroy the older Intel gpu UMats.

Result

For each old intel UMat that is destructed…

OpenCV(4.5.3) Error: Unknown error code -220 (OpenCL error CL_INVALID_CONTEXT (-34) during call: clEnqueueUnmapMemObject(handle=00000000FDD6BE30, data=00000000E84A0000, [sz=99532800])) in cv::ocl::OpenCLAllocator::unmap, file ..\modules\core\src\ocl.cpp, line 5926
Exception thrown at 0x00007FFDE3EA4ED9 in Max.exe: Microsoft C++ exception: cv::Exception at memory location 0x00000000007DECF0.
exception in outputmatrix() OpenCV(4.5.3) ..\modules\core\src\ocl.cpp:5926: error: (-220:Unknown error code -220) OpenCL error CL_INVALID_CONTEXT (-34) during call: clEnqueueUnmapMemObject(handle=00000000FDD6BE30, data=00000000E84A0000, [sz=99532800]) in function 'cv::ocl::OpenCLAllocator::unmap'

Do the reverse (old nvidia and new intel), then same function fails but different line because of a different map/copy approach

OpenCV(4.5.3) Error: Unknown error code -220 (OpenCL error CL_INVALID_MEM_OBJECT (-38) during call: clEnqueueWriteBuffer(q, handle=000000006EFD34C0, CL_TRUE, 0, sz=99532800, data=000000011071B080, 0, 0, 0)) in cv::ocl::OpenCLAllocator::unmap, file ..\modules\core\src\ocl.cpp, line 5947
Exception thrown at 0x00007FFDE3EA4ED9 in Max.exe: Microsoft C++ exception: cv::Exception at memory location 0x00000000007DECF0.
exception in outputmatrix() OpenCV(4.5.3) ..\modules\core\src\ocl.cpp:5947: error: (-220:Unknown error code -220) OpenCL error CL_INVALID_MEM_OBJECT (-38) during call: clEnqueueWriteBuffer(q, handle=000000006EFD34C0, CL_TRUE, 0, sz=99532800, data=000000011071B080, 0, 0, 0) in function 'cv::ocl::OpenCLAllocator::unmap'

Expected

No errors. And the old Intel UMats have their resources released.

Notes

In unmap(), all three methods (SVM, map, copy) use a command queue gotten via

https://github.com/opencv/opencv/blob/d52e4e5df3de959bbb5cdd189be45a2a364467cb/modules/core/src/ocl.cpp#L5891

This is an errant approach. When releasing opencl resources for a UMat, it needs to use the opencl context that allocated those resources for that specific Umat. Otherwise, it is unsafe and will lead to errors like above when contexts are changed.

That means the UMatData needs to know its own opencl context or command queue. Does this knowledge already exist on the UMatData? Is it available at UMatData::allocatorContext, prevAllocator, or currAllocator? If the knowledge is already on UMatData (or can be easily added), then this function’s code can be updated to use that command queue.

OpenCLAllocator::deallocate_() I think has a similar errant approach but its code is different. For example

  • SVM approach it uses Context::getDefault() and Queue::getDefault().ptr(). I think it needs to use the context/queues from original allocation context.
  • non-SVM uses cl_command_queue q = (cl_command_queue)Queue::getDefault().ptr(). I think it needs to use the context/queues from original allocation context.
  • etc.

Related issues

There are a number of related issues about “switching opencl devices” but none are so exact as to isolate this single line of code. If we can address this, then it might contribute to those other issues. I’m subscribed to several of them myself.

Resoving this issue will resolve issue #18919 Related to issue #9593, issue #6926, issue #8035

Issue submission checklist

  • I report the issue, it’s not a question
  • I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found solution
  • I updated to latest OpenCV version and the issue is still there

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 4
  • Comments: 18 (18 by maintainers)

Most upvoted comments

I’m unsubscribing as I’ve fixed this years ago in my fork

😝 Agreed. I would have to manually destruct anything that needs the cl_context handle in Context::~Impl because the code there manually destructs the raw handle pointer. Rather than writing yet more raw pointer management code, I made a commit that stores the cl_context handle as a std::unique_ptr with a custom deleter clReleaseContext(). C++ RAII. I have confirmed that the handle is listed above everything that could need it (like the BufferPools) that need it. Therefore, C++ will destruct them in reverse order listed in code.

I’ve also found many thread unsafe functions that are called once for initialization. These are the same style bug that found in the core fastMalloc() function that I fixed. I’ve made all the changes in a single commit – the involved code is simplier and easier to read.

I’m aware the CI is failing tests on this PR. My local tests with my app works well, but those DNN tests use things differently. I’ll get to those bugs this week.

While all the changes are within ocl.cpp, they are throughout that code. It could be helpful to review the changes in units of commits, rather than all of it all at once. I’ll continue to push separate commits. And after it all works and approved, I’ll forcepush a squash.

When today’s PR push, behavior of the global Context list and BufferPools has changed. I don’t see these as a bug. Instead, this is new behavior and I’m now aware OpenCV has defined any rule on this topic. I’m open to discussion on it. And I don’t have a strong preference at this time.

OpenCL(SVM)BufferPoolImpl’s now know the Context for which they are managing buffers. This faciliates correct allocation/deallocation of a buffer (for a UMat) to the cl_context for that UMat. There would be leaks or errors otherwise.

There is a sideaffect. The behavior of 3 Aug (above post) having a big list with nulls and only two useful entries has changed. Now I am seeing only size()=2. Why? Because when all the UMats have released their ownership on the Context, there is still 2-3 owners remaining…the (default, hostptr, svm)BufferPoolImpl’s. There isn’t anything that proactively tells those Bufferpools to delete/deallocate/release. Instead they are usually released when a Context is released/deallocated. But since the Bufferpools hold ownership of their parent (Context), no one initiates the chain of release.

I am imagine some new behavior rules

  1. Keep this new behavior. When an ExecContext is made, and used, its Context and BufferPoolImpl’s remain. It is possible to release their pool entries by the SDK user calling cv::ocl::getOpenCLAllocator()->getBufferPoolController()->freeAllReservedBuffers() before a new bind()…because those only work on the current bind() ExecContext. Naturally, OpenCL runtime choses when its actually processes those commands and completes the frees. However, that just frees the entries. The BufferPoolImpls and its parent Context would remain.
  2. I can change the new code to get a non-owning pointer to the Context::Impl. This should do all the good work of correct allocating/freeing by the BufferPools. While adding back the behavior of yesterday…that when all UMats are free’d, it results in the Context and therefore BufferPools all being freed.
  3. Add code to OpenCLExecutionContext::bind() that does a release of the BufferPoolImpl’s, or something like that. This is interesting but might be difficult because code (like my own app) does a new bind() but still has UMat::getMat() host-side data being used. Usually in a few minutes those UMats are released by my code and at that time it could be possible to them release the BufferPools and/or Context. But…what if I want to use that same ExecContext again? In my test case I’m running now, I’m switching ever 20 seconds. In that test case, it would be smarter to keep the BufferPools so they can be reused…and (1) does that.
  4. Make a new API that simplifies the ability of an SDK user to completely free resources associated with a Context. It would throw an exception or fail if it is unable to completely release due to resources being used (e.g. UMats still holding ownership of Context, or BufferPools still having entries being used, etc.).

Comments?