glfw: Resizing on M1 has race crashes when main and draw are different threads

Originally I thought this was a duplicate of #1682, but now I see that it is not. On M1 Macs, windows crash randomly during resizing if rendering does not happen on the main thread. This model works great on all other OSes and also on macOS Intel.

The related Fyne issue is fyne-io/fyne#2188.

Reproduction:

Run just about any GLFW app with a separate draw thread, but for Fyne’s use-case you can do the following:

  • go run fyne.io/fyne/v2/cmd/hello@v2.1.1
  • Resize the window (maybe a lot) and see one of a couple of possible crashes

The crashes vary, but they are usually either in a GL clear command or a segfault in the paint code. We have a workaround in place for v2.1.2: reverting to a single thread (main+draw). This avoids the crash but is less pretty on resize…

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 7
  • Comments: 27 (14 by maintainers)

Most upvoted comments

I did some digging, and the “OpenGL Programming Guide for Mac” explains that: “When you use an NSOpenGLView object with OpenGL calls that are issued from a thread other than the main one, you must set up mutex locking.”

I think the core of the issue is this call here: https://github.com/glfw/glfw/blob/dd8a678a66f1967372e5a5e3deac41ebf65ee127/src/cocoa_window.m#L234

This calls NSOpenGLView update and the Apple Docs explicitly state: “A multithreaded application must synchronize all threads that access the same drawable object and call update for each thread’s context serially.”

I also found a project that hit similar issues in 2019: https://github.com/flutter/flutter/issues/30671. They determined that “The desktop shells resize the graphics context on the main thread, while drawing to it happens from a background thread.” This matches the behavior above.

Given that the previous issue was from 2019 and matches the behavior here, I don’t think this issue is unique to M1. Also, the documentation I’ve linked is from 2015 or earlier, so this requirement has been known for some time. It’s possible that the M-series chips, due to their speed or architecture, simply trigger the problem more reliably.

Further, a lot of my crashes (depending on which thread crashes first) originate in the update call I pointed out above. I think it’s the culprit.

I’m not sure what the proper fix in GLFW would be, but it appears we need a lock around that call as well as around any OpenGL context usage on other threads.
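To illustrate the kind of serialization being suggested, here is a minimal, portable C++ sketch. It is not GLFW or Cocoa code: `Drawable`, `resize`, `draw`, and `run_resize_race` are hypothetical names, and a `std::recursive_mutex` merely stands in for the context lock. The "main thread" resizes in two steps while a "draw thread" reads the size, and because both take the same lock the draw side should never observe a half-applied resize.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a drawable shared between the two threads.
struct Drawable {
    std::recursive_mutex lock;  // plays the role of the context lock (assumption)
    int width = 100;
    int height = 100;
};

// "Main thread" work: a resize that is applied in two separate steps.
void resize(Drawable& d, int size) {
    std::lock_guard<std::recursive_mutex> guard(d.lock);
    d.width = size;
    d.height = size;
}

// "Draw thread" work: returns false if it saw a half-applied resize.
bool draw(Drawable& d) {
    std::lock_guard<std::recursive_mutex> guard(d.lock);
    return d.width == d.height;
}

// Runs resize and draw concurrently and counts inconsistent frames.
int run_resize_race(int iterations) {
    Drawable d;
    std::atomic<int> torn{0};
    std::thread drawer([&] {
        for (int i = 0; i < iterations; ++i) {
            if (!draw(d)) ++torn;
        }
    });
    for (int i = 1; i <= iterations; ++i) {
        resize(d, 100 + i);
    }
    drawer.join();
    return torn.load();
}
```

Calling `run_resize_race(10000)` should return 0 while both functions hold the lock. Removing the `lock_guard` from either function turns the sketch into a data race, which is the situation this issue describes.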

Thanks @yairchu, I finally found some more details about the OpenGL locking required on a non-main thread: https://developer.apple.com/library/archive/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_threading/opengl_threading.html.

The following code change solved my problems:

    void render() override {
        // Lock the rendering context so the main thread cannot touch it mid-frame.
        CGLContextObj cglContext = CGLGetCurrentContext();

        CGLLockContext(cglContext);

        // Render while holding the lock.
        AminoGfx::render();

        // Unlock again.
        CGLUnlockContext(cglContext);
    }
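One possible refinement of the change above is to wrap the lock/unlock pair in a small RAII guard, so the context is unlocked even if rendering returns early or throws. This is only a sketch: `ScopedContextLock` is a hypothetical name, and on non-macOS platforms the block defines stand-in types and functions (clearly not the real CGL API) purely so that it compiles elsewhere.

```cpp
#if defined(__APPLE__)
#include <OpenGL/OpenGL.h>  // real CGLContextObj, CGLLockContext, CGLUnlockContext
#else
// Stand-ins for non-macOS builds only; NOT the real CGL API (assumption).
struct FakeContext { int depth = 0; };
using CGLContextObj = FakeContext*;
inline int CGLLockContext(CGLContextObj c) { ++c->depth; return 0; }
inline int CGLUnlockContext(CGLContextObj c) { --c->depth; return 0; }
#endif

// Hypothetical RAII guard: locks in the constructor, unlocks in the destructor.
class ScopedContextLock {
public:
    explicit ScopedContextLock(CGLContextObj ctx) : ctx_(ctx) {
        if (ctx_) CGLLockContext(ctx_);  // skip if no context is current
    }
    ~ScopedContextLock() {
        if (ctx_) CGLUnlockContext(ctx_);
    }
    // Non-copyable, so a lock is never released twice.
    ScopedContextLock(const ScopedContextLock&) = delete;
    ScopedContextLock& operator=(const ScopedContextLock&) = delete;

private:
    CGLContextObj ctx_;
};
```

With such a guard, the body of `render()` could shrink to `ScopedContextLock lock(CGLGetCurrentContext());` followed by `AminoGfx::render();`. Nesting guards on the same context relies on the recursive locking that Apple's guide describes.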

From the Apple developer pages:

When you use an NSOpenGLView object with OpenGL calls that are issued from a thread other than the main one, you must set up mutex locking. Mutex locking is necessary because unless you override the default behavior, the main thread may need to communicate with the view for such things as resizing.

Applications that use Objective-C with multithreading can lock contexts using the functions CGLLockContext and CGLUnlockContext. If you want to perform rendering in a thread other than the main one, you can lock the context that you want to access and safely execute OpenGL commands. The locking calls must be placed around all of your OpenGL calls in all threads.

CGLLockContext blocks the thread it is on until all other threads have unlocked the same context using the function CGLUnlockContext. You can use CGLLockContext recursively. Context-specific CGL calls by themselves do not require locking, but you can guarantee serial processing for a group of calls by surrounding them with CGLLockContext and CGLUnlockContext. Keep in mind that calls from the OpenGL API (the API provided by the Khronos OpenGL Working Group) require locking.

That is perfect, thanks. For some reason I was lost in possible example apps, not the test code 😃