SDL: Windows: main thread is blocked when user resizes or moves a window

This bug report was migrated from our old Bugzilla tracker.

These attachments are available in the static archive:

Reported in version: HG 2.1
Reported for operating system, platform: Windows (All), All

Comments on the original bug report:

On 2013-08-30 01:00:19 +0000, wrote:

When the user clicks on a window’s title bar or border, the system generates a WM_NCLBUTTONDOWN message. When DispatchMessage receives this message, it handles the window resizing or moving itself and doesn’t return until the user releases the mouse button. It also sends WM_WINDOWPOSCHANGING to the window proc. When using WinAPI directly, it is not a big problem that DispatchMessage blocks, because it is possible to handle the WM_WINDOWPOSCHANGING message (or use WM_TIMER) to perform actions that must run regularly. But with SDL it seems to be impossible to do anything on the main thread while the user moves or resizes a window.

On 2014-01-20 05:04:25 +0000, Nathaniel Fries wrote:

I actually spent my weekend fixing this.

I’m not sure when I’ll have time again to work on something, but I did upload my code for it, so someone should be able to whip up a patch fairly easily.

For an SDL-specific patch, I wouldn’t bother with using a thread-local (SDL doesn’t enable the creation of new GUI threads), sending WM_SIZING, or with MINMAXINFO at all.

It even includes (untested) code for child windows, so it should hopefully work in cases where SDL is used from a widget provided by Qt or some other toolkit.

It’s this SourceForge project: https://sourceforge.net/projects/win32loopl/

On 2014-01-20 16:48:16 +0000, Nathaniel Fries wrote:

*** Bug 2316 has been marked as a duplicate of this bug. ***

On 2014-01-20 16:54:40 +0000, Nathaniel Fries wrote:

Just a heads up, the above fix for this probably shouldn’t be default behavior because it can cause resizing and moving to become choppy to the user if rendering or other main loop code takes too long. It could cause new bug reports from developers of pre-existing SDL2 applications who are simply passing on bug reports from users who updated their SDL2 dll. I’d recommend it as a feature that can be turned on or off by the programmer, and defaults to off.

On 2014-02-06 05:21:48 +0000, Nathaniel Fries wrote:

Created attachment 1549 patch

Finally found time to get around to making a proper patch. Code is mostly the same as I wrote before, but adapted for what SDL looks like internally. Doesn’t make modeless behavior optional, though.

On 2014-02-09 10:08:44 +0000, Sam Lantinga wrote:

Thanks! We’ll take a look at this after the 2.0.2 release. This also potentially fixes issues with dragging the titlebar when the cursor is grabbed?

On 2014-02-09 12:43:34 +0000, Nathaniel Fries wrote:

“This also potentially fixes issues with dragging the titlebar when the cursor is grabbed?”

Not sure what you mean by this. This code has to acquire mouse focus in order to receive all necessary mouse movements.

On 2014-02-09 20:36:01 +0000, Sam Lantinga wrote:

Yes, but we’re in control of the movement process so we can account for our own grab state. It’s not a fix, it just makes it possible to fix. 😃

On 2014-02-09 23:31:57 +0000, Nathaniel Fries wrote:

I suppose that if mouse focus is lost, we should take it back. That might be a bug in this code. MSDN says not to call SetCapture when processing WM_CAPTURECHANGED, so I can see how this could have been a difficult issue previously. Now, of course, we can add a simple fix in SDL_PumpEvents or elsewhere, but we’d still have a chance of losing some events that way. A better fix would simply be to use the cursor position attached to each Windows message (__tagMSG::pt), compare it to the capture position, and work from there instead of from WM_MOUSEMOVE. Then we won’t even need to worry about mouse capture. Just a thought; I haven’t had a chance to test it, though.

Also, there was a (quite obvious once I noticed it) bug in my last patch. This is what I get for not testing thoroughly. When handling WM_MOUSEMOVE, I use lParam instead of the result from GetMessagePos. lParam is relative to the client area, which means it can be negative; GetMessagePos is in screen coordinates. This is what you get for not taking your time. 😃 Here’s a hand-written patch for my patch to correct this:

             if(data->in_modeless_resize)
             {
                 POINT ptPos;
+                DWORD dwPos = GetMessagePos();
-                ptPos.x = GET_X_LPARAM(lParam);
-                ptPos.y = GET_Y_LPARAM(lParam);
+                ptPos.x = GET_X_LPARAM(dwPos);
+                ptPos.y = GET_Y_LPARAM(dwPos);
                 WIN_DoResize(hwnd, data, ptPos, SDL_FALSE);
             }
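
To illustrate why the coordinate source matters, here is a minimal, portable sketch of how a packed 32-bit position is unpacked with sign extension. The names `sign_extend_16`, `unpack_x`, `unpack_y` and `pack_point` are hypothetical stand-ins that only imitate what GET_X_LPARAM / GET_Y_LPARAM do; they are not the Windows macros and not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the low 16 bits of v as a signed value. */
int sign_extend_16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v >= 0x8000u) ? (int)v - 0x10000 : (int)v;
}

/* Stand-ins for GET_X_LPARAM / GET_Y_LPARAM: x lives in the low word,
 * y in the high word, and both are signed 16-bit quantities. */
int unpack_x(uint32_t packed) { return sign_extend_16(packed); }
int unpack_y(uint32_t packed) { return sign_extend_16(packed >> 16); }

/* Pack a point the same way, so the round trip can be checked. */
uint32_t pack_point(int x, int y)
{
    return ((uint32_t)(uint16_t)y << 16) | (uint32_t)(uint16_t)x;
}
```

A client-relative x of -5 (cursor just left of the client area) survives the round trip through `pack_point` and `unpack_x` as -5. Reading the low word as unsigned instead would give 65531, which is the kind of value that makes a resize routine misbehave.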

On 2014-02-20 21:05:23 +0000, Nathaniel Fries wrote:

Created attachment 1568 better patch

Attached is a much better patch. I wasn’t sure whether the values returned by SDL_GetWindow[Min/Max]imumSize were client size or window size, so that may need to be corrected inside WIN_DoResize.

Still doesn’t add a new window flag for modeless behavior.

When mouse capture is lost, the modeless resize/movement operation is finalized. This is because:

  1. Attempting to reclaim mouse capture in handling WM_CAPTURECHANGED actually crashes the program. Without mouse capture, we won’t get messages for mouse movement outside the current boundaries of the window.
  2. MSDN states that only the Foreground Window can capture the mouse, and presumably there’s a good reason for an app to claim Foreground Window status (either in response to user input, or an application had to alert the user of something). The user will probably interact with this foreground window, even if just to remove its foreground window status, so it seems silly for SDL to continue acting as if the user is interactively resizing a Window.

On 2014-02-25 12:53:59 +0000, Sam Lantinga wrote:

Reviewing the code, it looks pretty good. I’m looking forward to trying it out after 2.0.2 is released.

The values returned by SDL_GetWindow[Min/Max]imumSize are client size.
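
Since the limits are client sizes, a resize routine would presumably clamp the client area first and only then add the frame. A minimal sketch of that order of operations follows. It assumes that a limit of 0 means "no limit" and that the caller supplies the frame thickness (on Windows that would typically be derived with AdjustWindowRectEx). `clamp_client` and `outer_from_client` are hypothetical names, not SDL functions.

```c
/* Clamp one client-area dimension to [min_v, max_v].
 * Assumption of this sketch: a limit of 0 means "no limit". */
int clamp_client(int value, int min_v, int max_v)
{
    if (min_v > 0 && value < min_v) value = min_v;
    if (max_v > 0 && value > max_v) value = max_v;
    return value;
}

/* Outer window dimension = clamped client dimension + frame thickness.
 * The frame thickness is an input here; this sketch does not query it. */
int outer_from_client(int client, int min_v, int max_v, int frame)
{
    return clamp_client(client, min_v, max_v) + frame;
}
```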

On 2014-03-05 11:12:47 +0000, Andreas Ertelt wrote:

There’s one tiny issue I experience with this code - in multi-monitor setups the window always jumps back to the primary monitor when being picked up.

Also, since this patch was submitted, handling for WM_NCLBUTTONDOWN was added - the current code would have to be added to the default-case of this patch and the return statement should be removed.

On 2014-03-06 10:06:56 +0000, Andreas Ertelt wrote:

Another minor issue is that I am receiving SDL_MOUSEMOTION events when moving or resizing the window in a way that the mouse cursor temporarily hovers over the window. I also receive two of those events when just clicking and holding the window on the title or border, as well as another when releasing.

On 2014-03-09 01:08:36 +0000, Nathaniel Fries wrote:

“There’s one tiny issue I experience with this code - in multi-monitor setups the window always jumps back to the primary monitor when being picked up.”

Something, isn’t it? I don’t know what would cause this and I don’t have a multi-monitor setup to test on, so I’m afraid I’ll have to leave that fix to someone else.

I’ve identified the likely cause of that minor issue (double SDL_MOUSEMOTION events). When WIN_DoResize is called in response to WM_MOUSEMOVE, the last argument should be SDL_FALSE instead of SDL_TRUE (SDL_TRUE indicates that it should “force” the cursor position to a correct value after resizing).

On 2014-03-11 07:11:10 +0000, Andreas Ertelt wrote:

Didn’t have much time to test this, but the change you suggested caused the application to crash (quite literally).

The multi-monitor issue is related to the GetSystemMetrics() call, which is fed with SM_CXSCREEN/SM_CYSCREEN and therefore limits the routine to the primary monitor. Instead you would have to use MonitorFromRect() to find the current/nearest monitor (using mouse coordinates and MONITOR_DEFAULTTONEAREST) and then retrieve its size/coordinates using GetMonitorInfo(). Alternatively, using SM_CXVIRTUALSCREEN/SM_CYVIRTUALSCREEN would be a quick fix, but that would make that part a bit pointless for setups where monitors don’t use the same resolutions and/or aren’t properly aligned.
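
The "nearest monitor" lookup can be sketched in portable C as below. This only imitates the MONITOR_DEFAULTTONEAREST idea on a plain array of rectangles; it is not how Windows implements it. `MonitorRect`, `example_layout`, `dist_sq_to_rect` and `nearest_monitor` are hypothetical names made up for this illustration.

```c
#include <limits.h>

/* One monitor's bounds in virtual-screen coordinates; right/bottom are exclusive. */
typedef struct { int left, top, right, bottom; } MonitorRect;

/* Hypothetical layout: a 1920x1080 primary monitor with a 1280x1024
 * monitor to its right. Note the differing heights. */
const MonitorRect example_layout[2] = {
    { 0,    0, 1920, 1080 },
    { 1920, 0, 3200, 1024 }
};

/* Squared distance from (x, y) to the rectangle; 0 if the point is inside. */
long long dist_sq_to_rect(const MonitorRect *r, int x, int y)
{
    long long dx = 0, dy = 0;
    if (x < r->left)         dx = (long long)r->left - x;
    else if (x >= r->right)  dx = (long long)x - (r->right - 1);
    if (y < r->top)          dy = (long long)r->top - y;
    else if (y >= r->bottom) dy = (long long)y - (r->bottom - 1);
    return dx * dx + dy * dy;
}

/* Index of the monitor containing (x, y), or the nearest one if none
 * contains it. Returns -1 when the list is empty. */
int nearest_monitor(const MonitorRect *monitors, int count, int x, int y)
{
    int best = -1;
    long long best_d = LLONG_MAX;
    for (int i = 0; i < count; ++i) {
        long long d = dist_sq_to_rect(&monitors[i], x, y);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

With the sample layout, a cursor at (2000, 500) selects the second monitor, so the size limits would be taken from that monitor rather than from the primary one.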

Another side effect I found is that the Aero features Snap and Shake stop working. Not quite sure how to emulate those correctly (especially since Snap delivers visual feedback as well).

[HKEY_CURRENT_USER\Software\Policies\Microsoft\Windows\Explorer] “NoWindowMinimizingShortcuts” defines the state of shake.

[HKEY_CURRENT_USER\Control Panel\Desktop] “WindowArrangementActive” defines the state of snap.

On 2014-03-12 19:09:40 +0000, Nathaniel Fries wrote:

Actually MSDN makes it seem like the default maximum window tracking dimension is GetSystemMetrics(SM_C[X/Y]MAXTRACK) regardless of the monitor the window is on.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms724385(v=vs.85).aspx

I never knew of the shake and snap features. I’ve played around with them briefly, and shake appears to be relatively simple to implement using documented User32 calls (however, because I’m using documented User32 calls to change window position, they require a full redraw which appears to take longer than whatever User32 does internally - this may make the shake feature appear laggy as well as require we take some liberty with the timing). Snap would require I somehow cover the entire screen with a blue highlight, and I’m not sure how to do that without creating a window the size of the desktop (and we return to mouse focus issues).

On 2014-03-12 20:40:33 +0000, Nathaniel Fries wrote:

Actually, by playing around I think I’ve found a relatively simple way to highlight the entire screen, but it will require different code for windows on extra monitors, so I can only guarantee something that would work on single-monitor systems. Might be a while before I can materialize a complete fix, though.

On 2014-03-16 09:42:03 +0000, Nathaniel Fries wrote:

Believe it or not, I’m finding it harder to get shake just right than snap. I have basically functional versions of both in my little project on sourceforge now.

I won’t be making another patch for SDL until I’ve got all the little quirks worked out though. Might be some time.

On 2019-12-07 17:00:27 +0000, Jake Del Mastro wrote:

Has there been any progress on this bug? I’m noticing this still seems to be an issue in SDL 2.0.10

On 2020-03-24 21:13:36 +0000, Ryan C. Gordon wrote:

(In reply to Jake Del Mastro from comment # 18)

“Has there been any progress on this bug? I’m noticing this still seems to be an issue in SDL 2.0.10”

Reading through all these comments, is this something we really want? It sounds like something that we’re going to have to maintain every time Microsoft adds/changes a UI mechanic, and never get quite right, and introduce a bunch of risky behaviors, just to be more responsive when someone drags the window.

I’d be inclined to mark this WONTFIX, but I’ll let Sam make that decision if he wants.

–ryan.

On 2020-04-16 16:50:42 +0000, Ron Aaron wrote:

It’s not just an issue on Windows. macOS has the same problem (don’t know if it’s for a similar reason)

On 2020-04-16 19:19:48 +0000, Andreas Ertelt wrote:

Ryan is correct, the way this patch approaches the issue would require changes over time to stay consistent with Windows behavior and there are too many corner cases to consider.

But a problem should definitely not be marked WONTFIX just because a suggested solution is inadequate.

While I’m fairly sure there is no feasible solution that fully fixes the issue as it was reported here, the likely prime issue most people are concerned with is not being able to perform drawing operations / simulation anymore.

This could be addressed on Windows by allowing developers to register a callback (per window) to be invoked on its WM_SIZING(!), WM_PAINT and likely also WM_ERASEBKGND messages. If this feature is used, the message loop would also have to call InvalidateRect on the window whenever no more messages are in the queue, and upon completion of the callback a ValidateRect on the window would have to be issued (this makes sure WM_PAINT messages keep getting issued when nothing else is happening).
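
A rough, portable sketch of what such a per-window callback registry could look like is shown below. Everything in it is hypothetical: `set_redraw_callback`, `dispatch_redraw`, the fixed-size table and the `MSG_*` names are invented for illustration and are not an SDL API. The three numeric values mirror WM_PAINT, WM_ERASEBKGND and WM_SIZING from winuser.h. The InvalidateRect/ValidateRect handling is left out.

```c
#include <stddef.h>

/* Values mirror WM_PAINT, WM_ERASEBKGND and WM_SIZING from winuser.h. */
enum { MSG_PAINT = 0x000F, MSG_ERASEBKGND = 0x0014, MSG_SIZING = 0x0214 };

typedef void (*RedrawCallback)(void *window, void *userdata);

typedef struct { void *window; RedrawCallback cb; void *userdata; } RedrawEntry;

#define MAX_WINDOWS 8
static RedrawEntry g_entries[MAX_WINDOWS];

/* Register (or, with cb == NULL, remove) the callback for one window.
 * Returns 1 on success, 0 if the window is NULL or the table is full. */
int set_redraw_callback(void *window, RedrawCallback cb, void *userdata)
{
    RedrawEntry *slot = NULL;
    if (window == NULL) return 0;
    for (size_t i = 0; i < MAX_WINDOWS; ++i) {
        if (g_entries[i].window == window) { slot = &g_entries[i]; break; }
        if (slot == NULL && g_entries[i].window == NULL) slot = &g_entries[i];
    }
    if (slot == NULL) return 0;
    slot->window   = cb ? window : NULL;
    slot->cb       = cb;
    slot->userdata = cb ? userdata : NULL;
    return 1;
}

/* To be called from the window procedure. Invokes the callback only for
 * the three messages named above; returns 1 if a callback ran. */
int dispatch_redraw(void *window, unsigned msg)
{
    if (msg != MSG_PAINT && msg != MSG_ERASEBKGND && msg != MSG_SIZING) return 0;
    for (size_t i = 0; i < MAX_WINDOWS; ++i) {
        if (g_entries[i].cb != NULL && g_entries[i].window == window) {
            g_entries[i].cb(window, g_entries[i].userdata);
            return 1;
        }
    }
    return 0;
}

/* Usage example: count how many frames would have been drawn. */
static void count_frame(void *window, void *userdata)
{
    (void)window;
    ++*(int *)userdata;
}

int example_usage(void)
{
    int frames = 0;
    int window_tag = 0;                            /* stands in for an HWND */
    set_redraw_callback(&window_tag, count_frame, &frames);
    dispatch_redraw(&window_tag, MSG_SIZING);      /* draws: frames == 1 */
    dispatch_redraw(&window_tag, MSG_PAINT);       /* draws: frames == 2 */
    dispatch_redraw(&window_tag, 0x0200);          /* WM_MOUSEMOVE: ignored */
    set_redraw_callback(&window_tag, NULL, NULL);  /* unregister */
    dispatch_redraw(&window_tag, MSG_PAINT);       /* no callback any more */
    return frames;
}
```

The point of the design is that programs which never register a callback see no change in behavior, which matches the "wouldn’t affect existing programs" argument.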

I’m confident most other platforms could be handled in a similar fashion.

This approach wouldn’t affect existing programs in any way and provide developers who care about not being interrupted for an unreasonable amount of time with the means to address the issue with minimal changes and without having to hijack the window’s message handler.

On 2020-04-18 13:08:58 +0000, Andreas Ertelt wrote:

I just checked my engine code and there are three more corner cases to be considered on Windows that I didn’t think of anymore.

One is system/context menus, another is when a modal window is opened (e.g. a message box), and the last is picking the window up without moving it (which can also be the case when moving isn’t configured to redraw the window in Windows’ performance options).

I worked around all of this by starting a timer on the window that triggers the redraws. This timer is started under the following conditions:

  1. WM_SYSCOMMAND is received with (wparam & 0xfff0) == SC_MOVE (this also happens when the regular window menu is opened).
  2. WM_ENTERMENULOOP is received (the timer must also be started here to stop context menus from interrupting the program).
  3. WM_ENABLE is received with a wparam of 0.
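
The three conditions can be condensed into one predicate, sketched here in portable C. The `DEMO_*` constants mirror the corresponding values in winuser.h so the snippet compiles without Windows headers. `should_start_redraw_timer` is a hypothetical name, and real code would call SetTimer where this function returns nonzero.

```c
#include <stdint.h>

/* Values mirror WM_ENABLE, WM_SYSCOMMAND, WM_ENTERMENULOOP and SC_MOVE
 * from winuser.h; prefixed so they cannot clash with the real headers. */
enum {
    DEMO_WM_ENABLE        = 0x000A,
    DEMO_WM_SYSCOMMAND    = 0x0112,
    DEMO_WM_ENTERMENULOOP = 0x0211,
    DEMO_SC_MOVE          = 0xF010
};

/* Returns nonzero if the message matches one of the three conditions
 * listed above for starting the redraw timer. */
int should_start_redraw_timer(unsigned msg, uintptr_t wparam)
{
    switch (msg) {
    case DEMO_WM_SYSCOMMAND:
        /* The low four bits of wparam are used internally by the system,
         * hence the 0xFFF0 mask before comparing. */
        return (wparam & 0xFFF0u) == (uintptr_t)DEMO_SC_MOVE;
    case DEMO_WM_ENTERMENULOOP:
        return 1;
    case DEMO_WM_ENABLE:
        /* wparam == 0 means the window is being disabled, for example
         * while a modal dialog is open. */
        return wparam == 0;
    default:
        return 0;
    }
}
```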

The only slight annoyance I could notice at this point is when you hold down the caption bar with the mouse, it takes a second to actually call the first timer-event. This can be slightly alleviated by allowing WM_GETICON to trigger a draw while the timer is active. The WM_GETICON-behavior has likely been introduced with Vista - I currently have no older machine to verify this on.

This redraw timer can then be deleted on the next proper WM_PAINT message received while the window is active again (WM_ENABLE).

In my program I trigger this timer at the refresh rate and make sure there is no more message like it in the queue before issuing the draw call (to avoid clogging the message queue).

I can’t think of an alternative to using a timer here, since control over the message loop is temporarily diverted and the only event that is reliably triggered is WM_GETICON, at a frequency of 1 Hz. At least I couldn’t find any other way to introduce events under these conditions.

On 2020-07-12 09:53:51 +0000, Jack C wrote:

Any updates to this bug? I like Andreas Ertelt’s idea of introducing optional callbacks for those events. I know Blender handles drawing while the window is being resized in its WM_SIZE/WM_SIZING handling. There is an event dispatch call under “case WM_SIZE:” that will lead to a draw call.

You can find the code I am referring to here.

https://github.com/blender/blender/blob/404486e66c6a4ebebb085700d58b396597146add/intern/ghost/intern/GHOST_SystemWin32.cpp#L1659

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 63 (26 by maintainers)

Most upvoted comments

@icculus Unplugging your network cable as the host of a multiplayer game would indeed disconnect everyone else playing the game (if you’re not the host, as in a client/server model, it would at minimum disconnect you). But that’s completely expected by the user who unplugged their network cable. Both the client/server and the p2p results of that action are well understood by the user and by the game dev, and as game devs we can show a message like “The host has disconnected”, from which users can easily figure out that because Robert unplugged his network cable, and Robert was (presumably) the host in the p2p game, everyone got disconnected. I.e., no bug tickets for us, the dev team, because the users understood exactly what happened.

Having everyone (or even just yourself) disconnect just because you dragged a window is very subtle and frustrating, and it would be difficult for the user to even realize that it was the dragging that caused the issue, as opposed to just thinking your application is sucky. It took me as the dev countless hours of debugging to realize that my application’s client was disconnecting from the server every once in a while because I was dragging the window. Dragging the window just isn’t something that I think of as an action that could affect my application; it’s a subconscious thing I do to make sure things are placed well. Additionally, for me, dragging the window only disconnected the client from the server about a quarter of the time, which makes it even harder to make that association. For the longest time it looked like a completely random bug that we couldn’t reproduce consistently, but it made the application somewhat annoying to use for long periods. And it’s not like our users ever reported that they were dragging the window when it happened; they had no idea how to replicate it either, so from their point of view it just happened randomly. Once we figured out the association it wasn’t hard to find this GitHub issue, but something better can be done here.

vvvvvvvvvvvvvvvvvvvvvvv

Imo, at the absolute minimum, the documentation of SDL_PollEvent desperately needs to say that it will block if the user drags or resizes the window on the Windows OS. Then at least developers can work around the issue and maintain network connections on another thread without it being an unnecessarily large refactor after the fact [as it was for us].

^^^^^^^^^^^^^^^^^^^^

Making the executive decision to close this bug as wontfix; this isn’t worth all the known problems and unknown risks that fixing it would cause.

I’ve added a solution that dovetails nicely with the new main callbacks in SDL 3.0 and if you’re not using that you can set an event watcher to handle expose events and draw then.

Thanks for all the feedback!

“one person decides to drag or resize their window, and the whole session dies, pissing off all of the players trying to play.”

What happens to the other players when someone unplugs their network cable in this scenario?

There needs to be some kind of SDL hint, or something along those lines to fix this behavior, because this is making SDL2 borderline unusable for games that have lockstep netcode. (one person decides to drag or resize their window, and the whole session dies, pissing off all of the players trying to play. YAY!)

One shouldn’t need to rely on a patch that is old and insanely hard to find (I’ve been searching for such a thing for weeks and only found this now) just to get past such an obvious and horrid issue. Not to mention that said patch likely can’t even be merged into current SDL2 anymore due to its age.

Been struggling with this problem for years over multiple projects, and I’m tired, frustrated, and fucking desperate for something, anything, that can remedy it.

Any updates on this?

Here’s what I suggest: https://gist.github.com/RT222/804bda0bb1ed305e6351dc3a9a07869b

That’s how I fix this issue in my engine, and it’s the most elegant way I could find. It’s not hard to implement, doesn’t require multithreading and is easy to use.

It probably wouldn’t be too hard to add it to SDL. What do you think about it?

This is a good approach!

In case it helps someone, we seem to have been able to work around this issue using SDL_SetEventFilter (https://github.com/ppy/osu-framework/pull/3996/commits/c938e6c9094cfec5d1dd30cbd27b9735e311d363). It’d be great if a solution can be reached to fix this in a sane way.

I like void SDL_SetModalLoopCallback(ModalLoopCallback callback, void *userdata). Note I omitted the second callback though, and I think there needs to be a userdata parameter:

I think it is best to start with a design that does not assume it’s about MS Windows’ window resizing and redrawing (since as mentioned above on macOS there is a similar but different situation with app menus which should ideally use the same callback), and then if needed do a separate mechanism for like, the resize callback or whatever is then specific to MS Windows’ redraw. Something like SDL_SetModalMSWindowResizeCallback(...), maybe. This keeps the base mechanism simpler and more universal for those who just want their netcode and other vital logic updates to not die, and those who really want the full redraw magic could use the additional special functions for the specific use cases to handle viewport resizing, etc.

Here is the fiber trick implemented internally in SDL: https://github.com/libsdl-org/SDL/compare/main...hstormo:event_fiber With this patch your own code doesn’t need to change at all. It works well in every example I tried it on.

It doesn’t quite work right if you use SDL_WaitEvent or SDL_WaitEventTimeout, but you can use WaitMessage or MsgWaitForMultipleObjects instead to get around the problem. Maybe someone who knows that codepath better than me can get it right.

Perhaps @slouken can comment on whether this is worth opening a pull request for. I don’t know if using fibers falls under the “risky behaviors” that have been mentioned before. One gotcha is that the event pump must only be called from the same thread that initiated the video device, since that thread runs the fibers – but that is already a documented requirement.

@icculus I don’t understand. At face value your comment just seems irrelevant to me. Any networked action game will fundamentally drop out of the session if the entire PC hangs… so, huh?

I am really surprised I even need to go into this, since SDL2 seems to encourage a less-threads-is-better design in general, so why is my request apparently so weird? How in particular is it strange to want to not make the game misbehave and drop out just when I resize the window?

Yes, disk I/O should be loading screen only, or in threads. (Or non-blocking I/O! Threads are not always the only answer.) And yes, you can thread game logic and netcode, too. Should you? Should you just to make resizing not break everything massively? How is this scenario so contentious all of a sudden? I’m legit stumped.

So to get back to the issue, would it be possible to add a “let me do non-UI things on the main thread while the OS blocks the window” to SDL2? I find it really hard to believe it’s just me finding that useful, even if I just scroll to previous comments. I don’t understand this discussion. I don’t understand either why “you HAVE to use threads” is an acceptable answer.

I like your changes. Just as a note: void *userdata = NULL in the parameter list is afaik not valid C99, but that’s a minor nitpick. Now if all of this could be accessed by just using SDL_SetModalLoopCallback/SDL_SetModalLoopResizeCallback and providing the Resize/Draw callbacks and userdata and SDL2 does everything else in that gist, that’d be amazing in my book. I hope something like this can be added, it looks really good to me. (Disclaimer: haven’t test-run the code yet.)

“ideally not change how people write SDL applications.”

I feel like it was agreed upon in previous comments that was likely an impossible goal due to SDL2’s design of letting the app own the main loop.

The callback I am suggesting (maybe SDL_SetProcessingCallbackForWindowOpBlock?) would be resigning to that reality, and enable people with more single threaded apps to change their code to keep things running by continuing non-UI updates like netcode to avoid disastrous effects of resizing blocks, outside of the IMHO minor no-redraw issue. Compared to the other suggestions like “use threads,” I think this approach would allow most affected SDL2 apps (those that malfunction if stopped for too long) to be adapted with really minor changes.

I suggest for this callback:

  1. it should be guaranteed to happen on the main thread only,
  2. it shouldn’t fire willy-nilly when nothing is really blocked (to not mess with main loop timing unnecessarily, and optionally also to serve as an indicator to the app that things are currently blocked),
  3. it should fire at some reasonable interval between 5ms and 20ms to keep faster-paced networking alive while not spamming in a near busy loop,
  4. it should have a wiki page that clearly says what is allowed in the callback (e.g. I assume touching any SDL2 UI or event processing functions would be strictly forbidden, and touching any blocking I/O heavily discouraged; it’s less obvious for other things like maybe the pressed state of keys),
  5. it should cover at least some of the common blocking cases as a start, like window resizing on MS Windows.
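
As a rough illustration of points 2 and 3, the throttling logic could look like the sketch below. It is a hypothetical, platform-independent model, not an SDL API. `BlockedTicker`, `ticker_init` and `ticker_poll` are invented names, and the caller is assumed to invoke `ticker_poll` from the main thread (point 1) with a millisecond timestamp and a flag saying whether the OS is currently holding the message loop.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*BlockedCallback)(void *userdata);

typedef struct {
    BlockedCallback cb;
    void    *userdata;
    uint32_t interval_ms;   /* clamped to the 5..20 ms range suggested above */
    uint64_t last_ms;       /* time of block start or of the last callback */
    int      armed;         /* nonzero while a blocking operation is active */
} BlockedTicker;

void ticker_init(BlockedTicker *t, BlockedCallback cb, void *userdata,
                 uint32_t interval_ms)
{
    if (interval_ms < 5)  interval_ms = 5;
    if (interval_ms > 20) interval_ms = 20;
    t->cb = cb;
    t->userdata = userdata;
    t->interval_ms = interval_ms;
    t->last_ms = 0;
    t->armed = 0;
}

/* Returns 1 if the callback fired. It never fires while the loop is not
 * blocked, and the first call happens one full interval after the block
 * began, so short interruptions do not disturb the main loop timing. */
int ticker_poll(BlockedTicker *t, uint64_t now_ms, int is_blocked)
{
    if (!is_blocked || t->cb == NULL) {
        t->armed = 0;
        return 0;
    }
    if (!t->armed) {                 /* block just started: only arm */
        t->armed = 1;
        t->last_ms = now_ms;
        return 0;
    }
    if (now_ms - t->last_ms < t->interval_ms) return 0;
    t->last_ms = now_ms;
    t->cb(t->userdata);
    return 1;
}

/* Usage example: count ticks across one simulated blocking operation. */
static void count_tick(void *userdata) { ++*(int *)userdata; }

int example_ticks(void)
{
    int ticks = 0;
    BlockedTicker t;
    ticker_init(&t, count_tick, &ticks, 10);
    ticker_poll(&t, 0, 0);     /* not blocked: nothing happens */
    ticker_poll(&t, 100, 1);   /* block starts: armed, no call yet */
    ticker_poll(&t, 105, 1);   /* 5 ms elapsed: too early */
    ticker_poll(&t, 110, 1);   /* 10 ms elapsed: first call */
    ticker_poll(&t, 119, 1);   /* 9 ms since last call: too early */
    ticker_poll(&t, 121, 1);   /* 11 ms since last call: second call */
    ticker_poll(&t, 200, 0);   /* block over: disarmed */
    ticker_poll(&t, 300, 1);   /* new block: armed only */
    return ticks;
}
```

Waiting one interval before the first call is a design choice of this sketch. Firing immediately when the block begins would be equally easy, at the cost of an extra tick right after a normal main loop iteration.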