wgpu: Memory leak on MacOS | M1 OSX
Description
Some users have been experiencing memory issues with my buffer tutorial. It seems to be an issue only on the new M1 chips.
Repro steps
You’ll need a Mac with an M1 chip. Then run the tutorial code from https://github.com/sotrh/learn-wgpu/tree/master/code/beginner/tutorial4-buffer (if you already have the repo downloaded, you can just run cargo run --bin tutorial4-buffer).
Expected vs observed behavior
The expected behaviour is no memory leaks on M1.
Extra materials
I don’t have a Mac, so I can’t provide hardware specifics other than that it’s occurring on M1, but I’ll link the issue from my repo here: https://github.com/sotrh/learn-wgpu/issues/207
Platform
OSX with M1 chip, wgpu 0.9
About this issue
- State: closed
- Created 3 years ago
- Comments: 54 (25 by maintainers)
I think I’ve figured out what’s going on in #1936. Basically, in some newer Big Sur versions and especially on the M1 SoC, Core Animation will shut off some housekeeping tasks, like reclaiming certain resources and buffers, when the window isn’t in focus. The Apple-recommended way to solve this seems to be to stop rendering when NSWindowOccludedState is false.

At long last, this tests as fixed by https://github.com/gfx-rs/wgpu/pull/4781
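The workaround suggested in the #1936 comment, not rendering while the window is occluded, can be sketched as a small gate in front of the redraw request. This is a std-only sketch under stated assumptions: RedrawGate is a hypothetical helper, the occlusion flag is assumed to be forwarded from the windowing layer (recent winit releases expose an Occluded(bool) window event, but check the version you use), and it is not the change made in the linked PR.

```rust
/// Hypothetical helper (not part of wgpu or winit) that decides whether the
/// render loop should ask for another frame, based on the last occlusion
/// state the windowing layer reported.
#[derive(Debug, Default)]
struct RedrawGate {
    occluded: bool,
}

impl RedrawGate {
    /// Record the latest occlusion state (e.g. forwarded from a window event).
    fn set_occluded(&mut self, occluded: bool) {
        self.occluded = occluded;
    }

    /// Keep requesting redraws only while the window is visible.
    fn should_request_redraw(&self) -> bool {
        !self.occluded
    }
}

fn main() {
    let mut gate = RedrawGate::default();
    let mut frames_rendered = 0;

    // Simulated occlusion state observed on each loop iteration.
    let occlusion_states = [false, false, true, true, true, false];
    for &occluded in &occlusion_states {
        gate.set_occluded(occluded);
        if gate.should_request_redraw() {
            // A real app would call window.request_redraw() and render here.
            frames_rendered += 1;
        }
    }
    // prints "rendered 3 of 6 iterations"
    println!("rendered {} of {} iterations", frames_rendered, occlusion_states.len());
}
```

The three occluded iterations produce no frames at all, so nothing is asking for drawables while the window cannot be seen.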
My repro is as follows: run tutorial4-buffer under Instruments (it needs to be signed with the debug entitlement on M1), with the Leaks and Metal profiles added.

I also tried on an Intel MBP, where I did see a tiny increase in allocations that plateaued and dropped back to earlier levels after bringing the window back.
The wgpu examples’ framework does some frame limiting, which I think is hiding this for them. Making the framework call request_redraw unconditionally, without any time checking, made the shadow example behave similarly in my tests.

The allocations are clearly just calls to nextDrawable. In the shadow example I also saw Queue::write_buffer, but I think I’m just seeing all the allocations that happen in a frame. I would conclude that when the window is occluded, nextDrawable isn’t really waiting for anything, and for some reason it ends up allocating new drawables instead of waiting for the previous ones to be reused.

In this shadow example trace you can kind of see it: first it renders quite normally; then, when occluded, it takes a moment and the rendering looks really dense; after that there is a reeeally long pause when I un-occlude the window, and then rendering resumes normally and memory levels seem to normalise as well!
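The frame limiting that is credited with hiding the issue in the wgpu examples amounts to a time check before each redraw. A minimal sketch, assuming a fixed target rate; FrameLimiter and max_fps are hypothetical names, and this is not the examples framework's actual code.

```rust
use std::time::{Duration, Instant};

/// Hypothetical frame limiter: a new frame is only allowed once
/// `min_interval` has elapsed since the last allowed frame.
struct FrameLimiter {
    min_interval: Duration,
    last_frame: Option<Instant>,
}

impl FrameLimiter {
    fn new(max_fps: u32) -> Self {
        Self {
            min_interval: Duration::from_secs(1) / max_fps,
            last_frame: None,
        }
    }

    /// Returns true (and records the frame) if enough time has passed at `now`.
    fn try_frame(&mut self, now: Instant) -> bool {
        match self.last_frame {
            Some(last) if now.duration_since(last) < self.min_interval => false,
            _ => {
                self.last_frame = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut limiter = FrameLimiter::new(60);
    let start = Instant::now();
    let mut frames = 0;

    // Simulate a loop that spins every 1 ms for 100 ms, like an unthrottled
    // request_redraw loop would if nextDrawable stopped blocking.
    for ms in 0..100u64 {
        if limiter.try_frame(start + Duration::from_millis(ms)) {
            frames += 1;
        }
    }
    println!("{} frames allowed in 100 ms", frames);
}
```

The loop is paced to roughly one frame per 16.7 ms no matter how fast it spins, which is why it masks the problem; as the thread notes, rate limiting like this works but feels a bit hacky.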
But I’m not really sure whose bug this is. The behavior seems unexpected, so I would think it is actually a Metal/OS bug (shouldn’t it still be adhering to maxDrawableCount?). But if it is by “design” that nextDrawable never waits when the window is occluded, are apps expected to behave sanely and not try to keep drawing? I think it could be fixed on the winit side too, by not issuing RedrawRequested when the NSWindow occludedState is not visible. Rate limiting in other ways seems to work, but feels a bit hacky 😕

Also, googling this I keep ending up in gfx-rs/gfx#2460 😃
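The expectation that nextDrawable should keep adhering to maxDrawableCount even while occluded can be illustrated with a toy pool. DrawablePool is hypothetical, and the model assumes a drawable only becomes reusable once its presentation finishes; it is not how Core Animation is implemented. It shows why a capped pool keeps memory flat (the caller has to wait), whereas allocating a fresh drawable on every request, as observed in the traces, grows without bound.

```rust
/// Toy model of a drawable pool that never exceeds `max_drawable_count`.
/// `DrawablePool` is hypothetical and only illustrates the expected behavior.
struct DrawablePool {
    max_drawable_count: usize,
    in_flight: usize, // drawables handed out and not yet returned
    allocated: usize, // drawables that ever needed backing memory
}

impl DrawablePool {
    fn new(max_drawable_count: usize) -> Self {
        Self { max_drawable_count, in_flight: 0, allocated: 0 }
    }

    /// Returns false when every drawable is in use; a well-behaved caller
    /// has to wait here instead of receiving a freshly allocated drawable.
    fn try_acquire(&mut self) -> bool {
        if self.in_flight == self.max_drawable_count {
            return false;
        }
        if self.in_flight == self.allocated {
            // No idle drawable to reuse yet, so allocate one (up to the cap).
            self.allocated += 1;
        }
        self.in_flight += 1;
        true
    }

    /// Presentation finished; the drawable becomes idle and reusable.
    fn release(&mut self) {
        assert!(self.in_flight > 0, "release without matching acquire");
        self.in_flight -= 1;
    }
}

fn main() {
    let mut pool = DrawablePool::new(3);

    // The CPU tries to run 10 frames ahead while nothing has been presented.
    let granted = (0..10).filter(|_| pool.try_acquire()).count();

    // Presentation catches up and returns every drawable.
    while pool.in_flight > 0 {
        pool.release();
    }

    // Later frames reuse the idle drawables instead of allocating new ones.
    for _ in 0..3 {
        pool.try_acquire();
    }
    // prints "granted 3 frames, allocated 3 drawables"
    println!("granted {} frames, allocated {} drawables", granted, pool.allocated);
}
```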
I’ve been playing around with learn-wgpu/tutorial4-buffer.rs a bit, and I have noticed that it uses much less memory when using PresentMode::Immediate and, for some reason, with RUST_LOG=info enabled.
FWIW, the Xcode Metal Game template uses MTLCommandBuffer::addCompletedHandler along with a semaphore to keep the CPU from getting more than three frames ahead of the GPU. This Apple developer document describes the strategy under “Manage the Rate of CPU and GPU Work”.

In wgpu, I see a call to CommandBufferRef::add_completed_handler, but as far as I can tell that’s only being used to mark command buffers as complete so their resources can be freed. I’m aware the CAMetalLayer::nextDrawable docs say it’s supposed to wait until a drawable is “available”, but if that were the case, why would the other examples bother with the semaphore?

Also, I just noticed that the window-obscuring behavior (e.g., using the Activity Monitor window to completely obscure the cube window) just uncaps the framerate, but doesn’t cause the memory usage to run out of control.
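The strategy described for the Xcode template, signalling a semaphore from a command buffer's completed handler so the CPU stays at most three frames ahead of the GPU, can be sketched with std only. Assumptions: Semaphore is a hand-rolled stand-in (std has no counting semaphore), the GPU is simulated by a thread that sleeps per frame and then plays the role of the completed handler, and none of this is wgpu's or Metal's API.

```rust
use std::sync::{mpsc, Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built from Mutex + Condvar (std has none).
/// It stands in for the template's semaphore; it is not wgpu code.
struct Semaphore {
    permits: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cvar: Condvar::new() }
    }

    /// Block until a permit is available, then take it.
    fn wait(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.cvar.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Give a permit back and wake one waiting thread.
    fn signal(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cvar.notify_one();
    }
}

const MAX_FRAMES_IN_FLIGHT: usize = 3;

/// Submits `frames` frames to a simulated GPU thread and returns the largest
/// number of frames that were ever in flight at once.
fn run(frames: u32) -> usize {
    let in_flight = Arc::new(Semaphore::new(MAX_FRAMES_IN_FLIGHT));
    // (frames currently in flight, peak frames in flight)
    let counts = Arc::new(Mutex::new((0usize, 0usize)));
    let (submit, gpu_queue) = mpsc::channel::<u32>();

    // Simulated GPU: finishes each frame slowly, then does what a completed
    // handler would do in the template: signal the semaphore.
    let gpu = {
        let in_flight = Arc::clone(&in_flight);
        let counts = Arc::clone(&counts);
        thread::spawn(move || {
            for _frame in gpu_queue {
                thread::sleep(Duration::from_millis(2));
                counts.lock().unwrap().0 -= 1;
                in_flight.signal();
            }
        })
    };

    // CPU render loop: wait() blocks once MAX_FRAMES_IN_FLIGHT are queued.
    for frame in 0..frames {
        in_flight.wait();
        {
            let mut c = counts.lock().unwrap();
            c.0 += 1;
            c.1 = c.1.max(c.0);
        }
        submit.send(frame).unwrap();
    }
    drop(submit); // close the channel so the GPU thread's loop ends
    gpu.join().unwrap();
    let peak = counts.lock().unwrap().1;
    peak
}

fn main() {
    println!("peak frames in flight: {}", run(20));
}
```

The pacing here comes from the CPU blocking in wait() rather than from nextDrawable, so it holds even if nextDrawable stops waiting while the window is occluded.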