wgpu: Memory leak on MacOS | M1 OSX

Description Some users of my tutorial have been experiencing memory issues on my buffer tutorial. It seems to only be an issue on the new M1 chips.

Repro steps You’ll need a Mac with an M1 chip, then run the tutorial code here https://github.com/sotrh/learn-wgpu/tree/master/code/beginner/tutorial4-buffer. If you have the repo already downloaded you can just run cargo run --bin tutorial4-buffer.

Expected vs observed behavior The expected behaviour is no memory leaks on M1

Extra materials I don’t have a Mac, so I can’t provide hardware specifics other than that it’s occurring on M1, but I’ll link the issue from my repo here. https://github.com/sotrh/learn-wgpu/issues/207

Platform OSX with M1 chip, wgpu 0.9

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 54 (25 by maintainers)

Most upvoted comments

I think I’ve figured out what’s going on in #1936. Basically in some newer Big Sur versions and especially on the M1 SoC, Metal’s AnimationCoreKit will shut off some housekeeping tasks like reclaiming certain resources and buffers when the window isn’t in focus. The Apple recommended way to solve this seems to be to stop rendering when NSWindowOccludedState is false.

At long last, this tests as fixed by https://github.com/gfx-rs/wgpu/pull/4781

My repro is as follows,

  1. Set up tutorial4-buffer under Instruments (it needs to be signed with the debug entitlement on M1), with the Leaks and Metal profiles added.
  2. Run it and, after a while, make the window occluded (for example, behind the Instruments window). Bonus: make it visible again, try to resize it, and notice that it hangs for quite a while before it continues.

I also tried on an Intel MBP, where I did see a tiny increase in allocations that plateaued, and which dropped off to earlier levels after bringing the window back.

The wgpu examples’ framework does some frame limiting which I think is hiding this for them. Making the framework unconditionally call the request_redraw without any time checking made it behave similarly with the shadow example in my tests.
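To make the frame-limiting idea concrete, here is a minimal sketch of a time check that could sit in front of request_redraw. It is not the code from the wgpu examples' framework; `FrameLimiter`, `should_render`, and the `max_fps` parameter are hypothetical names made up for illustration, and it only uses std.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of wgpu or winit): allows a new frame only
/// if at least `min_frame_time` has passed since the previous one.
struct FrameLimiter {
    min_frame_time: Duration,
    last_frame: Option<Instant>,
}

impl FrameLimiter {
    /// `max_fps` is an assumed cap chosen by the application; it must be > 0.
    fn new(max_fps: u32) -> Self {
        Self {
            min_frame_time: Duration::from_secs(1) / max_fps,
            last_frame: None,
        }
    }

    /// Returns true (and records `now`) if a frame may start at `now`.
    /// Returns false if the previous frame started too recently.
    fn should_render(&mut self, now: Instant) -> bool {
        match self.last_frame {
            Some(prev) if now.duration_since(prev) < self.min_frame_time => false,
            _ => {
                self.last_frame = Some(now);
                true
            }
        }
    }
}
```

An event loop would call `should_render(Instant::now())` before requesting the next redraw and skip the request when it returns false. That would presumably mask the runaway nextDrawable allocations the same way the examples' framework does, without addressing the underlying cause.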

Screenshot 2021-08-13 at 23 29 17

The allocations are clearly just calls for nextDrawable. In the shadow example I also saw Queue::write_buffer, but I think I’m just seeing all allocations that are happening in a frame.

I would draw the conclusion that when the window is occluded, nextDrawable isn’t really waiting for anything, and for some reason it ends up allocating new drawables instead of waiting on the previous ones to be reused.

In this shadow example trace you can kind of see it:

Screenshot 2021-08-14 at 0 28 35
Screenshot 2021-08-14 at 0 28 50

First it is rendering quite normally. Then, when occluded, it takes a moment and the rendering looks really dense. After that there is a reeeally long pause when I unocclude the window, and then rendering resumes normally, and it looks like memory levels normalise as well!

But I’m not really sure whose bug this is. The behavior seems unexpected, so I would think it is actually a Metal/OS bug (it should still be adhering to maxDrawableCount?). But if it is by “design” that nextDrawable never waits when the window is occluded, is it expected that apps behave sanely and not try to keep drawing? I think it could be fixed on the winit side too, by not issuing RedrawRequested when the NSWindow occlusionState is not visible? Rate limiting in other ways seems to work, but feels a bit hacky 😕
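A rough sketch of what "don't keep drawing while occluded" could look like on the application side. The `Event` enum and `RedrawGate` type are hypothetical stand-ins, not winit or wgpu types; a real app would map them from whatever occlusion notification its windowing layer provides.

```rust
/// Hypothetical stand-in for the events an event loop would deliver.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    /// The window became visible (`true`) or fully occluded (`false`).
    OcclusionChanged { visible: bool },
    /// A frame was just presented.
    FrameFinished,
}

/// Tracks visibility and decides whether another redraw should be requested.
struct RedrawGate {
    visible: bool,
}

impl RedrawGate {
    /// Assumes the window starts out visible.
    fn new() -> Self {
        Self { visible: true }
    }

    /// Returns true if the caller should request another redraw after `event`.
    fn handle(&mut self, event: Event) -> bool {
        match event {
            Event::OcclusionChanged { visible } => {
                self.visible = visible;
                // Restart the redraw loop once the window can be seen again.
                visible
            }
            // Keep the loop going only while the window is visible.
            Event::FrameFinished => self.visible,
        }
    }
}
```

With this, the redraw chain simply stops while the window is hidden and restarts on the next visibility change, so nextDrawable is never called in the state where it seems to stop waiting.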

Also, googling this I keep ending up in gfx-rs/gfx#2460 😃

I’ve been playing around with learn-wgpu/tutorial4-buffer.rs a bit, and I have noticed that it uses much less memory when using PresentMode::Immediate and for some reason, with RUST_LOG=info enabled.

fwiw, the Xcode Metal Game template uses MTLCommandBuffer::addCompletedHandler along with a semaphore to keep the CPU from getting more than three frames ahead of the GPU. This Apple developer document describes the strategy under “Manage the Rate of CPU and GPU Work”.
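For reference, the semaphore strategy can be sketched in plain Rust roughly like this. It is an std-only model of the idea, not the Xcode template's code and not anything wgpu does today; `FrameSemaphore` and `MAX_FRAMES_IN_FLIGHT` are names made up for the example, and `release` is assumed to be called from something like a command buffer completion handler.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Assumed limit, mirroring the "three frames ahead" rule described above.
const MAX_FRAMES_IN_FLIGHT: usize = 3;

/// Minimal counting semaphore built from std primitives (hypothetical helper).
struct FrameSemaphore {
    available: Mutex<usize>,
    freed: Condvar,
}

impl FrameSemaphore {
    fn new(permits: usize) -> Arc<Self> {
        Arc::new(Self {
            available: Mutex::new(permits),
            freed: Condvar::new(),
        })
    }

    /// Blocks the CPU side until a frame slot is free, then takes it.
    fn acquire(&self) {
        let mut available = self.available.lock().unwrap();
        while *available == 0 {
            available = self.freed.wait(available).unwrap();
        }
        *available -= 1;
    }

    /// Non-blocking variant: takes a slot only if one is free right now.
    fn try_acquire(&self) -> bool {
        let mut available = self.available.lock().unwrap();
        if *available == 0 {
            false
        } else {
            *available -= 1;
            true
        }
    }

    /// Returns a slot; meant to be called when the GPU finishes a frame.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}
```

The render loop would call `acquire()` before encoding each frame, and the completion callback would call `release()`, so the CPU can never queue more than `MAX_FRAMES_IN_FLIGHT` frames regardless of what nextDrawable does.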

In WGPU, I see a call to CommandBufferRef::add_completed_handler, but that’s only being used to mark command buffers as complete so their resources can be freed as far as I can tell.

I’m aware the CAMetalLayer::nextDrawable docs say it’s supposed to wait until a drawable is “available”, but if that were the case, why would the other examples bother with the semaphore?

Also, just noticed the window obscuring behavior (e.g., just use the Activity Monitor window to completely obscure the cube window) just uncaps the framerate, but doesn’t cause the memory usage to run out of control.