wgpu: Fastclear Bug With Intel Mesa Adapters on the GL Backend

Description When using the OpenGL backend on Linux, the clear color seems to be behaving strangely. First, the color itself is lighter than it should be. Second, the borders around any objects drawn on top of the clear color have an even lighter, pixelated version of the clear color.

I’m currently troubleshooting but I opened the issue to start the discussion. With my experimentation so far it seems completely related to the act of clearing the draw framebuffer. All other rendering and examples seem to work fine, and if you have an example where you can’t see the clear color, such as the skybox example, everything looks great.

Also I’ve noticed some weird related behavior in my Renderdoc captures:

When I launch the example with Renderdoc it has the same problem as running it without Renderdoc ( which makes sense ):

image

And when I view the renderbuffer contents after the initial clear in renderdoc, it shows the clear color like it shows in the render, lighter than it should be:

image

The clear color shows the same in the draw step, all the way until the final framebuffer blit, where it is dark enough ( but still with pixelated edges ):

image

Yet, when I hover over the pixels in the image, the little thumbnail at the bottom shows the wrong, lighter color:

image

Very strange.

I found that I could get rid of the pixelated edges by forcing the renderbuffer pixel format to be RGBA8, but the color was still off. I think that’s the closest lead I have, and I’m going to look into how different pixel formats affect it, and maybe try binding the framebuffer storage to a texture instead of a renderbuffer to see if that makes any difference.


PS: Very excited that the new GL backend on wgpu-hal is working for all the examples! This is the first time I’ve tried it that the shadow, boids, and skybox examples have worked. I might try to tackle #1617, but I figured I’d try to get this one out of the way first. 😃

Repro steps Run the cube or shadow example with the OpenGL backend.

Expected vs observed behavior There should be no pixelated edges around objects and the clear color should be darker.

Expected: image

Actual ( I think it’s fine if the lines for the triangles aren’t there, the issue is the background color ) : image

Extra materials

Shadow example: image

Platform Running WGPU cube or shadow examples on Linux Pop!_OS ( Ubuntu ) 20.04. Adapter info:

[2021-07-10T16:54:11Z INFO  wgpu_hal::gles::adapter] Vendor: Intel
[2021-07-10T16:54:11Z INFO  wgpu_hal::gles::adapter] Renderer: Mesa Intel(R) UHD Graphics (CML GT2)
[2021-07-10T16:54:11Z INFO  wgpu_hal::gles::adapter] Version: OpenGL ES 3.2 Mesa 20.0.8
[2021-07-10T16:54:11Z INFO  wgpu_hal::gles::adapter] SL version: OpenGL ES GLSL ES 3.20

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 24 (24 by maintainers)

Most upvoted comments

Thanks for testing!

No problem! 😄

Do the reftests work correctly on gl on your machine?

No. Neither the shadow nor the cube example passes the reftest for OpenGL.

So looking at the images, it looks like what you put as the “expected” image is actually completely missing gamma correction and the erroneous image properly has gamma correction (though with artifacts).

Oh, that’s interesting. Makes sense now that I look at the screenshots in the example dirs. 😃 The reftests for Vulkan will interestingly still pass despite the sRGB difference. Not sure if that’s expected.

Can you run with the env flag INTEL_DEBUG=nofc and see if the bug goes away?

That fixed it! Nice to know that something so simple was actually a driver bug and not something wrong with the code. I didn’t write the code, but I couldn’t figure out for the life of me why a gl.clear_buffer() would be so weird. Also, the reftests will pass with that environment variable set as well.

image


So, in summary, there are 3 separate issues here, if I understand correctly:

  • Vulkan not doing sRGB conversion on Linux: This is happening for me on my Linux machine
  • GL not doing sRGB conversions on Linux: @cwfitzgerald did you say you were getting the darker image even on GL on your machine? I’m not getting that on my machine for GL. Just for Vulkan.
  • Blocky clear borders on GL: This is because of a mesa bug, and can short term be worked around by setting an environment variable.

That leaves the actionable items to be:

  • Figure out how to ensure sRGB conversion is done for Vulkan and GL on Linux
  • Figure out how we want to work around the mesa bug on GL

Does that sound about right?

@kvark I’ve never actually filed this as I first hit it back when I was a baby graphics programmer and didn’t know how to do such things 😃 I need to collect the information around it (as I suspect #725 is related) and file it up to the appropriate places. Maybe I’ll hit up mesa on IRC now that that’s a thing I know I can do.

@zicklag Thank you for filing this! It is a very detailed issue which is always much appreciated!

So a couple housekeeping things first. Do the reftests work correctly on gl on your machine? You can test this by running WGPU_BACKEND=gl cargo test --example <example> -- --test-threads=1. If they do, this indicates it is something related to swapchain shenanigans.

So looking at the images, it looks like what you put as the “expected” image is actually completely missing gamma correction and the erroneous image properly has gamma correction (though with artifacts). I can reproduce this lack of sRGB conversion on my intel/linux machine on both vulkan and GL. I also confirmed on a separate machine that the darker image shows that there is no proper sRGB transformation going on when there is supposed to be.

So this is 100% a driver bug at this point. As a user, you can do sRGB conversion in your fragment shader or tonemapping pass with a regular framebuffer and it will work as expected.
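
For reference, here is a rough sketch of the conversion such a shader would apply, written as CPU-side Rust rather than shader code so it can be run directly. It uses the standard piecewise sRGB encoding; the function names are made up for illustration and are not part of wgpu.

```rust
/// Encode one linear-light channel value (0.0..=1.0) with the standard
/// piecewise sRGB transfer function. Name is illustrative, not a wgpu API.
fn linear_to_srgb(c: f32) -> f32 {
    let c = c.clamp(0.0, 1.0);
    if c <= 0.003_130_8 {
        // Linear segment near black.
        12.92 * c
    } else {
        // Power-law segment for the rest of the range.
        1.055 * c.powf(1.0 / 2.4) - 0.055
    }
}

/// Encode an RGBA color. Alpha is not a light intensity, so it is passed through.
fn encode_color(rgba: [f32; 4]) -> [f32; 4] {
    [
        linear_to_srgb(rgba[0]),
        linear_to_srgb(rgba[1]),
        linear_to_srgb(rgba[2]),
        rgba[3],
    ]
}
```

A linear value of 0.1 encodes to roughly 0.35, which is why a correctly converted clear color looks noticeably lighter than one written to the framebuffer without conversion.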

There won’t be a simple fix for this unfortunately. Basically we’re going to need to:

  • Detect we’re on intel/mesa.
  • Lie about creating an srgb swapchain.
  • Inject srgb conversion into all shaders that are writing to the swapchain.
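
The detection step could look something like the sketch below. It is only a guess at a heuristic, based on the vendor and renderer strings in the adapter log above; the function name is hypothetical and the exact strings other Mesa versions report are an assumption.

```rust
/// Hypothetical heuristic: treat the adapter as Intel on Mesa when the GL
/// vendor string mentions Intel and the renderer string mentions Mesa.
/// Comparison is case-insensitive because driver strings vary in casing.
fn is_intel_mesa(vendor: &str, renderer: &str) -> bool {
    let vendor = vendor.to_ascii_lowercase();
    let renderer = renderer.to_ascii_lowercase();
    vendor.contains("intel") && renderer.contains("mesa")
}
```

With the strings from the log in this issue, `is_intel_mesa("Intel", "Mesa Intel(R) UHD Graphics (CML GT2)")` matches. A string check like this cannot tell which driver versions actually have the bug, so it would also flag unaffected Intel/Mesa setups.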

This has some major hurdles that need to be cleared first:

  • We need to figure out standard ways for us to implement driver bug workarounds.
  • Naga needs to be able to inject srgb conversion.
  • We need to have the ability to have more than one backend shader program per wgpu pipeline. This is because, at pipeline creation, we don’t know if the pipeline is going to be used for rendering to a swapchain or not. We need to decide at pipeline bind time.

So this is actually a long standing bug on Intel cards on Linux with srgb. I’m not sure how we actually should be working around this reasonably, but it likely needs to be internal with shader rewriting. I’ll take a look at your pr later tonight.

So I think we should do the following for dealing with the fastclear bug:

  1. We should counter this by doing an explicit shader clear when you do a clear on affected hardware (run a fullscreen triangle that just outputs the clear color). This is what a slow clear is actually doing (and why fastclears are so much faster) so should be no worse.
  2. We can tell if WebGL is affected by using the WEBGL_debug_renderer_info extension to get UNMASKED_RENDERER_WEBGL and UNMASKED_VENDOR_WEBGL. We should be using this to get the device/vendor information anyway. You can check this on your own hardware through https://webglreport.com/?v=2. If this information isn’t exposed, we should fall back on the normal information, though it would be completely unhelpful. If it’s not there, we’re SOL for fixing this bug.
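
To illustrate the fullscreen triangle from point 1: one common way to get it without a vertex buffer is to derive the positions from the vertex index. The sketch below does that arithmetic in plain Rust; in practice it would live in the vertex shader, and the function names here are made up.

```rust
/// Clip-space position of vertex `index` (0, 1, or 2) of a single oversized
/// triangle: (-1, -1), (3, -1), (-1, 3).
fn fullscreen_triangle_vertex(index: u32) -> [f32; 2] {
    let x = ((index & 1) * 4) as f32 - 1.0;
    let y = ((index >> 1) * 4) as f32 - 1.0;
    [x, y]
}

/// True if a clip-space point lies inside that triangle. Its three edges are
/// x = -1, y = -1, and x + y = 2.
fn triangle_covers(p: [f32; 2]) -> bool {
    p[0] >= -1.0 && p[1] >= -1.0 && p[0] + p[1] <= 2.0
}
```

Every corner of the viewport, including (1, 1), satisfies `triangle_covers`, so a fragment shader that just outputs the clear color touches every pixel with a single draw of three vertices.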

So I can reproduce the srgb issue on intel/vulkan/windows, so issue 1 is definitely a vulkan backend issue.

There you go! Created #1717. It turned out a little messier than I had hoped because, in order to draw the triangle for the shader clear, I had to add a bunch of boolean state values to keep track of whether gl::DEPTH_TEST and friends were currently enabled or disabled so that I could re-set those values back to whatever they were after disabling them all to draw the triangle.

I’m not sure if there’s a simpler way to do that, but it’s all I could come up with. Let me know if you have any ideas!
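
One possible variation, sketched here with hypothetical types rather than the code in #1717: keep the enabled capabilities in a single bitmask, snapshot it before the clear, and restore from the snapshot afterwards. The `Cmd` list stands in for the real GL enable/disable calls so the example runs on its own.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Cap { DepthTest, StencilTest, Blend, CullFace, ScissorTest }

const ALL_CAPS: [Cap; 5] =
    [Cap::DepthTest, Cap::StencilTest, Cap::Blend, Cap::CullFace, Cap::ScissorTest];

/// Stand-in for the GL calls that would actually be issued.
#[derive(Debug, PartialEq)]
enum Cmd { Enable(Cap), Disable(Cap), DrawClearTriangle }

/// Tracks enabled capabilities as one bitmask instead of one bool per capability.
#[derive(Default)]
struct CapState { enabled: u8 }

impl CapState {
    fn bit(cap: Cap) -> u8 { 1 << cap as u8 }

    fn is_enabled(&self, cap: Cap) -> bool { self.enabled & Self::bit(cap) != 0 }

    /// Record a state change, skipping the call if nothing would change.
    fn set(&mut self, cap: Cap, on: bool, out: &mut Vec<Cmd>) {
        if self.is_enabled(cap) == on {
            return;
        }
        if on {
            self.enabled |= Self::bit(cap);
            out.push(Cmd::Enable(cap));
        } else {
            self.enabled &= !Self::bit(cap);
            out.push(Cmd::Disable(cap));
        }
    }
}

/// Disable everything, draw the clear triangle, then restore the snapshot.
fn shader_clear(state: &mut CapState, out: &mut Vec<Cmd>) {
    let saved = state.enabled;
    for cap in ALL_CAPS {
        state.set(cap, false, out);
    }
    out.push(Cmd::DrawClearTriangle);
    for cap in ALL_CAPS {
        state.set(cap, saved & CapState::bit(cap) != 0, out);
    }
}
```

This still tracks the same information as the separate booleans; it mainly keeps the save and restore in one place and avoids redundant enable/disable calls for capabilities that were already off.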

Just finished taking both of those measures and it now successfully works around the bug on both desktop and WebGL. 🎉 Currently it’s in my WebGL branch. I’m not sure if it’s helpful or not, but let me know if you want me to split it out to a separate PR for just the desktop fastclear fix instead of leaving it merged with WebGL.

Great. I opened #1645 with the short-term solution, and I renamed this issue to be specific to the fastclear bug. I’ll open a new issue for the Vulkan sRGB bug.

Yeh we probably should split out the bugs into separate issues, this one could be for the fast clear issue. If you can do it, that’d be great, otherwise I’ll do it a bit later today.

Do you think it’s safe enough just to set the INTEL_DEBUG=nofc variable at instance creation for now and do that regardless of what device you are using?

Yeah this should be fine, as long as we always set it on linux. It shouldn’t affect any other adapters as it’s an env only for the intel cards. This is a fine (if slow) short term solution, but we should work to see if we can find a way to prevent the bug from occurring at all because this is pessimising intel cards w/o the bug.
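
A minimal sketch of that short-term workaround, assuming the variable has to be set before the GL context is created and that INTEL_DEBUG takes a comma-separated list of flags. Both function names are hypothetical; this is not the code from the actual PR.

```rust
/// Merge `nofc` into an existing comma-separated INTEL_DEBUG value, if any,
/// so that flags the user already set are kept.
fn with_nofc(existing: Option<&str>) -> String {
    let mut flags: Vec<&str> = existing
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|f| !f.is_empty())
        .collect();
    if !flags.contains(&"nofc") {
        flags.push("nofc");
    }
    flags.join(",")
}

/// Hypothetical hook, meant to be called once before the GL instance exists.
/// It does nothing on platforms other than Linux.
#[allow(unused_unsafe)]
fn disable_intel_fastclear() {
    if cfg!(target_os = "linux") {
        let current = std::env::var("INTEL_DEBUG").ok();
        let merged = with_nofc(current.as_deref());
        // `set_var` is an unsafe fn in the Rust 2024 edition because it can
        // race with other threads reading the environment; call this early.
        unsafe { std::env::set_var("INTEL_DEBUG", merged) };
    }
}
```

Merging instead of overwriting means a user who already runs with other INTEL_DEBUG flags keeps them. As noted above, this disables fast clears on every Intel adapter, including ones without the bug.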

oh. yeah that would do it XD

This is gonna be fun to try to work around, though GL already relies on a blit, so it should be possible.

No, I’m running X11.

PS: I’m going to look through your comments and test out what you suggested probably within the next few hours.

So the blocking around the triangle is caused by https://gitlab.freedesktop.org/mesa/mesa/-/issues/2565. Can you run with the env flag INTEL_DEBUG=nofc and see if the bug goes away? This isn’t a long term solution, but shows that it is the problem.

The issue about vk not properly doing srgb seems unrelated.