playwright: feat: better support for visual regression testing

Playwright Test has a built-in toMatchSnapshot() method to power Visual Regression Testing (VRT).

However, VRT is still challenging due to variance between host environments. There are a number of measures we can take right away to drastically improve the experience in @playwright/test:

  • support for a Docker test fixture to run browsers inside a Docker image
  • support for blur when matching snapshots to counteract antialiasing
  • better UI for reviewing snapshot diffs

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 66
  • Comments: 64 (8 by maintainers)

Most upvoted comments

I had different screenshots with antialiased fonts between my ArchLinux laptop and Ubuntu 20.04 in Docker (it’s used by default by GitHub Actions). The following Chromium flags helped me to get identical screenshots:

--font-render-hinting=none
--disable-skia-runtime-opts
--disable-font-subpixel-positioning
--disable-lcd-text
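
One way to apply these flags in a Playwright Test project is through the launch options in the config file. A minimal sketch, assuming a Chromium-only project; the names `chromiumFlags` and `vrtConfig` are made up for illustration, and the flags are simply the ones reported above, not something verified to help in every environment:

```typescript
// Chromium flags reported above to reduce font-rendering differences.
// `chromiumFlags` is a name made up for this sketch.
const chromiumFlags: string[] = [
  '--font-render-hinting=none',
  '--disable-skia-runtime-opts',
  '--disable-font-subpixel-positioning',
  '--disable-lcd-text',
];

// Shape of a playwright.config.ts object. In a real project this object
// would be the default export of the config file.
const vrtConfig = {
  use: {
    launchOptions: {
      // Passed through to the browser process at launch.
      args: chromiumFlags,
    },
  },
};
```

Since the flags are Chromium-specific, a multi-browser setup would likely want them on the Chromium project only.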

From maintainers

Hey folks! If you have examples of PNG screenshots that were taken on the same browser and the same OS yet differ due to anti-aliasing issues, could you please attach the “expected”, “actual” and “diff” images here?

This information will help with our experiments with fighting browser rendering non-determinism.

Would it be possible to do visual diffs even when the snapshot sizes differ (Sizes differ; expected image...)? Right now it seems Playwright VRT refuses to do visual diffs for such snapshots.

The Storybook storyshots addon at least has this feature, though it might come from pixelmatch, not sure. It’s very convenient, as there are often whitespace changes and you can easily see that some padding has appeared somewhere.

On a related note: it would be great if tests could be run cross-platform. Currently the OS platform name is baked into the snapshot filename, so our CI tests sometimes fail due to a name mismatch. https://github.com/microsoft/playwright/issues/7575
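
Later Playwright versions have a `snapshotPathTemplate` config option that controls where snapshot files are stored; leaving the `{platform}` token out of the template keeps the OS name out of the path. A sketch, assuming a version that supports this option; the `__screenshots__` directory and the `crossPlatformConfig` name are arbitrary choices for illustration:

```typescript
// Shape of a playwright.config.ts object (would be the file's default export).
// The template deliberately omits the {platform} token, so the same baseline
// file is used on every operating system.
const crossPlatformConfig = {
  snapshotPathTemplate: '{testDir}/__screenshots__/{testFilePath}/{arg}{ext}',
};
```

Sharing one baseline across operating systems only helps if rendering really is identical on them, for example when every run happens inside the same Docker image.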

Slider is rarely useful for me. An onion-skin (transparency overlay) would be more useful.

Thanks for thinking about this. The blur feature is something that will help us; we had something similar before with Puppeteer that helped us do comparisons on animated pages. In addition, something that would be really useful is being able to ignore specific parts of the screen, especially the parts with more dynamic data (videos/images).

An update from our experiences above: We found that increasing maxDiffPixels (or maxDiffPixelRatio) to a level that could avoid false failures also led to too many regressions slipping through visual comparisons. However, the threshold option as documented at https://playwright.dev/docs/api/class-pageassertions#page-assertions-to-have-screenshot-2 worked for us. Once we increased that from the default 0.2 to 0.3, we’ve had no false failures or missed regressions.
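
The difference between the two kinds of option can be illustrated with a toy model. This is not Playwright's actual comparator: it assumes each pixel is reduced to one normalized color difference in [0, 1], and the pixel counts and delta values are made up. `threshold` decides whether a single pixel counts as changed; `maxDiffPixels` decides how many changed pixels are tolerated.

```typescript
// Toy model of a pixel comparison; NOT Playwright's actual comparator.
interface CompareOptions {
  threshold: number;     // per-pixel difference tolerated before a pixel counts as changed
  maxDiffPixels: number; // number of changed pixels tolerated before the comparison fails
}

function imagesMatch(pixelDeltas: number[], options: CompareOptions): boolean {
  const changed = pixelDeltas.filter((delta) => delta > options.threshold).length;
  return changed <= options.maxDiffPixels;
}

// Made-up data: 80 pixels of faint anti-aliasing noise vs. 50 pixels of a real change.
const antialiasingNoise: number[] = new Array<number>(80).fill(0.25);
const realRegression: number[] = new Array<number>(50).fill(0.9);

// Strict settings: the faint noise causes a false failure.
const strict: CompareOptions = { threshold: 0.2, maxDiffPixels: 0 };
// Raising maxDiffPixels hides the noise, but the regression slips through as well.
const loosePixelCount: CompareOptions = { threshold: 0.2, maxDiffPixels: 100 };
// Raising threshold hides the faint noise while the strong change still fails.
const higherThreshold: CompareOptions = { threshold: 0.3, maxDiffPixels: 0 };

console.log(imagesMatch(antialiasingNoise, strict));          // false
console.log(imagesMatch(realRegression, loosePixelCount));    // true (missed regression)
console.log(imagesMatch(antialiasingNoise, higherThreshold)); // true
console.log(imagesMatch(realRegression, higherThreshold));    // false
```

In this model a pixel-count budget cannot tell many faint differences from a few strong ones, which matches the experience described above.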

We test screenshots of an add-in in the web version of MS Office 365 Excel. In some cases, the add-in is 1px bigger than in the original screenshot. It seems we cannot control this: MS Office decides the size, and it is not deterministic. The image diff is negligible and we could ignore it, but since the image sizes do not match, toMatchSnapshot fails. Currently we do not have a good workaround for this problem.

I would vote for toMatchSnapshot to be able to compare images of different sizes.
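
One way such a comparison could work is to pad both images to a common size before diffing, so that a 1px size change shows up as a small, countable diff instead of a hard failure. A minimal sketch under stated assumptions: the `Raster` type, `padTo` and `countDiffPixels` are hypothetical helpers, not part of Playwright, and images are simplified to one grayscale value per pixel.

```typescript
// A minimal grayscale raster used only for this sketch (one number per pixel, row-major).
interface Raster {
  width: number;
  height: number;
  pixels: number[];
}

// Copy `image` onto a canvas of the given size, filling any new cells with `fill`.
// Assumes the target size is at least as large as the image.
function padTo(image: Raster, width: number, height: number, fill: number): Raster {
  const pixels = new Array<number>(width * height).fill(fill);
  for (let y = 0; y < image.height; y++) {
    for (let x = 0; x < image.width; x++) {
      pixels[y * width + x] = image.pixels[y * image.width + x] ?? fill;
    }
  }
  return { width, height, pixels };
}

// Count differing pixels after padding both images to their common bounding size.
function countDiffPixels(expected: Raster, actual: Raster, fill = 255): number {
  const width = Math.max(expected.width, actual.width);
  const height = Math.max(expected.height, actual.height);
  const a = padTo(expected, width, height, fill);
  const b = padTo(actual, width, height, fill);
  let diff = 0;
  for (let i = 0; i < a.pixels.length; i++) {
    if (a.pixels[i] !== b.pixels[i]) diff++;
  }
  return diff;
}
```

With this approach an extra row that matches the fill color produces no diff at all, while an extra row of real content is counted like any other changed pixels.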

I have come up with a system where the devs can comment on the PR to run a CI task that reruns the tests with an --update-snapshots flag and pushes any changes to the PR branch. But this requires rerunning the entire test suite, which is pretty slow considering the test report from the last run already contains the new snapshots to be accepted.

It would be nice to have some kind of “accept snapshots” command we could run that takes an output from a test run where the snapshots comparison failed and it updates them from that. Even if it needs some kind of special report format, that would speed up this part of the workflow considerably.
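
A rough sketch of what the core of such a command might do: take the list of files from a failed run, pick out the “actual” images, and plan where each one would be copied to become the new baseline. Everything here is an assumption for illustration: the `planSnapshotUpdates` helper is hypothetical, the `-actual.png` suffix is taken from the attachment names seen in this thread, and real baseline file names may carry extra project or platform suffixes that this does not handle.

```typescript
interface CopyStep {
  from: string; // "actual" image produced by the failed run
  to: string;   // baseline file it would replace
}

// Plan (but do not perform) the copies needed to accept the new snapshots.
function planSnapshotUpdates(resultFiles: string[], snapshotDir: string): CopyStep[] {
  const suffix = '-actual.png';
  return resultFiles
    .filter((file) => file.endsWith(suffix))
    .map((file) => {
      // Strip the directory part and the "-actual.png" suffix to get the snapshot name.
      const baseName = file.slice(file.lastIndexOf('/') + 1, -suffix.length);
      return { from: file, to: `${snapshotDir}/${baseName}.png` };
    });
}
```

Keeping the planning step separate from the file copying makes it possible to show the list for review before anything is overwritten.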

With the release of 1.21, Playwright now has the “Slider Diff View” which is great for comparing visual changes on the .toMatchSnapshot() assertion.

I’m curious how others plan to incorporate this into their software development workflow! It seems like the biggest piece missing from Playwright now is the ability to approve changes outside of running the app locally. (This is where backstopjs is still useful). Have others come up with a way to create some type of workflow on the PR that allows teams to easily review and approve changes?

To be clear, I’m not necessarily saying that this should be part of Playwright.

There’s a separate (related) issue regarding adding support for docker at https://github.com/microsoft/playwright/issues/20954 so that visual regression tests can run in a consistent environment and environment-related differences are negated.

It would be helpful to receive upvotes there from folks here if that’s something you need.

We have an interesting problem that’s probably a common case when doing visual regression testing: we’re taking a screenshot of an element (selected with a locator) … I think the most useful case for visual regression testing is for individual elements, and this does make it very hard to test those if you have any sub-pixel heights (or widths) on the page.

I experienced this issue and my workaround was to take an image of the entire page and then crop the image to select the desired element before comparing.

    const dimensions = await element.boundingBox();
    // boundingBox() resolves to null when the element is not visible.
    if (!dimensions) throw new Error('element is not visible');
    expect(await page.screenshot({ type: 'jpeg', clip: dimensions })).toMatchSnapshot(`${name}.jpeg`);

Hi @aslushnikov, can you add an option for ignoring diffs where there is just a slight shift in the location of pixels? I don’t want to use threshold or maxDiffPixels for this because those options would cause false positives, i.e. they would cause the tests to ignore actual regressions. Here is an example of a diff that I would like to ignore:

Actual image: reactions-selector-actual

Expected image: reactions-selector-expected

The diff: reactions-selector-diff

Thank you very much.

Hi @pastelsky, are those screenshots taken on different operating systems? In a like-for-like environment, you shouldn’t see flake from text rendering differences.

Different operating systems can have different default fonts, and (which appears to be the case here) different text rendering approaches.

We have an interesting problem that’s probably a common case when doing visual regression testing: we’re taking a screenshot of an element (selected with a locator) that has a non-integer height. This results in an interesting problem where (at least when the device pixel ratio is 1) depending on what is on the rest of the page, sometimes the screenshot has a different height, or includes one extra row of the background color outside the element. I think this could even happen for elements that have an integer height, but that are positioned around elements that don’t.

I think the most useful case for visual regression testing is for individual elements, and this does make it very hard to test those if you have any sub-pixel heights (or widths) on the page.
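
One idea for making the captured rectangle stable is to expand the element's fractional bounding box outward to whole pixels before using it as a clip for a page screenshot. A minimal sketch; `snapToPixelGrid` is a hypothetical helper, and whether this removes the extra row in practice has not been verified here:

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Expand a fractional bounding box outward to whole pixels so the clip
// always covers the same integer rectangle.
function snapToPixelGrid(box: Box): Box {
  const left = Math.floor(box.x);
  const top = Math.floor(box.y);
  const right = Math.ceil(box.x + box.width);
  const bottom = Math.ceil(box.y + box.height);
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

The trade-off is that the snapped rectangle can include up to one pixel of whatever surrounds the element on each side.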

Hi, @aslushnikov! Is it possible that in the next releases you will implement “slider” diff in the html report? There are cases where the slider is more convenient than the pixel highlighting method, especially when the length of the expected and actual screenshots differs.

Would it be possible to implement one more tab in the report, by analogy with Diff/Actual/Expected?

report sample

Or you could display all 3 states on one tab in the report (as it looks in the attachments of this comment).

Hey @aslushnikov! I updated @florianbepunkt’s original port of jest-image-snapshot to the Playwright test runner here: https://github.com/ayroblu/playwright-image-snapshot. Basically it looks VERY similar to Playwright’s existing golden.ts compare API, as you can see in matcher.ts.

The main benefit is that it uses SSIM. I also updated how the diff is done so it’s similar to pixelmatch’s greyscale background, which is super useful.

    expect(await page.screenshot()).toMatchImageSnapshot(test.info(), [
      name,
      "1-initial-load.png",
    ]);

Would love to have this SSIM option ported to playwright test as TestInfo is not exposed implicitly which makes the api usage a bit ugly. Made a PR #12258. I’m also hoping not to need to supply a file name by default, seems unnecessary.
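
For readers unfamiliar with SSIM (structural similarity), the core formula can be sketched as follows. This is a simplified single-window version over grayscale buffers, using the conventional constants for 8-bit images; it is not the code used by the library linked above, and practical implementations usually slide a small window over the image and average the per-window results.

```typescript
// Single-window SSIM over two equally sized grayscale buffers (values 0..255).
// Returns 1 for identical inputs; lower values mean less structural similarity.
function ssim(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('buffers must be non-empty and of equal length');
  }
  const n = a.length;
  const mean = (values: number[]): number => values.reduce((sum, v) => sum + v, 0) / n;
  const meanA = mean(a);
  const meanB = mean(b);

  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n;
  varB /= n;
  cov /= n;

  // Stabilizing constants for a dynamic range of 255.
  const c1 = (0.01 * 255) * (0.01 * 255);
  const c2 = (0.03 * 255) * (0.03 * 255);

  return ((2 * meanA * meanB + c1) * (2 * cov + c2)) /
    ((meanA * meanA + meanB * meanB + c1) * (varA + varB + c2));
}
```

Because the score depends on local means, variances and covariance rather than on exact per-pixel equality, small rendering noise tends to lower it only slightly.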

Thanks @shamrin for the pointers! I’ll read your links in more detail later to get a better understanding, but so far we already do all of these:

  • instead of using CIEDE2000, pixelmatch uses color difference in YIQ color space
  • pixelmatch uses the same algorithm based on the same whitepaper to ignore anti-aliasing
  • we hide text input caret on the browser level before making a screenshot
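
The YIQ-based color difference from the first bullet can be sketched like this. The weights and the 35215 scaling factor follow pixelmatch's source as commonly quoted and should be double-checked against the version in use; alpha blending and the anti-aliasing detection are left out, and the function names are made up for this sketch.

```typescript
type Rgb = [number, number, number];

// RGB -> YIQ channel conversions (weights as commonly quoted from pixelmatch; verify before relying on them).
const toY = ([r, g, b]: Rgb): number => r * 0.29889531 + g * 0.58662247 + b * 0.11448223;
const toI = ([r, g, b]: Rgb): number => r * 0.59597799 - g * 0.2741761 - b * 0.32180189;
const toQ = ([r, g, b]: Rgb): number => r * 0.21147017 - g * 0.52261711 + b * 0.31114694;

// Weighted squared distance between two colors in YIQ space.
function yiqDelta(a: Rgb, b: Rgb): number {
  const y = toY(a) - toY(b);
  const i = toI(a) - toI(b);
  const q = toQ(a) - toQ(b);
  return 0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q;
}

// A pixel counts as different when its delta exceeds the scaled threshold (threshold in [0, 1]).
function pixelDiffers(a: Rgb, b: Rgb, threshold: number): boolean {
  const maxDelta = 35215 * threshold * threshold;
  return yiqDelta(a, b) > maxDelta;
}
```

Weighting the luminance channel most heavily makes brightness changes count for more than equally large hue shifts.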

I suggest solving the biggest pain point, which is how to store this stuff in a git repo so it doesn’t blow up in size (i.e. to store only the last snapshot). Git LFS kinda works but it’s painful. Maybe something else would work better? For reference: https://github.com/americanexpress/jest-image-snapshot/issues/92

Would be great if these snapshot dirs were automatically marked in git to only store last revision.

I actually prefer the pixel highlighting (like Playwright already does), but with all the failing tests organized in a UI so I can see what failed without having to poke around three different images.

Also being able to A/B toggle the baseline and the test image is nice in some cases.

Solid integration with Storybook would be beneficial for the work I do. Chromatic and Percy do this really well.

Also a UI for reviewing the diffs would be great.

I am encountering the same issue with chromium (and webkit at an even higher frequency, too high so we disabled it).

Version: Playwright 1.38.1 (but the issue is reproducible in 1.39.0 as well)

Env: running in ubuntu:jammy on an Apple M1 Pro (the issue happens in our Linux CI pipeline as well; running in Docker makes it pixel-perfect between local and CI)

What happens: About 5% of the time, one letter is randomly positioned incorrectly, always the same letter. On other screenshots it might be 2-3 letters, sometimes in the middle of a word.

More info:

  • No network call, the css is inlined before the HTML.

  • Using chromium arguments (no improvements before / after enabling those arguments):

            '--font-render-hinting=none',
            '--disable-skia-runtime-opts',
            '--disable-system-font-check',
            '--disable-font-subpixel-positioning',
            '--disable-lcd-text',
            '--disable-remote-fonts',
    

My guess: this issue never happens on other screenshots that we are taking using exactly the same configuration, so it has to do with something in the HTML / CSS (that I am probably not allowed to share here)…

Actual / expected / diff (triggering here on the pseudo-locale test but might happen as well on the en-US version):

actual expected diff

I’ve noticed an issue with webkit image rendering where it doesn’t seem to be consistent. Look at the image of the flowers in this picture: a-basic-page-with-embed-images-ID-2008-1-expected

And this one:

a-basic-page-with-embed-images-ID-2008-1-actual

There’s exactly 30 pixels different - and what’s interesting is that when it fails, it’s always 30 pixels.

a-basic-page-with-embed-images-ID-2008-1-diff

If you flip between the two images, one of them appears more aliased or slightly blurred or something. The image is a lossy webp image, so I suppose it could be that the rendering of the image isn’t consistent?

Anyone know if this is expected - something like webkit rendering the image in stages? We’re already waiting on the complete property so JavaScript and playwright consider the image loaded.

@thekp I pass them to playwright.chromium.launch(args=[...]) here (the chromium_flags() function is overridden inside a test).

@bezyakina not sure for 1.21 (we’re about to finalize this version), but still possible! It all depends on how much our users need it.

So could you please file this separately to our bug tracker as a feature request? The more likes / upvotes it will collect, the higher priority will be for us, and the faster we’ll implement it!

Many folks mentioned that they want pre-blur to avoid snapshot failures due to a few pixel differences.

New options have landed on tip-of-tree: pixelCount and pixelRatio. These are supposed to help in these cases. Please give them a try and let me know if you still need pre-blur!

$ npm i @playwright/test@next

Hey @z0n, there’s no roadmap. My guesstimate is that we’ll have all the pieces together by summer 2022; the priority of VRT keeps rising.