playwright: [BUG] Memory increases when same context is used

Context:

  • Playwright Version: Latest (as of 26/04/2021)
  • Operating System: Linux
  • Node.js version: tested on multiple Node.js versions
  • Browser: Chromium

Describe the bug

I’m monitoring full-JS apps (e.g. React/Angular websites). I initialize one Playwright instance, one browser, and one page. I keep the page cached and retrieve its content every 2 seconds.

After one to two hours, memory usage goes through the roof. I tried calling reload() on the page every 30 minutes, but it doesn’t free the memory. The only way to free the memory is to close the page and create a new one.

What could be the source of this memory leak? I assume reload() frees the JavaScript VM, so the leak must be internal to the page object.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 71
  • Comments: 86 (14 by maintainers)

Most upvoted comments

Currently, the objects for e.g. request/response/route are only flushed when a new context is created

I think being able to have the page flush responses when new ones are received would be a useful feature, and along the lines of the way a typical browser handles responses. Maybe have it as a flag we can set?

While headless browsers have their origins in browser testing, and browser testing continues to be a major use case, data science is a rapidly growing field, and scraping is a major selling point for using headless browsers.

When scraping, you may not want to create an entirely new context with each data retrieval, as the cookies may be important for storing complex states or tokens. This leads to difficulties with the current Playwright implementation. Scrapers are primarily needed when lightweight APIs aren’t available or practical, and unfortunately in the modern web heavy pages of multiple megabytes of data can be included in each response.

If you’re getting HTTP responses every two seconds, you can easily amass over 1 GB of additional leaked memory within an hour if all of those responses are being stored. In my case, after 43 minutes my Playwright Node process crashed at 1.9 GB of total memory, with the error FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory. Increasing the memory limit will only delay the issue. Being able to prevent a context from caching all responses would be ideal.

I have a case where I make several requests per minute, and it leaks memory all over the place. I’ve tried closing pages, closing contexts, closing browsers, catching all the onClose events and re-closing everything together, even closing the main Playwright instance all the time. Nothing works; it still leaks memory, and at some point the GC thrashes completely with the CPU on fire.

Please provide and/or document a way to fully release all resources so that we can clear everything out every now and then. Please don’t focus only on testing; there are plenty of other use cases where Playwright needs to run for a long time.

(image: jprofiler_R3kyHEskfF)

@LuizFelipeNeves images can be very important, depending on what you are doing. Automated browsers have lots of different use cases; the project I used Playwright for needed to take screenshots of webpages, images and all. The problem Playwright is having isn’t a request issue but a memory-leak issue. Reducing the size of the requests only means it takes longer for the memory overflow to occur; it doesn’t fix the underlying problem. I think we can all agree that we should be able to instruct Playwright not to keep an infinite cache of request/response data between page loads, while still caching the things that are designed to persist between page loads, like cookies and local storage. Basically, a setting to behave more like a browser. So much of the issue we’re facing with Playwright was solved by the early browser developers, dating back to Netscape’s cookies in 1994; there’s no need to reinvent the wheel here.

Currently, the objects for e.g. request/response/route are only flushed when a new context is created.

I’ve tried calling page.close() and context.close() and opening a new context and page every 100 page.goto(url) calls. Your statement that memory is cleaned on context close appears to be incorrect: after ~8 hours of running and about 200,000 network requests (context and page closed/reopened every 100 URLs), Playwright’s memory usage is about 10 gigabytes.

In other words, while the bug is not fixed, the workaround is to close not only the browser context but the whole browser (e.g. await browser.close()) every N requests, depending on how much free RAM you have. I have not checked whether browser.close() actually frees the memory; it may also be necessary to open/close the whole async_playwright context manager to flush it.

UPDATE

Please note, I use playwright-python, so some terminology may be related only to it.

Sorry guys, my conclusions above were a bit wrong. I want to add some clarity after further investigation:

  • closing/opening a browser context does not help to clean resources in the long run
  • closing/opening browser does not help to clean resources in the long run

My website has lots of pages, and to perform a kind of end-to-end test I need to perform lots of actions on it. I’ve been closing/reopening the browser periodically within the same Playwright context manager, and after ~1 hour Playwright used 8 GB of memory and 31 GB of swap (in total ~39 GB of virtual memory).

In other words, I will need to regularly restart not only the browser but also the context manager.
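For what it’s worth, that per-batch recycle loop can be sketched in playwright-python (sync API) roughly like this. The batch size, helper names, and `scrape_all` itself are illustrative, not Playwright API; the point is that exiting the `with sync_playwright()` block tears down the whole driver process:

```python
# Sketch (playwright-python, sync API): tear down the whole Playwright
# context manager every N URLs so the OS reclaims the leaked memory.
# `chunked`, `scrape_all`, and `restart_every` are illustrative names.

def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def scrape_all(urls, restart_every=100):
    # Imported lazily so the batching helper stays usable without Playwright.
    from playwright.sync_api import sync_playwright

    for batch in chunked(list(urls), restart_every):
        # Fresh Playwright driver + browser per batch; leaving the `with`
        # block shuts the driver process down entirely.
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            for url in batch:
                page.goto(url)
                # ... extract whatever you need here ...
            browser.close()
```

Per the commenter’s follow-up, restarting only the browser was not enough; it is the context-manager exit that finally released the driver-side memory.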

I am talking about playwright-python, so I am not sure this issue belongs exactly (or only) in this repository. The same memory leak exists on both the ‘latest’ and ‘dev’ Python releases.

Any solution? Why after a year there is still no fix? This problem hinders playwright’s usability. There’s nothing more annoying than wasting time integrating a third-party package to find out later that it’s riddled with bugs and incompetent developers.

Hey guys - any ETA on when is this going to be fixed?

Upvoting. I’m also facing this issue with Python and the scenario is similar to OP. Node.js JavaScript Runtime consumed more than 4 GB of RAM before crashing. In my use case, I only need to open one Browser, one Context, and one Page. After that, I should be able to navigate through the website without a problem. But what I’m seeing is that the RAM usage for Node.js JavaScript Runtime keeps growing nonstop. I tried closing the Page and reopening every N requests, but it did nothing to clear the RAM.

This memory leak highly hurts Playwright’s reliability.

@liuyazhou1991 it’s called Playwright Test, see here: https://playwright.dev/docs/intro

@gigitalz

I’ll have to ditch this tool very soon if this is not resolved, so annoying.

you can always ask for your money back 👯

Unbounded heap growth should be mitigated by https://github.com/microsoft/playwright/commit/ffd20f43f8ee1a7a016cd9b29c372e25ec685a62. The heap will still grow to a certain size (1K handles per object type, roughly 50-100 MB on average), but will stop growing after that.

I resolved my issue by doing the following.

  1. Save the browser’s state (session, local storage, etc.) to a local file after creating the browser/context and performing the actions required to meet my needs: context.StorageState("state.json")

  2. Close the browser and context and kill all node.exe processes every 30 minutes (this is where the memory leak exists for me). If you don’t kill them, a separate node.exe process is created every time, and the previous process remains in memory taking up space.

  3. Create a new browser/context, load the saved state, and navigate back to where you need to be: context, err := browser.NewContext(playwright.BrowserNewContextOptions{StorageStatePath: playwright.String("state.json")})

While this won’t help with infinite scroll or some other scenarios, it might help some of you. A good example where this works fine is creating a session with a QR code (in my situation) or after a simple login.
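A playwright-python (sync API) equivalent of this Go workaround might look like the sketch below. The function names and the "state.json" path are my own illustration, though `context.storage_state(path=...)` and the `storage_state` option of `new_context()` are real Playwright APIs:

```python
# Sketch: persist cookies/local storage, kill the leaky browser, relaunch.
# `context_kwargs`, `recycle`, and "state.json" are illustrative names.
import os

def context_kwargs(state_file):
    """Kwargs for new_context(): restore saved state only if the file exists."""
    return {"storage_state": state_file} if os.path.exists(state_file) else {}

def recycle(pw, browser, context, state_file="state.json"):
    """Save session state, tear the browser down, return a fresh browser/context."""
    context.storage_state(path=state_file)  # 1. save cookies/local storage
    browser.close()                         # 2. drop the leaky process tree
    browser = pw.chromium.launch()          # 3. fresh browser process
    context = browser.new_context(**context_kwargs(state_file))
    return browser, context
```

You would call recycle(...) every N minutes or pages; anything not covered by storage state (e.g. an in-progress infinite scroll) is lost, as the commenter notes.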

Hi all,

I believe I had a very similar issue, which I resolved by modifying lib\server\browserContext.js and removing the context listener from the instrumentation map.

I’ve noticed that when a new context is created, it is inserted into the instrumentation map (lib\server\instrumentation.js) by calling this.instrumentation.addListener(contextDebugger, this); but when the context is closed, removeListener is never called (or at least that’s what happens in my case), so the context stays in the map and its memory never gets released.

I did some testing for ~12 hours by creating, using, and then closing thousands of contexts, and they were all left hanging in this map even though they were all closed and only the “parent” browser was left open. All the browser tabs were also closed, except for the “parent” one (I mean Chromium instances here). Node.js was consuming gigabytes of RAM, and once I removed all the listeners, the memory got flushed.

I could try to provide more details if you think that this might be the case.


I have the exact same issue. Use case: scraping/botting.

Here is an ultra-simple repro:

import { chromium } from 'playwright'

const setup = async () => {
  const browser = await chromium.launch({ headless: false })
  let page = await browser.newPage()

  let j = 0
  while (true) {
    for (let i = 0; i < 20; i++, j++) {
      console.log(j, i)
      await page.goto('https://v3.vuejs.org/guide/introduction.html#declarative-rendering')
    }
    console.log('Trying to create a new context, does not fix the leak!')
    await page.close()
    page = await browser.newPage()
  }
}

setup()

Both this and https://github.com/microsoft/playwright/issues/8775 are probably duplicates. It shows that there is an issue with Playwright itself and not with Chromium or WebKit.

For readers having the same issue: as a temporary workaround you can use something like PM2 with its memory-limit config, which restarts the process when the limit is reached. This is far from ideal, though.
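For reference, the PM2 memory cap mentioned here is the `--max-memory-restart` option; the script name and the 300M threshold below are illustrative:

```shell
# Restart the scraper whenever its memory exceeds ~300 MB (threshold illustrative)
pm2 start scraper.js --max-memory-restart 300M

# Observe memory usage and restart counts
pm2 monit
```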

Looks like they’re not going to fix this https://github.com/microsoft/playwright/issues/17736

Edit: There have been other threads that have been closed in 2020 https://github.com/microsoft/playwright/issues/4511 https://github.com/microsoft/playwright/issues/4549

At least give us an option to clear the garbage that’s collected. I tried gc.collect() in Python, but it doesn’t release the memory, and it wouldn’t clear what has built up in the Node process anyway.

👋 @gigitalz

Ignore @dgtlmoon - he’s very opinionated about open source since he released his “paid hosted service”

The sooner someone with interpersonal skills forks his project the better

Regards

Any solution? Why after a year there is still no fix? This problem hinders playwright’s usability. There’s nothing more annoying than wasting time integrating a third-party package to find out later that it’s riddled with bugs and incompetent developers.

I’ll have to ditch this tool very soon if this is not resolved, so annoying.

browser.newPage() internally creates a new context for you and closes it once you close the page, so it’s basically a helper wrapper to simplify usage.

Unlike some previous commenters making demands of people they don’t know, I would like to thank Pavel for adding that heap stack test to npm; that’s a super cool idea. And thanks generally to the maintainers for their incredible work here; it’s a highly complex project!

@gigitalz

I’ll have to ditch this tool very soon if this is not resolved, so annoying.

you can always ask for your money back 👯

Very funny, I can’t because I can’t go back in time, moron.

I can confirm the same, 1.22.0, python

            browser = browser_type.connect_over_cdp(self.command_executor, timeout=timeout * 1000)

            context = browser.new_context(
                user_agent=request_headers['User-Agent'] if request_headers.get('User-Agent') else 'Mozilla/5.0',
                proxy=self.proxy,
                # This is needed to enable JavaScript execution on GitHub and others
                bypass_csp=True,
                # Should never be needed
                accept_downloads=False
            )

            page = context.new_page()
            response = page.goto(url, timeout=timeout * 1000, wait_until=None)
            page.screenshot(type='jpeg', clip={'x': 1.0, 'y': 1.0, 'width': 1280, 'height': 1024})

            context.close()
            browser.close()

I saw memory usage up to about 1.5 GB this morning. What is curious is that not all URLs/pages cause this; I’ll report more info.

The Playwright test runner gets released soon, stay tuned! It handles all of that for you. Headless mode mostly requires less CPU/memory than headed.

See here: https://playwright.dev/docs/intro

The need to close the context and dispose of the playwright object might not be possible/desirable when working with RPA or even scraping infinite scroll pages. It adds a considerable amount of code to work around the bug.

I have a case where I make several requests per minute, and it leaks memory all over the place. I’ve tried closing pages, closing contexts, closing browsers, catching all the onClose events and re-closing everything together, even closing the main Playwright instance all the time. Nothing works; it still leaks memory, and at some point the GC thrashes completely with the CPU on fire.

Please provide and/or document a way to fully release all resources so that we can clear everything out every now and then. Please don’t focus only on testing; there are plenty of other use cases where Playwright needs to run for a long time.

(image: jprofiler_R3kyHEskfF)

Which utility was used to plot this graph? I would like to reproduce the experiment.

@rigwild 😃 ?

I’m also experiencing the same problem. I use Java 17 (Temurin 17.0.2) and Playwright 1.21.0, with a 2 GB heap.

My code looks like this:

public void parseReportsFromAllPages() {
    parseReportsFromPage(1);
    parseReportsFromPage(2);
    parseReportsFromPage(3);
    parseReportsFromPage(4);
    parseReportsFromPage(5);
}

public void parseReportsFromPage(final int pageNumber) {
    try (final Playwright playwright = Playwright.create()) {
        try (final Browser browser = playwright.chromium().launch()) {
            try (final Page page = browser.newPage()) {
                final List<String> reportsUrls = parseReportsUrlsFromPage(page, pageNumber);
                final List<ReportDto> reports = parseReports(page, reportsUrls);
                reports.forEach(report -> downloadAndSaveReport(page, report));
            }
            log.info("Page instance has been closed");
        }
        log.info("Browser instance has been closed");
    }
    log.info("Playwright instance has been closed");
}

private List<String> parseReportsUrlsFromPage(Page page, int pageNumber) {
    // page.navigate(...);
    // page.waitForLoadState();
    // page.querySelectorAll().mapToStringList();
}

private List<ReportDto> parseReports(Page page, List<String> reportsUrls) {
    // for each report:
    //     page.navigate(...);
    //     query for data and map to ReportDto
}

private void downloadAndSaveReport(Page page, ReportDto report) {
    // final APIResponse response = page.request().get(report.getDownloadLink())
    // Files.write(file.toPath(), response.body());
}

@Getter
@RequiredArgsConstructor
private static class ReportDto {
    private final String downloadLink;
}

Results: I cannot parse all the reports from all the pages; it stops while processing the second page (failing with java.lang.OutOfMemoryError: Java heap space) while trying to download reports.

There are 200 reports per page; the average report size is 10 MB.

So I can conclude that page.close() / browser.close() / playwright.close() (I use try-with-resources, so they should be called automatically) don’t release the used memory:

Here’s Heap Memory Chart over time (image link): https://www.dropbox.com/s/ibn8y5j6iq5d4so/playwright-oom.png?dl=0

It’s tough, and this is a long-standing ticket. In two years there has been no better solution to the memory problem. Do you think this is good design? Maybe the original intention was to simplify usage, and of course it’s simpler than Puppeteer in some scenarios. But I think it should at least provide some memory-release functions that I can call, with the understanding that I won’t use previous responses afterwards.
context.close()? No, it cannot release the memory; I used it in a previous version. Besides, I don’t want to close the context. So I migrated to Puppeteer. I hope Playwright gets better soon, since it supports more browsers.

Tip: I incorrectly blamed Playwright for a memory leak in my app. I have a class that wraps Playwright to do a little web-page IO, and initially it looked to me like page.evaluate() and other calls were causing memory to be used and never recycled. Strangely, however, the following (from https://github.com/weblyzard/inscriptis/issues/65) resolved the issue where page.evaluate(...) used a lot of RAM and never returned it to the system (it returns a very large JSON struct):

  import ctypes

  self.xpath_data = page.evaluate("async () => {" + self.xpath_element_js + "}")
  # Ask glibc to return freed heap pages to the OS (Linux-only)
  libc = ctypes.CDLL("libc.so.6")
  libc.malloc_trim(0)

My advice here: make sure your own app is not doing something unexpected, and be 100% certain that something like LXML’s memory-leak bug is not lurking somewhere.

Please, can a maintainer lock this conversation? New comments just keep repeating each other. Everything has already been said. Closing the context does not fix the issue.

Repro provided at comment https://github.com/microsoft/playwright/issues/6319#issuecomment-917705023

We’re facing the same issue trying to reuse a browser context and just creating a new page for each request.

We close the pages, but Playwright does not seem to dispose of those references, and the memory leak is quite severe on an HTTP server.

This is very easy to reproduce; I could prepare a reproducer if needed.

@aslushnikov what’s the status of the memory leak fix?

In our case, we get memory leaking on a per-context basis even if we close the context AND the browser itself. There is no clear way of restoring it to baseline, so we trigger a health-check failure. Super duper hacky trying to run Playwright in production …

I saw a PR with a fix though, has it been released?

(image: Screen Shot 2022-07-29 at 1.17.19 PM)

👋 @gigitalz,

dgtlmoon isn’t well socialised

Ignoring is the only language he speaks

Regards

🥩

I think it’s a stupid design in Playwright. Any programmer will hate a memory problem that is not under their control.

@tzbo tough comment, why don’t you make something better?

I think it’s a stupid design in Playwright. Any programmer will hate a memory problem that is not under their control.

I would love a solution as well. I am using const context = await browser.newContext(); and then const page = await context.newPage(); and still getting this. Trying to do infinite scroll for certain pages. Even after finishing that the memory usage just keeps increasing…

I raised an issue for this; it got closed, then a dev responded somewhere telling me to remake the ticket. I got lazy, as I had already put time into it.

Anyway, make a loop spamming about:blank and watch the memory usage for Python/Node (both increase pretty much in tandem). If you do it with a bigger page, like amazon.com, it goes up much faster.

A solution for this would be to allow us to clear the memory ourselves. I tried garbage collection in Python but nothing changes.

If this would ruin traces or something, I’d understand, but we should still be able to clear it manually, with a disclaimer on the method or something.

I would love a solution as well. I am using const context = await browser.newContext(); and then const page = await context.newPage(); and still getting this. Trying to do infinite scroll for certain pages. Even after finishing that the memory usage just keeps increasing…

I have a case where I make several requests per minute, and it leaks memory all over the place. I’ve tried closing pages, closing contexts, closing browsers, catching all the onClose events and re-closing everything together, even closing the main Playwright instance all the time. Nothing works; it still leaks memory, and at some point the GC thrashes completely with the CPU on fire. Please provide and/or document a way to fully release all resources so that we can clear everything out every now and then. Please don’t focus only on testing; there are plenty of other use cases where Playwright needs to run for a long time. (image: jprofiler_R3kyHEskfF)

Which utility was used to plot this graph? I would like to reproduce the experiment.

@rigwild 😃 ?

JProfiler

This issue is still happening.

Please, can a maintainer lock this conversation? New comments just keep repeating each other. Everything has already been said. Closing the context does not fix the issue.

Repro provided at comment #6319 (comment)

And yet we still have no fix. I’ll be digging into the source code later to see if I can flush it or signal a flush somehow. Obviously the developers won’t be doing anything about this.

They have explained that you must close the context and dispose of the Playwright object. You can’t keep the context alive for a long time. A stupid comment like that helps no one.

@z719893361 For me, I did a workaround. But this is something that cannot be replicated with simple code.

Solution was to close the page for each goto(): https://github.com/roniemartinez/dude/pull/174

@tzbo tough comment, why don’t you make something better?

This is the usual dumb take people come out with when they have zero priority to solve a ticket.

Any update?

With Node.js v16.17.1 & Playwright v1.26.1, this no longer seems to be an issue. I ran it for about 60 minutes, with 2000 pages loaded. Repro:

import { chromium } from 'playwright'

const setup = async () => {
  const browser = await chromium.launch({ headless: false })
  let page = await browser.newPage()

  let j = 0
  while (true) {
    for (let i = 0; i < 20; i++) {
      console.log(j, i)
      await page.goto('https://httpbin.org/delay/1')
    }

    j++;

    console.log('Trying to create a new context, does not fix the leak!')
    await page.close()
    page = await browser.newPage()
  }
}

setup()

(image: memory usage graph)

I also encountered this memory leak, which caused the server to crash every hour. I had no choice but to switch to Puppeteer; not only was memory stable, but page loading was also faster.

Switching to Puppeteer was the best workaround for me. I used the following Dockerfile:

FROM node:19.6.0-alpine

# Installs latest Chromium package.
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      dumb-init

# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Copy the files necessary for the build
# ... 

# Install and build
RUN npm install
RUN npm run build

# Expose and run the server
EXPOSE 8080
CMD ["node", "build"]

I resolved my issue by doing the following.

  1. Save the browser’s state (session, local storage, etc.) to a local file after creating the browser/context and performing the actions required to meet my needs: context.StorageState("state.json")
  2. Close the browser and context and kill all node.exe processes every 30 minutes (this is where the memory leak exists for me). If you don’t kill them, a separate node.exe process is created every time, and the previous process remains in memory taking up space.
  3. Create a new browser/context, load the saved state, and navigate back to where you need to be: context, err := browser.NewContext(playwright.BrowserNewContextOptions{StorageStatePath: playwright.String("state.json")})

While this won’t help with infinite scroll or some other scenarios, it might help some of you. A good example where this works fine is creating a session with a QR code (in my situation) or after a simple login.

I did kind of the same thing, except I didn’t have to save any state; I just killed the whole Playwright process tree and looped back to create a new spin of the same stuff. Cringe AF.
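A tidier version of killing the whole process tree is to run each batch of work in a short-lived child process: when the child exits, the OS reclaims all of its memory, leaks included. A minimal sketch, where `do_batch` is a stub standing in for real Playwright work and the names are illustrative:

```python
# Run each scraping batch in a disposable child process; when the child
# exits, the OS reclaims everything it allocated, including anything
# Playwright leaked. `do_batch` is a stub for real Playwright work.
import multiprocessing as mp

def do_batch(urls, out):
    # In real code: launch and close Playwright here, visit `urls`,
    # and push the extracted data onto `out`.
    out.put([u.upper() for u in urls])  # placeholder "work"

def run_batches(batches):
    ctx = mp.get_context("fork")  # POSIX-only; avoids re-importing this module
    results = []
    for batch in batches:
        out = ctx.Queue()
        child = ctx.Process(target=do_batch, args=(batch, out))
        child.start()
        results.append(out.get())  # read before join() to avoid queue deadlock
        child.join()               # child exit returns all its memory to the OS
    return results
```

In real use, do_batch would launch and close Playwright itself, so nothing driver-side outlives the batch; the trade-off is paying the browser startup cost per batch.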

@limestackscode Browser.newPage() does create a new context for you internally. You can re-create your context, and then it will release the memory accordingly.

Same here on Python. I got almost 6 GB of memory usage doing page.goto() on a few hundred URLs before crashing (and this is just one page).

Ignore this. Although there were some JavaScript errors in the console during the crash, the memory usage is not actually that high now (I cannot replicate it).

I’m also seeing the same problem as @AIGeneratedUsername. I was trying to use Playwright to monitor the network traffic of a page, and this memory leak shows me that Playwright is not a good fit for that use case.

Some maintainers said that this happens by design, on both the python repo and the java repo. Maybe it would be worth adding this to the documentation, since it would save other people the time of discovering this limitation themselves.

Currently the objects for e.g. request/response/route are only flushed when a new context is created. For testing, you typically create a new context for each test. I folded a few issues into this one from users who also ran into this, so we can better keep track of it and perhaps find a workaround in the future.