playwright: [BUG] Memory increases when same context is used
Context:
- Playwright Version: Latest (today is 26/04/2021)
- Operating System: Linux
- Node.js version: tested on both Node.js versions
- Browser: chromium
Describe the bug
I’m monitoring full-JS apps (e.g. React/Angular websites). I initialize one Playwright instance, one browser, and one page. I keep the page cached and retrieve its content every 2 seconds.
After 1–2 hours, the memory usage goes crazy. I tried to reload() the page every 30 minutes; it doesn’t free the memory.
The only way to free the memory is to close the page and create a new one.
What could be the source of this memory leak? I assume reload() frees the JavaScript VM, so it must be a leak internal to the page.
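The workaround described above (closing the page and creating a new one, since reload() doesn’t help) can be sketched as a polling loop. This is a hedged illustration, not an official recommendation; the 2-second interval and the `RECYCLE_EVERY` cadence are assumptions.

```javascript
// Sketch: poll a page every 2 seconds, but close and recreate the page
// every RECYCLE_EVERY cycles so per-page handles can actually be freed.
const RECYCLE_EVERY = 900; // ~30 minutes at a 2-second interval (assumption)

// Pure helper: is this (1-based) cycle a recycle point?
function shouldRecycle(cycle, every = RECYCLE_EVERY) {
  return cycle > 0 && cycle % every === 0;
}

async function poll(url) {
  // Required lazily so the helper above can be exercised without Playwright.
  const { chromium } = require('playwright');
  const browser = await chromium.launch();
  const context = await browser.newContext();
  let page = await context.newPage();
  for (let cycle = 1; ; cycle++) {
    await page.goto(url);
    const content = await page.content();
    // ... process `content` ...
    if (shouldRecycle(cycle)) {
      await page.close(); // closing (not reload()) is what frees memory here
      page = await context.newPage();
    }
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}
```

Per later comments in this thread, you may need to recycle the context or even the whole browser as well, not just the page.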
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 71
- Comments: 86 (14 by maintainers)
Links to this issue
Commits related to this issue
- chore: dispose stale handles to prevent oom, 1000 of a kind max (#27315) https://github.com/microsoft/playwright/issues/6319 — committed to FANGOD/playwright by pavelfeldman 9 months ago
- chore: do not leak internal page handles after closing page (#24169) Partial fix for https://github.com/microsoft/playwright/issues/6319 After this fix, the following scenario won't leak and the c... — committed to smavodev/playwright by pavelfeldman a year ago
I think being able to have the page flush responses when new ones are received would be a useful feature, and along the lines of the way a typical browser handles responses. Maybe have it as a flag we can set?
While headless browsers have their origins in browser testing, and browser testing continues to be a major use case, data science is a rapidly growing field, and scraping is a major selling point for using headless browsers.
When scraping, you may not want to create an entirely new context with each data retrieval, as the cookies may be important for storing complex state or tokens. This leads to difficulties with the current Playwright implementation. Scrapers are primarily needed when lightweight APIs aren’t available or practical, and unfortunately, on the modern web, heavy pages with multiple megabytes of data can be included in each response.
If you’re getting HTTP responses every two seconds, you can easily amass over 1 GB of additional leaked memory within an hour if all of those responses are being stored. In my case, after 43 minutes my Playwright `node` process crashed at 1.9 GB of total memory size with the error `FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory`. Increasing the memory limit only delays the issue. Being able to prevent a context from caching all responses would be ideal.

I have a case where I make several requests per minute, and it leaks memory all over the place. I’ve tried closing pages, closing contexts, closing browsers, catching all the onClose events and re-closing everything all together, even closing the main Playwright instance every time; nothing works. It still leaks memory, and at some point the GC thrashes completely with the CPU on fire.
Please provide and/or document a way to fully release all resources so that we can clear everything out every now and then. Please don’t focus only on testing; there are plenty of other use cases where Playwright needs to run for a long time.
@LuizFelipeNeves images can be very important, depending on what you are doing. Automated browsers have lots of different use cases. The project I used Playwright for needed to take screenshots of webpages, images and all. The problem Playwright is having isn’t a request issue but a memory-leak issue. Reducing the size of the requests only makes the memory overflow take longer to occur; it doesn’t fix the underlying problem. I think we can all agree that we should be able to instruct Playwright not to keep an unbounded cache of request/response data between page loads, while still caching the things that are designed to persist between page loads, like cookies and local storage. Basically, a setting to behave more like a browser. Much of the issue we’re facing with Playwright was solved by the early browser developers, dating back to Netscape’s cookies in 1994; there’s no need to reinvent the wheel here.
I’ve tried to call `page.close()` and `context.close()` and open a new context and page every 100 `page.goto(url)` calls. I see that your statement that memory is cleaned up on a context close is not correct. After ~8 hours of running and about 200,000 network requests (the context and page are closed/reopened for every 100 requested URLs), Playwright’s memory usage is about 10 GB.

In other words, while the bug is not fixed, the workaround is to close not only the browser context but also the whole browser (e.g. `await browser.close()`) every N requests, depending on how much free RAM you have. I did not check whether `browser.close()` actually frees memory; it may also be necessary to reopen the whole `async_playwright` context manager to flush memory.

UPDATE
Please note, I use playwright-python, so some terminology may be related only to it.
Sorry guys, my conclusions above were a bit wrong. I want to add some clarity after more investigation:
My website has lots of pages, and to perform a kind of end-to-end test I need to perform lots of actions on it. I’ve been closing/reopening the browser periodically within the same Playwright context manager, and after ~1 hour Playwright used 8 GB of memory and 31 GB of swap (~39 GB of virtual memory in total).
In other words, I will need to regularly restart not only the browser but also close/reopen the context manager.
I am talking about Playwright-python, so I am not sure this issue belongs exactly (or only) in this repository. The same memory leak occurs on both the ‘latest’ and ‘dev’ Python releases.
Any solution? Why after a year there is still no fix? This problem hinders playwright’s usability. There’s nothing more annoying than wasting time integrating a third-party package to find out later that it’s riddled with bugs and incompetent developers.
Hey guys - any ETA on when is this going to be fixed?
Upvoting. I’m also facing this issue with Python and the scenario is similar to OP. Node.js JavaScript Runtime consumed more than 4 GB of RAM before crashing. In my use case, I only need to open one Browser, one Context, and one Page. After that, I should be able to navigate through the website without a problem. But what I’m seeing is that the RAM usage for Node.js JavaScript Runtime keeps growing nonstop. I tried closing the Page and reopening every N requests, but it did nothing to clear the RAM.
This memory leak seriously hurts Playwright’s reliability.
@liuyazhou1991 it’s called Playwright Test see here: https://playwright.dev/docs/intro
@gigitalz
you can always ask for your money back 👯
Unbounded heap growth should be mitigated by https://github.com/microsoft/playwright/commit/ffd20f43f8ee1a7a016cd9b29c372e25ec685a62. The heap will still saturate to a certain size (1K handles per object type, ~50-100Mb on average), but will stop growing afterwards.
I resolved my issue by doing the following.
Saved the browser’s state (session, local storage, etc.) to a local file after creating the browser/context and performing the actions required to meet my needs:

`context.StorageState("state.json")`

Then close the browser and context and kill all node.exe processes every 30 minutes (this is where the memory leak exists for me). If you don’t kill them, a separate node.exe process is created every time, and the previous process remains in memory taking up space.

Create a new browser/context and load in the saved state, then navigate back to where you need to be.

```go
context, err := browser.NewContext(playwright.BrowserNewContextOptions{
    StorageStatePath: playwright.String("state.json"),
})
```

While this won’t help with infinite scroll or other scenarios, it might help some of you. A good example where this works fine is creating a session with a QR code (my situation) or after a simple login.
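For the Node API, the same save-state/kill/restore pattern might look like the sketch below. The `state.json` path and restart cadence come from the comment above; the dependency-injected `playwrightModule` parameter is my own device so the flow can be exercised without a real browser.

```javascript
// Sketch: persist cookies/localStorage, throw the whole browser away,
// then restore the saved state in a fresh context.
async function recycleBrowser(playwrightModule, statePath = 'state.json') {
  const { chromium } = playwrightModule;
  let browser = await chromium.launch();
  let context = await browser.newContext();
  // ... log in once (e.g. scan the QR code), then snapshot the session:
  await context.storageState({ path: statePath });

  // Later (e.g. every 30 minutes): discard everything and restore.
  await browser.close();
  browser = await chromium.launch();
  context = await browser.newContext({ storageState: statePath });
  return { browser, context };
}
```

Call it with the real module, e.g. `recycleBrowser(require('playwright'))`.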
Hi all,
I believe I had a very similar issue, which I resolved by modifying `lib\server\browserContext.js` and removing the context listener from the instrumentation map.

I’ve noticed that when a new context is created, it’s inserted into the instrumentation map (`lib\server\instrumentation.js`) by calling `this.instrumentation.addListener(contextDebugger, this);`, but when the context is closed, removeListener is never called (or at least that’s what happens in my case), so the context stays in the map and its memory never gets released.

I ran tests for ~12 hours, creating, using, and then closing thousands of contexts, and they were all left hanging in this map even though they were all closed and only the “parent” browser was left open. All the browser tabs (Chromium instances) were also closed, except for the “parent” one. Node.js was consuming gigabytes of RAM, and once I removed all the listeners, the memory got flushed.
I could try to provide more details if you think that this might be the case.
I have the exact same issue. Use case: scraping/botting.
Here is an ultra-simple repro:
Both this and https://github.com/microsoft/playwright/issues/8775 are probably duplicates. It shows that there is an issue with playwright itself and not with chromium or webkit.
For readers having the same issue, as a temporary workaround you might use something like PM2 with the memory limit config. It will restart the process when the limit is reached. This is far from ideal though.
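A PM2 setup along those lines might look like the following sketch; the app name, script path, and the 1G ceiling are placeholders.

```javascript
// ecosystem.config.js -- PM2 restarts the process once its memory
// exceeds max_memory_restart, capping the leak at the cost of restarts.
module.exports = {
  apps: [
    {
      name: 'scraper',        // placeholder app name
      script: './scraper.js', // placeholder entry point
      max_memory_restart: '1G',
    },
  ],
};
```

Start it with `pm2 start ecosystem.config.js`.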
Looks like they’re not going to fix this https://github.com/microsoft/playwright/issues/17736
Edit: There have been other threads that have been closed in 2020 https://github.com/microsoft/playwright/issues/4511 https://github.com/microsoft/playwright/issues/4549
At least give us an option to clear the garbage that accumulates. I tried gc.collect in Python, but this doesn’t release it, and it wouldn’t clear what’s built up in the Node process anyway.
👋 @gigitalz
Ignore @dgtlmoon - he’s very opinionated about open source since he released his “paid hosted service”
The sooner someone with interpersonal skills forks his project the better
Regards
I’ll have to ditch this tool very soon if this is not resolved, so annoying.
`browser.newPage` does internally create a new context for you and closes it once you close the page, so it’s basically a helper wrapper to simplify usage.

Apart from making demands of people I don’t know like some previous commenters, I would like to thank Pavel for adding that heap stack test to npm; that’s a super cool idea. And generally, thanks to the maintainers for their incredible work here; it’s a highly complex project!
Very funny, I can’t because I can’t go back in time, moron.
I can confirm the same on `1.22.0`, Python.
I saw memory usage of up to about 1.5 GB this morning. What is curious is that it’s not all URLs/pages that cause this; I’ll report more info.
The Playwright test runner gets released soon, stay tuned! It handles all that for you. Headless mode generally requires less CPU/memory than headed. See here: https://playwright.dev/docs/intro
The need to close the context and dispose of the playwright object might not be possible/desirable when working with RPA or even scraping infinite scroll pages. It adds a considerable amount of code to work around the bug.
Which utility was used to plot this graph? I would like to reproduce the experiment.
@rigwild 😃 ?
I’m also experiencing the same problem. I use Java 17 (Temurin 17.0.2), Playwright 1.21.0. I defined heap size 2 GB.
My code looks like this:
Results: I cannot parse all the reports from all the pages; it stops while processing the second page (failing with `java.lang.OutOfMemoryError: Java heap space`) while trying to download reports. There are 200 reports on a page, and the average report size is 10 MB.

So I can conclude that `page.close()` / `browser.close()` / `playwright.close()` (I use try-with-resources, so it should happen automatically) don’t release the used memory.

Here’s the heap memory chart over time (image link): https://www.dropbox.com/s/ibn8y5j6iq5d4so/playwright-oom.png?dl=0
It’s tough, and it’s a long-running ticket. There has been no better solution in two years for the memory problem. Do you think it’s a good design? Maybe the original intention was to simplify usage, and of course it’s simpler than Puppeteer in some scenarios. But I think it should at least expose some memory-release functions, where I call those functions and promise I will not use the previous responses, etc.

`context.close`? No, it cannot release memory. I used it in a previous version. Besides, I don’t want to close the context. So I migrated to Puppeteer. I hope Playwright gets better soon; it supports more browsers.
Tip - I incorrectly blamed Playwright for a memory leak in my app. I have a class that wraps Playwright to do a little web-page I/O, and initially it looked to me like `page.evaluate()` and other calls were causing memory to be used and never recycled/emptied. Strangely, however, when I tried the approach from https://github.com/weblyzard/inscriptis/issues/65, it also resolved the issue where `page.evaluate(...)` used a lot of RAM and never returned it to the system (it returns a very large JSON struct).

My advice here: make sure your own app is not doing something unexpected. Be 100% sure that something like LXML’s memory leak bug is not lurking around somewhere.
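One way to do that sanity check is to sample the Node heap around the suspect code path. This is a generic sketch using only the standard `process.memoryUsage()` API; the sample count is arbitrary.

```javascript
// Report heap used by this Node process, in MB.
function heapUsedMb() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// Run `fn` n times and return the heap growth in MB. A delta that keeps
// climbing across repeated runs points at a leak in the sampled path
// (your own code) rather than elsewhere.
async function sampleHeapGrowth(fn, n = 10) {
  const before = heapUsedMb();
  for (let i = 0; i < n; i++) {
    await fn();
  }
  return heapUsedMb() - before;
}
```

Note this only sees your own process’s heap; memory held by the separate browser processes needs OS-level tools (e.g. RSS in `top`).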
Please, can a maintainer lock this conversation? New comments just keep repeating each other. Everything has already been said. Closing the context does not fix the issue.
Repro provided at comment https://github.com/microsoft/playwright/issues/6319#issuecomment-917705023
We’re facing the same issues trying to reuse a browser context and just creating a `newPage` for each request. We are closing the pages, but Playwright seems to not dispose of those references, and the memory leak is quite severe on an HTTP server.
This is very easy to reproduce, I could prepare a reproducer if needed.
@aslushnikov whats the status of the memory leak fix?
In our case, we get memory leaking on a per context basis even if we close context AND the browser itself. No clear way of restoring it to baseline, so we are triggering a health check fail. Super duper hacky trying to run Playwright in production …
I saw a PR with a fix though, has it been released?
👋 @gigitalz,
dgtlmoon isn’t well socialised
Ignoring is the only language he speaks
Regards
🥩
@tzbo tough comment, why dont you make something better?
I think it’s a stupid design in playwright. Any programmer will hate the memory problem and that is not in his control.
I raised an issue for this; it got closed, then a dev responded somewhere telling me to remake the ticket. I got lazy, as I had already put time into it.
Anyway, make a loop spamming about:blank and watch the memory usage for Python/Node (both increase pretty much in tandem). If you do it on a bigger page, like amazon.com, it goes up much faster.
A solution for this would be to allow us to clear the memory. I tried a garbage collect in Python, but nothing changes.
If this would ruin the traces or something I’d understand, but we should still be able to manually clear it, with a disclaimer on the method or something.
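For reference, the about:blank loop described above can be sketched like this (a hypothetical illustration; the iteration count and logging cadence are arbitrary, and running it requires the `playwright` package):

```javascript
// Navigate the same page to about:blank in a loop and watch process
// memory climb while it runs.
async function repro(iterations = 10000) {
  // Required lazily so this file loads without Playwright installed.
  const { chromium } = require('playwright');
  const browser = await chromium.launch();
  const page = await browser.newPage();
  for (let i = 0; i < iterations; i++) {
    await page.goto('about:blank');
    if (i % 500 === 0) {
      console.log(i, Math.round(process.memoryUsage().rss / 1e6), 'MB RSS');
    }
  }
  await browser.close();
}
```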
I would love a solution as well. I am using `const context = await browser.newContext();` and then `const page = await context.newPage();` and still getting this. I’m trying to do infinite scroll for certain pages, and even after finishing that, the memory usage just keeps increasing… (JProfiler)
This issue is still happening.
And yet we still have no fix. I’ll be digging into the source code later on to see if I can flush it or signal a flush somehow. Obviously developers won’t be doing anything about this.
They have explained you must close the context and dispose of the playwright object. You can’t keep the context alive for a long time. A stupid comment like that helps no one.
@z719893361 For me, I did a workaround. But this is something that cannot be replicated with simple code.
Solution was to close the page for each `goto()`: https://github.com/roniemartinez/dude/pull/174

This is the usual dump f. take people come with when they have zero priority to solve a ticket.
Any update?
With Node.js v16.17.1 and Playwright v1.26.1, this no longer seems to be an issue. I ran it for about 60 minutes, with 2000 pages loaded. Repro:
I also encountered this memory issue, which caused the server to crash every hour. I had no choice but to switch to Puppeteer; not only was the memory stable, but page loading was also faster.
Switching to Puppeteer was the best workaround for me. I used the following Dockerfile:
Kind of did the same thing, except I didn’t have to save any state, just killed the whole playwright process tree and relooped to create a new spin of the same stuff. Cringe AF.
@limestackscode Browser.newPage() does create a new context for you internally. You can re-create your context, and then it will release the memory accordingly.
Same here on Python. I got almost 6 GB of memory usage doing `page.goto()` on a few hundred URLs before crashing (this is just one page).

Ignore this. Although there were some “javascript” errors in the console during the crash, the memory usage was not actually that high (I cannot replicate it).
I’m having the same problem as @AIGeneratedUsername. I was trying to use Playwright to monitor the network traffic of a page, and this memory leak shows me that Playwright is not a good fit for this use case.
Some maintainers said that this happens by design on the Python repo and on the Java repo. Maybe it would be worth adding this to the documentation, since it would save other people the time of discovering this use case doesn’t work.
Currently the objects for e.g. request/response/route only get flushed when a new context is created. For testing, you typically create a new context for each test. I folded a few issues from users who also ran into this into this one, so we can better keep track of it and might find a workaround in the future.
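Given that behavior, a long-running scraper can rotate contexts in batches so stale handles get flushed periodically. A hedged sketch follows; `BATCH_SIZE` and the loop shape are my assumptions, not a documented pattern.

```javascript
// Scrape URLs in batches, creating a fresh context per batch: per the
// maintainer comment above, request/response/route handles are only
// flushed when a new context is created.
const BATCH_SIZE = 100; // assumption; tune against your memory budget

function* batches(urls, size = BATCH_SIZE) {
  for (let i = 0; i < urls.length; i += size) {
    yield urls.slice(i, i + size);
  }
}

async function scrapeAll(browser, urls, handle) {
  for (const batch of batches(urls)) {
    const context = await browser.newContext(); // rotating this flushes stale handles
    const page = await context.newPage();
    for (const url of batch) {
      await page.goto(url);
      await handle(page);
    }
    await context.close(); // pair every newContext with a close
  }
}
```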