puppeteer: Zombie Process problem.
Hello,
We recently discussed this problem in issues #1823 and #1791.
Environment:
- Puppeteer Version: 1.0.
- Chrome Version: 64.0.3282.71 (https://github.com/adieuadieu/serverless-chrome/releases/tag/v1.0.0-34)
- Platform / OS version: AWS Lambda
- Node.js version: 6.10 (https://aws.amazon.com/about-aws/whats-new/2017/03/aws-lambda-supports-node-js-6-10/)
Use Case:
We are using Puppeteer on AWS Lambda. We take a screenshot of a given HTML template, upload it to S3, and use this image for future requests. The service handles over 100 million requests each month, so every process should be atomic and immutable. (AWS Lambda has disk and process limits.)
Example Code:
const browser = await puppeteer.launch({
  args: ['--disable-gpu', '--no-sandbox', '--single-process',
         '--disable-web-security', '--disable-dev-profile']
});
const page = await browser.newPage();
await page.goto('https://s3bucket.com/markup/a.html');
const response = await page.screenshot({ type: 'jpeg', quality: 95 });
await browser.close();
Problem
When we use the example code, we get a disk error from AWS Lambda.
Example /tmp folder:
2018-01-12T14:55:38.553Z a6ef3454-f7a8-11e7-be0f-17f405d5a180 start stdout: total 226084
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:55 .
drwxr-xr-x 21 root root 4096 Jan 12 10:53 ..
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:33 core.headless-chromi.129
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.131
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.135
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.137
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.138
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:51 core.headless-chromi.14
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.15
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:36 core.headless-chromi.169
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.174
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.178
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.180
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:14 .pki
When we investigated these files, we realized they are core dumps. We now remove these files after the process completes.
When we monitored the process list, we saw that zombie Chrome processes keep accumulating, and we can't kill them. AWS Lambda has a maximum process limit (1024 processes), so we eventually hit it.
483 1 3.3 1.6 1226196 65408 ? Ssl 22:07 0:05 /var/lang/bin/node --max-old-space-size=870 --max-semi-space-size=54 --max-executable-size=109 --expose-gc /var/runtime/node_modules/awslambda/index.js
483 22 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 73 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 119 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 166 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 214 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 262 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 307 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 353 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 1915 0.0 0.0 0 0 ? Z 22:09 0:00 [sh] <defunct>
We couldn’t use dumb-init on Lambda, because Lambda already has an init system.
How did we fix it? (very hacky method)
We used `browser.disconnect()` instead of `browser.close()` and managed the Chrome processes manually (e.g., killing them ourselves).
Example Code:
const childProcess = require('child_process');

// Capture the Chromium PID before disconnecting; browser.process() returns
// the spawned browser's child process.
const pid = browser.process().pid;

browser.on('disconnected', () => {
  console.log('sleeping 100ms'); // short delay to eliminate a race condition
  setTimeout(() => {
    console.log(`Browser disconnected... Process id: ${pid}`);
    childProcess.exec(`kill -9 ${pid}`, (error, stdout, stderr) => {
      if (error) {
        console.log(`Process kill error: ${error}`);
      } else {
        console.log(`Process kill success. stdout: ${stdout} stderr: ${stderr}`);
      }
    });
  }, 100);
});
At first we didn’t use the delay; we killed the process immediately after the browser disconnected, and got the following error:
Error: read ECONNRESET at exports._errnoException (util.js:1018:11) at TCP.onread (net.js:568:26)
This looks like a Puppeteer process-management problem to us. With this workaround we no longer receive any Puppeteer-related errors. How can this be fixed properly?
Thanks.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 96
- Comments: 53 (3 by maintainers)
Commits related to this issue
- build: ensure kill chrome on disconnect avoid zombie process, it's based on https://github.com/GoogleChrome/puppeteer/issues/1825 — committed to microlinkhq/browserless by Kikobeats 5 years ago
- feat: kill & respawn a new browser instance under diconnect (#68) * build: ensure kill chrome on disconnect avoid zombie process, it's based on https://github.com/GoogleChrome/puppeteer/issues/182... — committed to microlinkhq/browserless by Kikobeats 5 years ago
- build: ensure kill chrome on disconnect avoid zombie process, it's based on https://github.com/GoogleChrome/puppeteer/issues/1825 — committed to microlinkhq/browserless by Kikobeats 5 years ago
- Fix: Close browser page Addresses puppeteer zombie problem. see: https://github.com/puppeteer/puppeteer/issues/1825 — committed to cederigo/grover by cederigo 4 years ago
- Fix: Close browser page Addresses puppeteer zombie problem. see: https://github.com/puppeteer/puppeteer/issues/1825 — committed to cederigo/grover by cederigo 4 years ago
- fix: improve Ctrl + C support (#6011) Fix child process killing when the parent process SIGINTs. If you `ctrl + c` the Puppeteer parent process, we would sometimes not properly handle killing of t... — committed to puppeteer/puppeteer by TimvdLippe 4 years ago
- Run chromium with --single-process and --no-zygote flags Attempts to prevent leaking zombie chromium processes by running chromium with the --single-process and --no-zygote flags. Flag descriptions:... — committed to wikimedia/mediawiki-services-chromium-render by mdholloway 4 years ago
- Chrome Zombie Process Bugfix https://github.com/puppeteer/puppeteer/issues/1825 — committed to MobiMedia/fury by mobiawa 4 years ago
- try fix orphan https://github.com/puppeteer/puppeteer/issues/1825#issuecomment-651755428 — committed to olc-systems-sro/puppeteer-pdf by petrprikryl 2 years ago
I’ve overcome these issues by adding the following flags for headless Chrome:
I think the child processes are orphaned when the parent is killed, and that leads to the zombies. With this, I only get one process and it works pretty well.
@bahattincinic - thanks, I’ve tried your method of disconnecting and killing the process, and while it does kill the “main” process returned by `puppeteer.launch()`, each run seems to leave another defunct zombie with a PID different from the killed one. What’s worse, when I run `ps aux` right after `puppeteer.launch()`, aside from the “main” process there is already one that’s defunct, right away, before running any code or trying to kill anything. I’ve tried sending `kill -15`, hoping that would allow the main process to clean up its children, but -15 or -9 doesn’t make any difference, so I’m still stuck with an ever-growing list of zombies and rising memory. Do you have any advice on how you managed to keep it clean of those as well (if you had a similar experience)? I’m also running on Lambda, same args used, Puppeteer 1.1.1. Thanks!
I’m also using Puppeteer in Docker, and I had also tried
`puppeteer.launch({ args: ['--no-sandbox', '--no-zygote'] })`, but that did not help. Eventually, I figured out that the `init: true` flag solves the orphaned zombie process problem. It can be used with docker-compose, according to the Docker documentation: https://docs.docker.com/compose/compose-file/compose-file-v3/#init (@bryanlarsen and @zdm also mentioned the `--init` flag for `docker`, and I gained further understanding from this great blog post: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/).
I fixed this issue by adding
to my launch configs, which makes it so that only one Chrome process is created on launch. (I discovered that otherwise, when `browser.close()` was called, it only closed one Chrome process; it seems two Chrome processes are created on launch, of which the second becomes the zombie when the browser is closed.)
I tried looking up the process and killing it. How about this approach?
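For reference, a minimal docker-compose fragment using the `init: true` option mentioned above (the service name and image are placeholders, not from this thread):

```yaml
services:
  scraper:
    image: my-puppeteer-image   # placeholder image name
    init: true                  # run an init process as PID 1 to reap zombies
```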
@leobudima
We are doing the following to avoid zombie processes:
- Call `browser.close();` instead of killing the process.
- Reap finished children with `while (waitpid2.waitpid(-1, 0 | waitpid2.WNOHANG) == -1)`.
- Clean up core dumps with `rm -r /tmp/core.* || true`.
If your project doesn’t depend on AWS Lambda, you can use my example project: https://github.com/bahattincinic/puppeteer-docker-example
Hello, thanks to all for the debugging. I run Puppeteer inside Docker. To remove zombie processes I added the `--no-zygote` argument, and now it works fine: when I use `browser.close()`, all processes stop.
My code sample:
I don’t fully understand this flag, but here is a little explanation: https://codereview.chromium.org/2384163002
Thanks to all.
Hi everyone, just a quick warning about the `--single-process` flag. Some of my integration tests broke after I started using it, because it broke the font rendering and kerning in the generated PDF. I found this Process Models page in the Chromium docs:
I can confirm that the single-process model does cause a rendering bug (at least on Chromium 88.x), and that the bug is not present when I remove the `--single-process` flag. So I would not recommend this flag: the docs say it’s only designed for testing and development, and it shouldn’t be used in production.
I have thoroughly reviewed the documentation and exhausted the available solutions trying to resolve the zombie process issue, but the problem persisted. I attempted to terminate process IDs, but within the pods the zombie processes remained. After several consecutive days of updating every package, the issue was ultimately resolved by two key changes: switching the base image from Node Alpine to Node Slim (Debian) and switching from Chromium to Chrome as the browser. The specific changes are outlined below.
If you are working with Puppeteer and encountering zombie process issues, consider employing the following Docker commands. These commands have proven effective in preventing the creation of zombie processes.
FROM node:18-slim

RUN apt-get update && apt-get upgrade -y

RUN apt-get update && apt-get install curl gnupg -y \
  && curl --location --silent dl-ssl.google.com/linux/linux_sign… | apt-key add - \
  && sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
  && apt-get update \
  && apt-get install google-chrome-stable -y --no-install-recommends \
  && rm -rf /var/lib/apt/lists/*

RUN apt-get update && apt-get upgrade -y && apt-get install -y vim

ADD ./puppetron.tar /usr/share/
WORKDIR /usr/share/puppetron

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV SERVICE_PATH=/usr/share/puppetron

CMD node main.js
Change the browser path in your launch options to `executablePath: '/usr/bin/google-chrome'`.
This works for me on Debian 10, just kills the process group based on the browser pid:
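The referenced snippet isn’t shown above, but a minimal sketch of the process-group idea might look like this (the helper name is ours; it assumes Chromium is its own process-group leader, which holds when Puppeteer spawns it detached on POSIX so its PGID equals its PID):

```javascript
// Kill the whole Chromium process group using the negative-PID convention:
// kill(2) treats a negative pid as "signal every process in that group".
function killBrowserGroup(pid) {
  try {
    process.kill(-pid, 'SIGKILL'); // negative pid => signal the whole group
  } catch (err) {
    // ESRCH means the group is already gone; anything else is unexpected.
    if (err.code !== 'ESRCH') throw err;
  }
}

// Usage sketch: killBrowserGroup(browser.process().pid);
```

This catches renderer and GPU children that a plain `kill -9 <pid>` on the main browser process would leave orphaned.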
Interesting: I removed the `--` in front of the entries in `args: [...]` and it acts differently. I suspected this based on a previous project I was working on using WebdriverIO. All the instructions of course include the `--`, but when I enable an extension using `load-extension="${ext}"` it very clearly loads the extension, because it reports bugs from within the extension via `dumpio: true`. With `--`, the uBlock Origin extension is ignored.
EDIT: launching without `--` opens a bunch of tabs with the options as the URL. Confirmed by switching to `headless: false`; this is still not working.
EDIT 2: I am getting pretty decent results now; it moved past a segfault with no popups, using these args, partly from this thread and previous research.
And from here https://superuser.com/questions/912656/how-do-i-stop-my-mac-from-asking-to-accept-incoming-network-connections
@bahattincinic @aslushnikov I’ve briefly touched upon this here; killing the Chromium process aggressively on complete/timeout/errors helped us greatly as well.
Using dumb-init resolved my problems.
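For context, a typical way to wire dumb-init into a Docker image is as the container entrypoint, so it runs as PID 1 and reaps orphaned children (a sketch; the base image and command are placeholders, not from this thread):

```dockerfile
FROM node:18-slim
RUN apt-get update && apt-get install -y dumb-init && rm -rf /var/lib/apt/lists/*
# dumb-init runs as PID 1, forwards signals, and reaps zombie children
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "main.js"]
```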
Sorry for the noise; I just wanted to confirm in case it helps somebody: in my case, simply adding `--init` to the `docker` command did indeed work.
If you run it under Docker, you need to use the `docker --init` option.
Hi, I am also having problems with this, also with serverless-chrome on AWS Lambda.
In my case, it looks like it does not have anything to do with the browser cleanup process. It looks like it is being caused by something that happens during Puppeteer launch.
Running `ps alx` immediately after browser launch gives me this:
See process 248, which is already defunct at this point.
And then after the browser closes:
Look at the process with PID 248, which now has PPID 1.
Is this even a Puppeteer bug?
@bahattincinic - thanks a lot for providing details - waitpid is an interesting approach and I’ll definitely try with cleaning /tmp, hopefully that helps! If I don’t manage to make it run reliably on Lambda, I’m going to have to try with docker - thanks for linking the example!
I fixed the zombie process problem when I upgraded the headless Chrome version! I recommend this repository to AWS Lambda users.
@aslushnikov @bahattincinic This might help others facing this problem.
I run Puppeteer sessions on AWS Lambda for my SaaS, https://checklyhq.com. I noticed the same issue of defunct Chrome processes hanging around across multiple Lambda calls. Also, the /tmp directory was piling up with profiles.
Note: this did not happen in 99% of cases, only when something unforeseen timed out or some other anomaly happened.
I think I’ve managed to fix this without injecting any code into Puppeteer scripts. Here is what I do in a nutshell.
- Clean up the /tmp dir
- Kill any leftover Chrome processes using the ps-tree package
In semi / pseudo Node.js code:
@Multiply you mean, you did
?
I found that doing `await page.goto('about:blank')` helps reduce CPU and memory usage. Even when reusing tabs, navigating to about:blank between shots somehow seems to keep CPU and memory under control.
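That tab-reuse pattern can be sketched as follows (the function name is ours; `page` is any Puppeteer page object):

```javascript
// Take a screenshot, then park the tab on about:blank so the previous
// document (and its JS heap) can be released, keeping CPU and memory low
// when the same tab is reused across many shots.
async function screenshotAndPark(page, url, options = { type: 'jpeg', quality: 95 }) {
  await page.goto(url);
  const image = await page.screenshot(options);
  await page.goto('about:blank'); // release the heavy document between shots
  return image;
}
```

Reusing one page this way also avoids repeatedly spawning renderer processes, which is what leaks zombies when cleanup goes wrong.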