puppeteer: Zombie Process problem.

Hello,

Recently we talked about this problem in issues #1823 and #1791.

Environment:

Use Case:

We are using Puppeteer on AWS Lambda. We take a screenshot of a given HTML template, upload it to S3, and use the image for future requests. The service handles over 100 million requests each month. That’s why every process should be atomic and immutable. (AWS Lambda has disk and process limits.)

Example Code:

const browser = await puppeteer.launch({
  args: ['--disable-gpu', '--no-sandbox', '--single-process',
         '--disable-web-security', '--disable-dev-profile']
});
const page = await browser.newPage();
await page.goto('https://s3bucket.com/markup/a.html');
const response = await page.screenshot({ type: 'jpeg', quality: 95 });
await browser.close();

Problem

When we run the example code, we get a disk error from AWS Lambda.

Example /tmp folder:

2018-01-12T14:55:38.553Z    a6ef3454-f7a8-11e7-be0f-17f405d5a180    start stdout: total 226084
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:55 .
drwxr-xr-x 21 root root 4096 Jan 12 10:53 ..
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:33 core.headless-chromi.129
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.131
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.135
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.137
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.138
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:51 core.headless-chromi.14
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:49 core.headless-chromi.15
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:36 core.headless-chromi.169
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:15 core.headless-chromi.174
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:52 core.headless-chromi.178
-rw------- 1 sbx_user1067 479 15126528 Jan 12 14:50 core.headless-chromi.180
drwx------ 3 sbx_user1067 479 4096 Jan 12 14:14 .pki

When we investigated these files, we found that they are core dumps. We now remove these files after each process completes, as sketched below.
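A minimal cleanup sketch (assuming a shell is available in the Lambda runtime; the glob matches the core files listed above):

const { execSync } = require('child_process');

// Remove Chromium core dumps from /tmp so Lambda's disk quota isn't exhausted;
// '|| true' keeps the command from failing when no core files exist.
execSync('rm -f /tmp/core.headless-chromi.* || true');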

When we monitored the process list, we saw zombie Chrome processes steadily accumulating, and we can’t kill them. AWS Lambda has a maximum process limit (1024 processes), so eventually we hit it.

483 1 3.3 1.6 1226196 65408 ? Ssl 22:07 0:05 /var/lang/bin/node --max-old-space-size=870 --max-semi-space-size=54 --max-executable-size=109 --expose-gc /var/runtime/node_modules/awslambda/index.js
483 22 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 73 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 119 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 166 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 214 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 262 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 307 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 353 0.0 0.0 0 0 ? Z 22:07 0:00 [headless-chromi] <defunct>
483 1915 0.0 0.0 0 0 ? Z 22:09 0:00 [sh] <defunct>

We couldn’t use dumb-init on Lambda, because Lambda already has an init system.

How did we fix it? (very hacky method)

We used browser.disconnect() instead of browser.close() and managed the Chrome processes manually, killing them ourselves.

Example Code:

const child_process = require('child_process');

// Capture the Chromium PID up front so it is still available after disconnect.
const chromePid = browser.process().pid;

browser.on('disconnected', () => {
    console.log('sleeping 100ms'); // sleep to eliminate a race condition
    setTimeout(() => {
        console.log(`Browser Disconnected... Process Id: ${chromePid}`);
        child_process.exec(`kill -9 ${chromePid}`, (error, stdout, stderr) => {
            if (error) {
                console.log(`Process Kill Error: ${error}`);
            }
            console.log(`Process Kill Success. stdout: ${stdout} stderr: ${stderr}`);
        });
    }, 100);
});

At first we didn’t use the setTimeout; we killed the process immediately after the browser disconnected, and we got the following error:

Error: read ECONNRESET at exports._errnoException (util.js:1018:11) at TCP.onread (net.js:568:26)

I think this looks like a Puppeteer process-management problem. When we used this method, we didn’t receive any Puppeteer-related errors. How can we fix it properly?

Thanks.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 96
  • Comments: 53 (3 by maintainers)

Most upvoted comments

I’ve overcome these issues by adding these flags for headless Chrome:

const chromeFlags = [
    '--headless',
    '--no-sandbox',
    '--disable-gpu',
    '--single-process',
    '--no-zygote'
]

I think the child processes are orphaned when the parent is killed, and that leads to the zombies. With these flags I only get one process, and it works pretty well. A minimal launch sketch follows.
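For reference, a minimal launch sketch using these flags (the URL is a placeholder, and '--headless' is omitted because Puppeteer launches headless by default):

const puppeteer = require('puppeteer');

(async () => {
    // '--single-process' and '--no-zygote' stop Chromium from forking the
    // helper processes that otherwise linger as zombies.
    const browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-gpu', '--single-process', '--no-zygote']
    });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    await browser.close();
})();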

@bahattincinic - thanks, I’ve tried your method of disconnecting + killing the process, and while it does kill the “main” process returned by puppeteer.launch(), each run seems to leave another defunct zombie with a PID that is different than the killed one…

What’s worse, when I run ps aux right after puppeteer.launch(), aside from the “main” process, there is already one that’s defunct, right away, before running code or trying to kill anything.

I’ve tried sending a kill -15, hoping that will allow the main process to clean up its children, but -15 or -9 doesn’t make any difference, so I’m still stuck with an ever-growing list of zombies and rising memory…

Do you have any advice on how you managed to keep it clean of those as well (if you had a similar experience)? I’m also running on Lambda, same args used, puppeteer 1.1.1. Thanks!

I’m also using puppeteer in docker, and I had also tried puppeteer.launch({ args: ['--no-sandbox', '--no-zygote'] });, but that did not help.

Eventually, I figured out that the init: true flag solves the orphaned zombie process problem. It can be used with docker-compose, according to the Docker documentation: https://docs.docker.com/compose/compose-file/compose-file-v3/#init (@bryanlarsen and @zdm also mentioned the --init flag for docker, and I also gained inspiration from this great blog post to understand it more: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/)
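A minimal docker-compose sketch of that flag (the service name is hypothetical):

version: "3.7"
services:
  screenshot-service:
    build: .
    init: true   # runs an init process (tini) as PID 1 to reap zombie children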

I fixed this issue by adding

        '--disable-setuid-sandbox',
        '--no-zygote',

to my launch config, which makes it so that only one Chrome process is created on launch. I discovered that otherwise two Chrome processes are created on launch, browser.close() only closes one of them, and the second becomes the zombie when the browser is closed.

I tried looking up the process and killing it. How about this?

const puppeteer = require('puppeteer');
const ps = require('ps-node-promise-es6');
const _ = require('lodash');

async function getWebPageHtml(targetUrl) {
  const browser = await puppeteer.launch();
  const browserPID = browser._process.pid;
  const page = await browser.newPage();

  try {
    const response = await page.goto(targetUrl);
  } catch (error) {
    throw new Error(error);
  } finally {
    await page.close();
    await browser.close();
    const psLookup = await ps.lookup({ pid: browserPID });

    for (let proc of psLookup) {
      if (_.has(proc, 'pid')) {
        await ps.kill(proc.pid, 'SIGKILL');
      }
    }
  }
}

@leobudima

We are doing the following to avoid zombie processes (a sketch of the waitpid step follows the list):

  • We used browser.close(); instead of killing the process.
  • We are using waitpid (https://www.npmjs.com/package/waitpid) (while (waitpid2.waitpid(-1, 0 | waitpid2.WNOHANG) == -1))
  • We delete the core dumps from the /tmp folder after the process completes (rm -r /tmp/core.* || true)
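A sketch of the waitpid step, using the call exactly as quoted above (assuming the waitpid package exposes waitpid() and WNOHANG this way):

const waitpid2 = require('waitpid');

// Reap defunct children left behind by Chromium; WNOHANG keeps the call
// from blocking when no child has exited yet.
while (waitpid2.waitpid(-1, 0 | waitpid2.WNOHANG) == -1) {}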

If your project doesn’t depend on AWS Lambda, you can use my example project: https://github.com/bahattincinic/puppeteer-docker-example

Hello, thanks to all for the debugging. I run Puppeteer inside Docker. To remove zombie processes I added the argument --no-zygote, and now it works fine: when I use browser.close(), all processes stop.

My code sample:

router.get(`/share/image`, async function (req, res, next) {
    // ... some logic from my project ...

    // puppeteer code
    const browser = await puppeteer.launch({args: ['--no-sandbox', '--single-process', '--no-zygote']});

    try {
        const page = await browser.newPage();
        await page.setViewport({
            width: 1920,
            height: 1080
        });

        await page.setContent(html, {
            waitUntil: "networkidle0"
        });

        let image = await page.screenshot({
            type: 'jpeg',
            quality: 100
        });

        await browser.close();

        res.contentType('image/jpeg');
        res.send(image)
    } catch(e) {
        console.log(e);
        await browser.close();
        res.status(503).end();
    }
});

I don’t fully understand this flag, but here is a little explanation: https://codereview.chromium.org/2384163002

Thanks to all

Hi everyone, just wanted to provide a quick warning about the --single-process flag. I have some integration tests that broke after I started using this flag, because it broke the font rendering and kerning of the generated PDF. I found this Process Models page in the Chromium docs:

Finally, for the purposes of comparison, Chromium supports a single process model that can be enabled using the --single-process command-line switch. In this model, both the browser and rendering engine are run within a single OS process.

The single process model provides a baseline for measuring any overhead that the multi-process architectures impose. It is not a safe or robust architecture, as any renderer crash will cause the loss of the entire browser process. It is designed for testing and development purposes, and it may contain bugs that are not present in the other architectures.

I can confirm that the single process model does cause a rendering bug (at least on Chromium 88.x), and that this rendering bug is not present when I remove the --single-process flag. So I would not recommend using this flag, since the docs say that it’s only designed for testing and development purposes, and it shouldn’t really be used in production.

I have thoroughly reviewed the documentation and exhausted all available solutions in an attempt to resolve the zombie process issue. Despite my efforts, the problem persisted. I attempted to terminate process IDs, but within the pods, the zombie processes remained resilient. Devoting several consecutive days to diligently updating every package eventually proved successful. The issue was ultimately resolved by making key adjustments: switching the operating system from Node Alpine to Node Slim Linux and transitioning from Chromium to Chrome as the browser. The specific changes implemented to rectify the problem are outlined below.

If you are working with Puppeteer and encountering zombie process issues, consider employing the following Docker commands. These commands have proven effective in preventing the creation of zombie processes.

FROM node:18-slim
RUN apt-get update
RUN apt-get upgrade

RUN apt-get update && apt-get install curl gnupg -y \
    && curl --location --silent dl-ssl.google.com/linux/linux_sign… | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install google-chrome-stable -y --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
    apt-get upgrade && apt-get install -y vim

ADD ./puppetron.tar /usr/share/
WORKDIR /usr/share/puppetron

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
ENV SERVICE_PATH=/usr/share/puppetron

CMD node main.js;


The browser path changes to executablePath: '/usr/bin/google-chrome' in the launch options, as sketched below.
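A launch sketch reflecting that change (only the executablePath option is from the original comment; everything else is illustrative):

// Point Puppeteer at the system Chrome installed by the Dockerfile above,
// rather than the bundled Chromium (whose download the
// PUPPETEER_SKIP_CHROMIUM_DOWNLOAD env var disables).
const browser = await puppeteer.launch({
    executablePath: '/usr/bin/google-chrome'
});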

This works for me on Debian 10, just kills the process group based on the browser pid:

const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
const page = await browser.newPage()

try {
  await page.goto(url, { waitUntil: 'networkidle2', timeout: 10000 })
  await page.screenshot({ path })
  await page.close()
  await browser.close()
} catch (e) {
  console.error(e)
} finally {
  // A negative PID makes process.kill() signal the whole process group,
  // taking Chromium's child processes down with it.
  const pid = -browser.process().pid
  try {
    process.kill(pid, 'SIGKILL')
  } catch (e) {}
}

Interesting, I removed the -- in front of args: [...] and it acts differently. I suspected this based on a previous project I was working on using webdriverio. All the instructions of course include the --, but when I enable an extension using load-extension="${ext}" it very clearly loads the extension, because it reports bugs from within the extension when dumpio: true is set. With the --, the extension uBlock Origin is ignored.

EDIT: launching with -- opens a bunch of tabs with the options as the URLs. Confirmed by switching to headless: false that this is still not working.

EDIT 2: I am getting pretty decent results; it moved past a seg fault and there are no popups. I’m using these args, partly from this thread and partly from previous research.

args: [
            `--load-extension="${ext}"`,
            '--disable-notifications',
            '--disable-geolocation',
            '--disable-infobars',
            '--disable-session-crashed-bubble',
            '--no-sandbox',
            '--silent-debugger-extension-api',
            '--single-process',
            '--no-zygote',
            '--disable-setuid-sandbox',
        ],

And from here https://superuser.com/questions/912656/how-do-i-stop-my-mac-from-asking-to-accept-incoming-network-connections

sudo codesign --force --deep --sign - node_modules/puppeteer/.local-chromium/mac-706915/chrome-mac/Chromium.app/

@bahattincinic @aslushnikov I’ve briefly touched upon this here; killing the Chromium process aggressively on complete/timeout/errors helped us greatly as well.

Using dumb-init resolved my problems.

Sorry for the noise, I just wanted to confirm in case it can help somebody: in my case simply adding --init to the docker command did work indeed.

If you run it under Docker you need to use the docker --init option.
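For example (image name hypothetical):

docker run --init my-puppeteer-image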

Hi, I am also having problems with this, also serverless-chrome with AWS Lambda.

In my case, it looks like it does not have anything to do with the browser cleanup process. It looks like it is being caused by something that happens during Puppeteer launch.

Running ps alx immediately after browser launch gives me this:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 487 1 0 20 0 1488300 271872 ep_pol Ssl ? 0:34 /var/lang/bin/node --expose-gc --max-semi-space-size=102 --max-old-space-size=1843 /var/runtime/node_modules/awslambda/index.js
1 487 13 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 59 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 107 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 153 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 203 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 246 1 20 0 1074400 67740 ep_pol Ssl ? 0:00 ./chrome/headless-chromium --disable-background-networking --disable-background-timer-throttling --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic --use-mock-keychain --remote-debugging-port=0 --user-data-dir=/tmp/puppeteer_dev_profile-yQiw0t --headless --disable-gpu --hide-scrollbars --mute-audio --no-sandbox --disable-setuid-sandbox --disable-dev-shm-usage --single-process --disable-gpu --no-zygote --user-agent=REDACTED
1 487 248 246 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 280 1 20 0 115096 1588 - R ? 0:00 ps -alx

See process 248 which is already defunct at this point.

And then after the browser closes:

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 487 1 0 20 0 1486912 282588 ep_pol Ssl ? 0:41 /var/lang/bin/node --expose-gc --max-semi-space-size=102 --max-old-space-size=1843 /var/runtime/node_modules/awslambda/index.js
1 487 13 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 59 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 107 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 153 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 203 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
1 487 248 1 20 0 0 0 - Z ? 0:00 [headless-chromi] <defunct>
0 487 292 1 20 0 115092 1588 - R ? 0:00 ps -alx

Look at the process with pid 248, which now has ppid 1.

Is this even a Puppeteer bug?

@bahattincinic - thanks a lot for providing details - waitpid is an interesting approach and I’ll definitely try with cleaning /tmp, hopefully that helps! If I don’t manage to make it run reliably on Lambda, I’m going to have to try with docker - thanks for linking the example!

I fixed the zombie process problem when I upgraded the headless Chrome version! I recommend this repository to AWS Lambda users.

@aslushnikov @bahattincinic This might help other facing this problem.

I run Puppeteer sessions on AWS Lambda for my SaaS https://checklyhq.com. I noticed the same issue of defunct Chrome processes hanging around over multiple Lambda calls. Also, the /tmp directory was piling up with profiles.

Note, this did not happen in 99% of the cases, only when something unforeseen timed out or some other anomaly happened.

I think I’ve managed to fix this without injecting any code into Puppeteer scripts. Here is what I do in a nutshell.

  • I spawn a child node process for running the Puppeteer code
  • I store the PID
  • I wait for the child process to exit or time out
  • I rm -rf the /tmp dir
  • I explicitly kill the PID using the ps-tree package

In semi/pseudo Node.js code:

const spawn = require('child_process').spawn
const psTree = require('ps-tree')
const rimraf = require('rimraf') // needed by rimrafPromise() below

// cmd, args and opts are like "node script.js etc." where script is a .js file with some Puppeteer code
const child = spawn(cmd, args, opts)

// wait for the script to run, then kill

await kill()

// then cleanup

await cleanupTmp()

function kill () {
  return new Promise((resolve, reject) => {
    const signal = 'SIGTERM'
    const pid = child.pid
    psTree(pid, (err, children) => {
      if (err) { return reject(err) }
      [pid].concat(
        children.map(p => p.PID)
      ).forEach(tpid => {
        try {
          process.kill(tpid, signal)
        } catch (ex) {}
      })
      return resolve()
    })
  })
}

function cleanupTmp () {
  return Promise.all([
    rimrafPromise('/tmp/core.chromium.*'),
    rimrafPromise('/tmp/puppeteer_dev_profile*')
  ])
}

function rimrafPromise (pattern) {
  return new Promise((resolve, reject) => {
    rimraf(pattern, err => {
      if (err) return reject(err)
      resolve()
    })
  })
}

@Multiply you mean, you did

ENTRYPOINT ["/sbin/tini", "--"]

?

I found that doing await page.goto('about:blank') helps reduce CPU and memory usage: even when reusing tabs, navigating to about:blank between shots somehow keeps CPU and memory under control. A sketch follows.
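A sketch of that pattern (the helper name is hypothetical):

// Reuse a single tab across screenshots, parking it on about:blank between
// shots so renderer CPU and memory stay under control.
const page = await browser.newPage();

async function takeShot (url, path) {
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path });
    await page.goto('about:blank');
}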