puppeteer: [Bug]: Monorepo breaks build for Cloud Functions

Bug description

Steps to reproduce the problem:

  1. Depend on puppeteer 19.0.0 in a Cloud Function.
  2. Deploy the Cloud Function; the build succeeds.
  3. Invoke the Cloud Function; the error below occurs.

My guess is that the new cache location is not retained in the final Docker image for the Cloud Function. Although the build (and probably the download of Chromium) succeeds, Chromium is not available at runtime.
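A quick way to check this from inside the deployed function is to log the path puppeteer resolves for the browser and whether anything exists there (a minimal diagnostic sketch; the file name is illustrative):

// check-browser.js: diagnostic sketch, run inside the Cloud Function
const fs = require('fs');
const puppeteer = require('puppeteer');

// executablePath() resolves where puppeteer expects the browser binary,
// based on its cache directory. If the cache was dropped from the final
// image, this path will not exist and launch() fails as reported below.
const chromePath = puppeteer.executablePath();
console.log('Expected browser at:', chromePath);
console.log('Exists:', fs.existsSync(chromePath));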

While the Cloud Function environment is not a direct concern of the puppeteer project, the caching feature introduced in 19.0.0 makes it hard to reliably install Chromium in an environment where we don’t control the build/file system. I’ve tried playing around with PUPPETEER_CACHE_DIR, but again, we have very few guarantees about the build flow, which could change at any time.

Instead of specifying a download location using PUPPETEER_CACHE_DIR, is there any way the old behaviour can optionally be turned back on, letting puppeteer figure out where it’s installed? This would avoid having to know the subtleties of the underlying build/file system.

Thanks,

Puppeteer version

19.0.0

Node.js version

16

npm version

8.5.0

What operating system are you seeing the problem on?

Linux

Relevant log output

Error: Could not find expected browser (chrome) locally. Run `npm install` to download the correct Chromium revision (1045629).

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 9
  • Comments: 82 (11 by maintainers)

Most upvoted comments

This is fixed. Just add a .puppeteerrc.cjs at the root of your app directory with:

const {join} = require('path');

/**
 * @type {import("puppeteer").Configuration}
 */
module.exports = {
  cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
};

Is it possible to implement something similar with firebase functions or must we still downgrade puppeteer there?

From my short testing, the following configuration (based on the suggestions from this thread) seems to work with Firebase Functions (v1) and allows me to use puppeteer.launch({headless: "new"}). So far I haven’t observed any issues with the cache.

firebase.json:

{
  "functions": {
    "ignore": [".cache"]
  },
  ...
}

package.json:

{
  "engines": {
    "node": "18"
  },
  "scripts": {
    "postinstall": "node node_modules/puppeteer/install.js",
    "gcp-build": "node node_modules/puppeteer/install.js",
    ...
  },
  "dependencies": {
    "firebase-functions": "4.4.1",
    "puppeteer": "20.9.0",
    ...
  },
  ...
}

.puppeteerrc.cjs:

module.exports = {cacheDirectory: require("path").join(__dirname, ".cache", "puppeteer")};
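For reference, a minimal function using this setup might look as follows (a sketch based on the configuration above; the function name, target URL, and launch flags are illustrative, not from the original comment):

// index.js: minimal Firebase Function (v1 API) using puppeteer
const functions = require('firebase-functions');
const puppeteer = require('puppeteer');

exports.pageTitle = functions
  .runWith({memory: '1GB', timeoutSeconds: 120}) // headless Chrome needs headroom
  .https.onRequest(async (req, res) => {
    // --no-sandbox is commonly required in containerized environments
    const browser = await puppeteer.launch({
      headless: 'new',
      args: ['--no-sandbox'],
    });
    try {
      const page = await browser.newPage();
      await page.goto('https://example.com');
      res.send(await page.title());
    } finally {
      await browser.close();
    }
  });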

Sorry for necromancing this thread, but what an awful experience this is. I am also struggling to get Puppeteer v19 working with Google Cloud Functions. How is downgrading to v18 an acceptable solution? Whatever changed from v18 to v19, all of a sudden it requires extra configuration files and still nothing works. This is the worst user experience. Seriously, libraries that mature should normally become easier to use, not harder. This is diabolical. Did anyone find a reliable solution to make puppeteer work with Google Cloud Functions, or is everyone downgrading to v18 now?

Wanted to chime in and mention this affected me when I upgraded from v16 to v19 while deploying puppeteer to a Cloud Function.

If a solution could be implemented that didn’t involve having to create an extra configuration file, that’d be awesome. I’ve downgraded to v18 for now.

Since yesterday, the firebase cloud functions deployment is taking too long while using puppeteer, resulting in a timeout. When I remove puppeteer from my package.json, it succeeds.

Note: the same package was deployed successfully in the past 6 months.

Is anyone facing this issue?

Yes! I pulled my hair out for 2 days thinking it was a code change on my side that broke the build. Interestingly, only certain function builds failed, not all. After the fix, build times dropped dramatically.

THE FIX:

  • Upgraded to puppeteer "puppeteer": "^22.4.1" (from 20.7.1 to latest)
  • Changed my package.json build script to use node node_modules/puppeteer/install.mjs (.mjs instead of .js)

Complete build script in package.json:

"download:puppeteer:browser": "node node_modules/puppeteer/install.mjs",
"gcp-build": "npm run download:puppeteer:browser",
"postinstall": "npm run gcp-build",

@jonvickers Want to correct something. I think you need to set:

"gcp-build": "node node_modules/puppeteer/install.js"

gcp-build is a custom build step that Google runs when building the Docker container for your function, and you want it to download Chromium. I didn’t suggest that before because I was getting seemingly nondeterministic behavior (because I wasn’t understanding what was happening): sometimes puppeteer would fail to launch, and sometimes it would work. Then redeploying would sometimes fix it. I’m not sure, but I think what might be happening is that when Google deploys your function, it checks whether package-lock.json has changed. If it has, it calls npm ci, which installs puppeteer if updated; when installing it, puppeteer’s postinstall script automatically calls "node node_modules/puppeteer/install.js" to download Chrome. If it hasn’t changed, the script doesn’t get called, and somehow the prior cached version of Chrome is gone in the new instance. But if you set gcp-build, it will always download Chrome, and you’ll never be missing it. This is my theory now. It could be something a bit different, but in any event setting that custom build step seems to produce a better outcome.

I found one more relevant piece of information. Following the steps above, I got the error error while loading shared libraries: libnss3.so: cannot open shared object file: No such file or directory. It turns out this lib is available for Cloud Functions on the Ubuntu 18 image (https://cloud.google.com/functions/docs/reference/system-packages) but not on Ubuntu 22. The Ubuntu 22 image is used by the nodejs18 runtime, so there is no libnss3 for puppeteer to use. Moving to nodejs16 made everything work. https://cloud.google.com/functions/docs/concepts/execution-environment hth
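In practice that means pinning the runtime, e.g. via the engines field in package.json for Firebase deployments (mirroring the engines entry shown earlier in the thread, but with node 16):

{
  "engines": {
    "node": "16"
  }
}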

Hi, I also have similar issues, and none of the above works.

Error message: Error: Could not find Chromium (rev. 1056772). This can occur if either

  1. you did not perform an installation before running the script (e.g. npm install) or
  2. your cache path is incorrectly configured (which is: workspace/.cache/puppeteer). For (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration.

When I try the solution above, it says that the package is too big and I’m not allowed to deploy it to the cloud.

Add "gcp-build": "node node_modules/puppeteer/install.mjs" to the scripts and make sure that you have a .puppeteerrc.cjs at the root directory which includes the following

const {join} = require('path');
module.exports = {
  cacheDirectory: join(__dirname, '.cache', 'puppeteer')
};

Worked fine for me with "puppeteer": "^21.4.1" on gcloud functions with node18.

@jrandolf I’ve created a Google Cloud Function based on your sample: https://github.com/vetler/test-puppeteer

To deploy this, I’ve used Google Cloud Build, which has previously worked fine. Building now fails; see the log file here. With Puppeteer 18.2.1 it builds and also runs successfully as a cloud function. See the build log here.

Yea, me neither 😦

@jrandolf I think the problem is that PUPPETEER_TMP_DIR does not take the configuration file into account, so even if you set up your configuration file, the browser is fetched into PUPPETEER_TMP_DIR when npm install is performed, but at execution time the code tries to load the browser from cacheDirectory.

@jrandolf actually I solved the problem by manually installing puppeteer-core 18.2.1. It didn’t get downgraded automatically with the puppeteer downgrade.

Thanks, my hero @vojdan, I was able to fix it based on your instructions.

For my case:

  • Upgraded to "puppeteer": "^22.5.0" (from 20.7.3 to latest)
  • Change my package.son build script to use node node_modules/puppeteer/install.mjs (.mjs instead of .js)

Can’t explain why, but of the three scripts you mentioned, only the following one worked for me:

"gcp-build": "node node_modules/puppeteer/install.mjs"

It may be because gcp-build is run by default as a post-install step.

I was facing this issue when deploying in the cloud, this seems related: https://github.com/puppeteer/puppeteer/issues/12094

@jonvickers Want to correct something. I think you need to set:

"gcp-build": "node node_modules/puppeteer/install.js"

gcp-build is a custom build step that Google runs when building the docker container for your function, and you want it to download Chromium. I didn’t suggest that before, because I was getting seemingly indeterministic behavior (i.e., because I wasn’t understanding what was happening) that sometimes puppeteer would fail to launch, and sometimes it would. Then redeploying would sometimes fix it. I’m not sure but I think what might be happening is that when Google deploys your function, it checks if package-lock.json has changed. If it has, then it calls “npm ci” which will install puppeteer if updated. When installing it, puppeteer’s postinstall script automatically calls "node node_modules/puppeteer/install.js" to download Chrome. If it has’t changed, then it doesn’t get called and somehow the prior cached version of Chrome is gone in the new instance. But if you set gcp-build then it will always download Chrome, and you’ll never be missing it. This is my theory now. It could be something a bit different, but in any event setting that custom build step seems to produce a better outcome.

I needed to use the build command if I run { headless: 'new' }. Calling launch() without params worked even with an empty ("") gcp-build.

My last explanation seems to be correct, but to elaborate based on examining the Cloud Build logs of the function deployment: the .cache directory containing the Chromium download goes into its own Docker layer, and node_modules into a different layer. If Cloud Build detects that npm ci needs to run, it installs puppeteer into node_modules and calls its postinstall script; both layers then get generated and used in the new deployment. But if npm ci does not need to run, it reuses the old node_modules layer and assumes the cache layer is no longer necessary, dropping it. If you specify gcp-build, however, the cache layer is always generated and ends up in the instance.

I just call await puppeteer.launch() without any options, and it works fine.

Yes, the package.json scripts section contains "gcp-build": "". When deploying earlier this week, something (I forgot what) wasn’t working, and adding that fixed it.

Puppeteer version 19 works fine on GCF Node 16, as long as you add a .puppeteerrc.cjs file in the same directory as package.json, with these contents:

const {join} = require('path')

/**
 * @type {import("puppeteer").PuppeteerConfiguration}
 */
module.exports = {
  cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
}

GCF Node 18 does not work with any Puppeteer version, but that’s a GCF problem as discussed here: https://issuetracker.google.com/issues/266279679

If I deploy the same setup but rename the function, the cache misses, everything is installed again, and everything works. I ended up changing the .puppeteerrc.cjs to point the cacheDirectory at join(__dirname, 'node_modules', '.cache', 'puppeteer'). Now I get to update functions and puppeteer still works.
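Spelled out, that .puppeteerrc.cjs looks like this (matching the change described above):

const {join} = require('path');

/**
 * @type {import("puppeteer").Configuration}
 */
module.exports = {
  // Keeping the cache inside node_modules puts it in the same Docker layer
  // the deployment reuses, so a cached node_modules still has the browser.
  cacheDirectory: join(__dirname, 'node_modules', '.cache', 'puppeteer'),
};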

What function are you renaming?

I renamed the cloud function so I could test if the cloud functions deployment did in fact reuse a cached node_modules folder (which it does). So same code, different cloud function name in the gcloud deploy command.

OK, works. I had to add "ignore": [".cache"] to firebase.json, as below:

"functions": [
  {
    "source": "functions",
    "codebase": "default",
    "ignore": [
      "node_modules",
      ".git",
      "firebase-debug.log",
      "firebase-debug.*.log",
      ".cache"
    ]
  }
],

Done! See log file here.

Built and deployed successfully. Calling the function produces no error in the Google Cloud Function logs; I’ll try out this solution on a real project and report back. Thanks!

This issue is also happening on vercel builds.

Thanks for your reply.

I’m not sure I understand. I’m not using the puppeteer-provided Docker image.

When running in the Cloud Functions environment, there is no control over the Docker build system and resulting image. There is a high probability the build process is not as linear as puppeteer’s Dockerfile, meaning the entire WORKDIR / HOME might not make it to the final image.

As far as I know, the only guarantee we have is that dependencies are installed, hence the node_modules folder is available, along with the source code of the application being built. That’s it. The previous caching location made it convenient to use in this setup (i.e. it worked out of the box). Also, the benefits of the new location (“better, cross-project installation of browsers”) are not needed when containerising the application.

Also, while looking for a solution in the documentation, I stumbled upon this information about Cloud Functions, which seems to be outdated.