puppeteer: [Bug]: Monorepo breaks build for Cloud Functions
Bug description
Steps to reproduce the problem:
- Depend on `puppeteer` `19.0.0` in a Cloud Function.
- Deploy the Cloud Function; the build succeeds.
- Invoke the Cloud Function; the error below occurs.
My guess is that the new cache location is not retained in the final Docker image for the Cloud Function. Although the build (and probably the download of Chromium) succeeds, Chromium is not available at runtime.
While the Cloud Function environment is not a direct concern of the puppeteer project, the caching feature introduced in `19.0.0` makes it hard to reliably install Chromium in an environment where we don't control the build/file system. I've tried playing around with `PUPPETEER_CACHE_DIR`, but again, we have very few guarantees about the build flow, which could also change at any time.

Instead of specifying a download location via `PUPPETEER_CACHE_DIR`, is there any way the old behaviour can optionally be turned back on, letting `puppeteer` figure out where it's installed? This would avoid having to know the subtleties of the underlying build/file system.
Thanks,
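For context, the `PUPPETEER_CACHE_DIR` workaround the reporter alludes to amounts to pinning the browser cache somewhere that is guaranteed to ship with the function's source. A minimal sketch, assuming the build step runs from the project root and the whole root directory is copied into the final image (both assumptions that this issue shows do not reliably hold on Cloud Functions):

```shell
# Pin puppeteer's browser cache inside the project tree, so the Chromium
# download lands next to the source code rather than in $HOME/.cache.
# Assumption: the build runs from the project root.
export PUPPETEER_CACHE_DIR="$PWD/.cache/puppeteer"
echo "$PUPPETEER_CACHE_DIR"
```

With this exported before `npm install`, puppeteer's postinstall script downloads Chromium into `./.cache/puppeteer` instead of the user-level cache.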
Puppeteer version
19.0.0
Node.js version
16
npm version
8.5.0
What operating system are you seeing the problem on?
Linux
Relevant log output
Error: Could not find expected browser (chrome) locally. Run `npm install` to download the correct Chromium revision (1045629).
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 9
- Comments: 82 (11 by maintainers)
Links to this issue
Commits related to this issue
- feat: use configuration files (#9140) This PR adds configurations files to `puppeteer`'s methods for configuration. Under the hood, `puppeteer` relies on https://www.npmjs.com/package/cosmiconfig w... — committed to puppeteer/puppeteer by jrandolf 2 years ago
- Fix Puppeteer version. * Switch to using `nodejs16`. * Switch to Puppeteer 18.2.1. Context: * https://github.com/puppeteer/puppeteer/issues/9128 * https://b.corp.google.com/issues/266279679 Test: T... — committed to androidx/androidx by tikurahul a year ago
This is fixed. Just add a `.puppeteerrc.cjs` at the root of your app directory.

From my short testing, the following configuration (based on the suggestions from this thread) seems to work with Firebase Functions (v1) and allows me to use `puppeteer.launch({headless: "new"})`. So far I didn't observe any issues with the cache. The configuration spans `firebase.json`, `package.json`, and `.puppeteerrc.cjs`.
Sorry for necromancing this thread, but what an awful experience this is. I am also struggling to get Puppeteer v19 working with Google Cloud Functions. How is downgrading to v18 an acceptable solution? Whatever changed from v18 to v19, all of a sudden it requires extra configuration files and still nothing works. This is the worst user experience. Seriously, libraries that mature should normally become easier to use, not harder. This is diabolical. Did anyone find a reliable solution to make puppeteer work with Google Cloud Functions, or is everyone downgrading to v18 now?
Wanted to chime in and mention this affected me when I upgraded from v16 to v19 while deploying puppeteer to a Cloud Function.
If a solution could be implemented that didn’t involve having to create an extra configuration file, that’d be awesome. I’ve downgraded to v18 for now.
Yes! I pulled my hair out for 2 days thinking it was a code change on my side that broke the build. Interestingly, only certain function builds failed, not all. After the fix build times dropped dramatically.
THE FIX:

- Upgrade to `"puppeteer": "^22.4.1"` (from 20.7.1 to latest).
- Update the `package.json` build script to use `node node_modules/puppeteer/install.mjs` (`.mjs` instead of `.js`).

Complete build script in `package.json`:
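The complete build script was not captured in this export; based on the steps above and the `gcp-build` discussion elsewhere in the thread, the relevant `scripts` entry presumably looks something like this (the surrounding structure is an assumption):

```json
{
  "scripts": {
    "gcp-build": "node node_modules/puppeteer/install.mjs"
  }
}
```

`gcp-build` is the custom build hook that Google Cloud Functions runs inside the build container, so placing the browser install there forces a fresh Chromium download on every deploy.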
@jonvickers Want to correct something. I think you need to set `gcp-build`: it is a custom build step that Google runs when building the Docker container for your function, and you want it to download Chromium. I didn't suggest that before because I was getting seemingly nondeterministic behavior (i.e., because I wasn't understanding what was happening): sometimes puppeteer would fail to launch, sometimes it would work, and redeploying would sometimes fix it. I'm not sure, but I think what might be happening is that when Google deploys your function, it checks whether package-lock.json has changed. If it has, it calls `npm ci`, which will install puppeteer if updated; while installing it, puppeteer's postinstall script automatically calls `node node_modules/puppeteer/install.js` to download Chrome. If it hasn't changed, the postinstall doesn't get called, and somehow the previously cached copy of Chrome is gone in the new instance. But if you set `gcp-build`, then Chrome will always be downloaded, and you'll never be missing it. This is my theory for now. It could be something a bit different, but in any event setting that custom build step seems to produce a better outcome.

I found one more relevant piece of information. Following the steps above, I got the error `error while loading shared libraries: libnss3.so: cannot open shared object file: No such file or directory`. It turns out this lib is available for Cloud Functions on the Ubuntu 18 image (https://cloud.google.com/functions/docs/reference/system-packages) and not on Ubuntu 22. The Ubuntu 22 image is used by the `nodejs18` runtime, so there is no libnss3 for puppeteer to use. Moving to `nodejs16` made everything work. https://cloud.google.com/functions/docs/concepts/execution-environment hth

Hi, I also have similar issues and none of the above works. Error message: `Error: Could not find Chromium (rev. 1056772). This can occur if either you did not perform an installation before running the script (e.g. npm install) or your cache path is incorrectly configured.` When I try the solution above, it says that the package is too big and I'm not allowed to deploy it to the Cloud.
Add `"gcp-build": "node node_modules/puppeteer/install.mjs"` to the scripts, and make sure that you have a `.puppeteerrc.cjs` at the root directory which includes the following. Worked fine for me with `"puppeteer": "^21.4.1"` on gcloud functions `nodejs18`.

@jrandolf I've created a Google Cloud Function based on your sample: https://github.com/vetler/test-puppeteer
To deploy this, I’ve used Google Cloud Build, which has previously worked fine. Building now fails, see log file here. With Puppeteer 18.2.1 it builds and also runs as a cloud function successfully. See build log here.
Yea, me neither 😦
@jrandolf I think the problem is that `PUPPETEER_TMP_DIR` does not take the configuration file into account, so even if you set up your configuration file, when `npm install` is performed the browser is fetched into `PUPPETEER_TMP_DIR`, but at execution time the code tries to load the browser from `cacheDirectory`.

@jrandolf Actually, I solved the problem by manually installing puppeteer-core 18.2.1. It didn't get downgraded automatically with the puppeteer downgrade.
Thanks, my hero @vojdan! I was able to fix it based on your instructions. For my case:

- `"puppeteer": "^22.5.0"` (from 20.7.3 to latest)
- `node node_modules/puppeteer/install.mjs` (`.mjs` instead of `.js`)

Can't explain why, but of the three scripts you mentioned, only this one worked for me: `"gcp-build": "node node_modules/puppeteer/install.mjs"`

It may be because `gcp-build` is run by default as a post-install step.

I was facing this issue when deploying to the cloud; this seems related: https://github.com/puppeteer/puppeteer/issues/12094
Is it possible to implement something similar with firebase functions or must we still downgrade puppeteer there?
I needed to use the build command if I launch with `{ headless: 'new' }`. Calling `launch()` without params worked even with an empty (`""`) `gcp-build`.
My last explanation seems to be correct, but to elaborate based on examining Cloud Build logs of the function deployment: it seems that the `.cache` directory containing the Chromium download goes into its own Docker layer, and `node_modules` into a different layer. If Cloud Build detects that `npm ci` needs to run, it installs puppeteer into `node_modules` and calls its postinstall script; both layers then get generated and used in the new deployment. But if `npm ci` does not need to run, it reuses the old `node_modules` layer, decides the cache layer is no longer necessary, and drops it. If you specify `gcp-build`, however, the cache layer is always generated and ends up in the instance.

I just call `await puppeteer.launch()` without any options, and it works fine.

Yes, the `package.json` `scripts` section contains `"gcp-build": ""`. When deploying earlier this week, something (I forgot what) wasn't working, and adding that fixed it.

Puppeteer version 19 works fine on GCF Node 16, as long as you add a `.puppeteerrc.cjs` file in the same directory as `package.json`, with these contents:
GCF Node 18 does not work with any Puppeteer version, but that’s a GCF problem as discussed here: https://issuetracker.google.com/issues/266279679
I renamed the cloud function so I could test if the cloud functions deployment did in fact reuse a cached node_modules folder (which it does). So same code, different cloud function name in the gcloud deploy command.
OK, works! I had to add `"ignore": [".cache"]` to `firebase.json`, as below:

"functions": [ { "source": "functions", "codebase": "default", "ignore": [ "node_modules", ".git", "firebase-debug.log", "firebase-debug.*.log", ".cache" ] } ],
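The `firebase.json` fragment quoted above, reformatted for readability (the surrounding top-level braces are implied):

```json
{
  "functions": [
    {
      "source": "functions",
      "codebase": "default",
      "ignore": [
        "node_modules",
        ".git",
        "firebase-debug.log",
        "firebase-debug.*.log",
        ".cache"
      ]
    }
  ]
}
```

Adding `.cache` to `ignore` keeps the local browser download out of the uploaded package; the `gcp-build` step discussed earlier then re-downloads Chromium inside the build container.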
Done! See log file here.
Built and deployed successfully. Calling the function produces no error in the Google Cloud Function logs, will try out this solution on a real project and report back. Thanks!
This issue is also happening on vercel builds.
Thanks for your reply.
I’m not sure I understand. I’m not using the puppeteer-provided Docker image.
When running in the Cloud Functions environment, there is no control over the Docker build system or the resulting image. There is a high probability that the build process is not as linear as puppeteer's Dockerfile, meaning the entire `WORKDIR`/`HOME` might not make it into the final image.

As far as I know, the only guarantee we have is that dependencies are installed, hence the `node_modules` folder is available, along with the source code of the application being built. That's it. The previous caching location made puppeteer convenient to use in this setup (i.e. it worked out of the box). Also, the benefits of the new location ("better, cross-project installation of browsers") are not needed when containerising the application.

Also, while looking for a solution in the documentation, I stumbled upon this information about Cloud Functions, which seems to be outdated.