puppeteer: [Bug]: page.pdf produces corrupt pdf

Bug description

Steps to reproduce the problem:

Occasionally, we find that our PDFs cannot be opened by any program. We’ve narrowed the issue down to the inclusion of certain images: when one of these images is present, the PDF created by Puppeteer is corrupt. No image tool reports anything wrong with the image itself, so I believe this is an issue on the Puppeteer side.

  1. Create a test.html file with the following contents:
<html>
  <head>
  </head>
  <body>
    <img src="image.jpg">
  </body>
</html>

In the same directory, place the attached image.jpg.

  2. Create a save_to_pdf.js file with the following contents:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(`file://${__dirname}/test.html`, { waitUntil: 'networkidle0', timeout: 60000 });
  await page.pdf({
    path: 'out.pdf',
    printBackground: true
  });

  await browser.close();
})();
  3. Run node save_to_pdf.js
  4. Try to open out.pdf in any program.
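
The last step can be partly automated when many files need checking. The following is a minimal sketch, not part of the original report: the helper names `looksLikePdf` and `checkPdfFile` are hypothetical, and the check is only a heuristic (a `%PDF-` header plus an `%%EOF` marker near the end of the file). It cannot confirm that a viewer will open the file, and it is not known whether it detects the specific corruption described here.

```javascript
const fs = require('fs');

// Heuristic structural check (hypothetical helper, not a Puppeteer API):
// a well-formed PDF starts with "%PDF-" and has an "%%EOF" marker
// somewhere in its last bytes.
function looksLikePdf(buffer) {
  const header = buffer.subarray(0, 5).toString('latin1');
  // Only the last 1024 bytes are scanned for the end-of-file marker.
  const tailStart = Math.max(0, buffer.length - 1024);
  const tail = buffer.subarray(tailStart).toString('latin1');
  return header === '%PDF-' && tail.includes('%%EOF');
}

// Convenience wrapper that reads a file from disk before checking it.
function checkPdfFile(path) {
  return looksLikePdf(fs.readFileSync(path));
}
```

For example, `checkPdfFile('out.pdf')` could be called right after `browser.close()` in save_to_pdf.js. A `false` result suggests a truncated or damaged file, while a `true` result still needs confirmation in a real PDF viewer.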

Attachments: puppeteer_bug.zip, image.jpg

Puppeteer version

10.1.0

Node.js version

12.22.5

npm version

6.14.14

What operating system are you seeing the problem on?

macOS

Relevant log output

No response

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 18 (2 by maintainers)

Most upvoted comments

So the Chromium fix landed in M100, and I plan to create a new Puppeteer release early next week.

https://github.com/puppeteer/puppeteer/pull/7868 won’t actually fix the problem; it will just move the bug to the 10 MB boundary, which makes it less likely to happen (because fewer files are bigger than that). We can still land it as a stopgap until the Chromium fix arrives.
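
Until a release with the Chromium fix is available, it may help to flag outputs large enough to still be at risk. The sketch below is an illustration only: `exceedsBoundary` and `ASSUMED_BOUNDARY_BYTES` are hypothetical names, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption, since the exact value is not stated in this thread.

```javascript
// Assumption: the "10MB" boundary is taken to be 10 * 1024 * 1024 bytes.
// The precise value is not given in the thread.
const ASSUMED_BOUNDARY_BYTES = 10 * 1024 * 1024;

// Returns true when an output of `byteLength` bytes is strictly larger
// than the boundary, i.e. when it may still be affected after the
// workaround lands (hypothetical helper, not a Puppeteer API).
function exceedsBoundary(byteLength, boundary = ASSUMED_BOUNDARY_BYTES) {
  return byteLength > boundary;
}
```

One possible use is to call `page.pdf()` without a `path`, pass the `length` of the returned buffer to `exceedsBoundary`, and log a warning for any file that crosses the assumed threshold.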