puppeteer: [Bug]: new-headless mode downloads files

Bug expectation

In old-headless mode, if you page.goto() a binary file (e.g. docx or xlsx), the navigation is aborted and that’s the end of it. In new-headless mode the navigation is still reported as aborted, but a visible download window appears (i.e. it fails to be “headless”) and the file is saved to the Downloads folder.
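The abort shows up as a rejected page.goto() promise, which the repro below catches. A minimal sketch for telling that case apart from other navigation failures, assuming the rejection message contains Chromium's `net::ERR_ABORTED` code (the usual message for URLs that turn into downloads, but worth checking on your version). `isAbortedNavigation` is a hypothetical name, not a Puppeteer API.

```javascript
// Hypothetical helper (not part of Puppeteer): true when a page.goto()
// rejection looks like Chromium's aborted-navigation error, which is what
// a URL that turns into a download usually produces.
// Assumption: the error message contains the net::ERR_ABORTED code.
function isAbortedNavigation(err) {
  return err instanceof Error && err.message.includes('net::ERR_ABORTED');
}

// Usage, inside an async function with an open `page`:
// try {
//   await page.goto(url);
// } catch (err) {
//   if (!isAbortedNavigation(err)) throw err; // some other failure
//   // Navigation aborted: the URL was most likely a download.
// }
```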

Bug behavior

  • Flaky
  • PDF

Minimal, reproducible example

// Run as an ES module (e.g. repro.mjs): it uses import and top-level await
import puppeteer from 'puppeteer'

const browser = await puppeteer.launch({ headless: 'new' })
const context = await browser.createIncognitoBrowserContext()
const page = await context.newPage()
try {
  await page.goto('https://unequivocal.eu/dl/example.docx')
} catch (err) {
  console.log(err)
  // need a delay because otherwise the browser exits before the download completes
  await new Promise(resolve => setTimeout(resolve, 5000))
}
await browser.close()
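The fixed 5000 ms delay in the repro is a guess that can be either too short or needlessly long. A sketch of an alternative, assuming Chrome's usual behavior of writing an in-progress download as a `*.crdownload` file and renaming it when it completes: snapshot the download folder before the navigation, then poll until a new finished file appears and no new partial file remains. `waitForNewDownload` and its options are hypothetical; the caller supplies `listFiles` (e.g. `() => readdir(downloadDir)` from `fs/promises`), and `downloadDir` must be wherever Chrome actually saves files (the Downloads folder in this report).

```javascript
// Hypothetical helper (not a Puppeteer API). Assumes Chrome writes an
// in-progress download as "*.crdownload" and renames it when finished.
// listFiles: async () => string[] of file names in the download folder.
// before:    Set of names that were already there before the navigation.
// Resolves with the newly finished file names, or rejects after timeoutMs.
async function waitForNewDownload(listFiles, before, { timeoutMs = 30000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const added = (await listFiles()).filter((name) => !before.has(name));
    const stillPartial = added.some((name) => name.endsWith('.crdownload'));
    const finished = added.filter((name) => !name.endsWith('.crdownload'));
    if (finished.length > 0 && !stillPartial) return finished;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No finished download appeared within ${timeoutMs} ms`);
}

// Usage (ES module, `readdir` from 'fs/promises', `downloadDir` assumed):
// const before = new Set(await readdir(downloadDir));
// await page.goto(url).catch(() => {}); // aborted navigation expected
// const files = await waitForNewDownload(() => readdir(downloadDir), before);
```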

Error string

no error

Puppeteer configuration

No response

Puppeteer version

20.1.2

Node version

16.14.0

Package manager

npm

Package manager version

9.4.0

Operating system

Windows

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15

Most upvoted comments

I have been unable to get any variation of setDownloadBehavior to work in newer Puppeteer versions. Previously working code using Page.setDownloadBehavior fails, as does the goto approach above. Based on https://github.com/puppeteer/puppeteer/issues/3722#issuecomment-679509973, I have a working function that downloads a URL to a local file by calling fetch in the page context and marshaling the result back to the Node context.

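import { promises as fsPromises } from 'fs';

// Adds a helper to an existing Puppeteer `page`.
page.downloadFile = async function(url, destinationPath) {
  // Runs in the page: fetch with the page's cookies and read the body as a
  // "binary string" (one char per byte), because page.evaluate() can only
  // return JSON-serializable values, not a Blob or ArrayBuffer.
  const binaryString = await page.evaluate(async (url) => {
    const response = await fetch(url, { method: 'GET', credentials: 'include' });
    if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
    const blob = await response.blob();
    return new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = () => resolve(reader.result);
      reader.onerror = () => reject(reader.error);
      reader.readAsBinaryString(blob);
    });
  }, url);
  // Back in Node: each char code (0-255) becomes one byte again.
  const fileBuffer = Buffer.from(binaryString, 'binary');
  await fsPromises.writeFile(destinationPath, fileBuffer);
};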
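A variant of the in-page step, sketched under the assumption that a base64 string serves just as well as the binary string: read the body with `response.arrayBuffer()` instead of FileReader.readAsBinaryString, base64-encode it in chunks with `btoa`, and decode it in Node with `Buffer.from(…, 'base64')`. It is still a string crossing page.evaluate(), but a pure-ASCII one, which should keep the payload smaller when it is serialized as JSON (a binary string's bytes above 0x7F need multi-byte UTF-8 and its control bytes need escapes). `fetchAsBase64` and its `fetchImpl` parameter are hypothetical; the function may only use browser globals, because page.evaluate() sends the function's source to the page, not its closure.

```javascript
// Hypothetical in-page helper: fetch `url` with the page's cookies and
// return the body as base64. Uses only browser globals (fetch, btoa,
// Uint8Array), so it can be passed directly to page.evaluate().
// `fetchImpl` defaults to the page's fetch; it exists only so the helper
// can be exercised outside a browser.
async function fetchAsBase64(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { method: 'GET', credentials: 'include' });
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  const bytes = new Uint8Array(await response.arrayBuffer());
  let binary = '';
  const CHUNK = 0x8000; // keep each fromCharCode call's argument list small
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Node side (same `page` and fsPromises as in downloadFile above):
// const base64 = await page.evaluate(fetchAsBase64, url);
// await fsPromises.writeFile(destinationPath, Buffer.from(base64, 'base64'));
```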

Usage:

await page.downloadFile('https://unequivocal.eu/dl/example.docx','./some/path/example.docx')

The main downsides I see are:

  • Translating to a string and then back to a file can’t be ideal, but as far as I can tell it is the only way to marshal data from the page context to the Node context; returning a Blob or ArrayBuffer from page.evaluate() doesn’t work.
  • You have to know the URL. When the page generates it on the fly (e.g. on a button click), it may be difficult to obtain.
  • You have to know the file name, which may be difficult when the server generates it.
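For the file-name problem, one option, assuming the server sends a `Content-Disposition` header and the page may read it (same-origin, or listed in `Access-Control-Expose-Headers` for a cross-origin fetch), is to return `response.headers.get('content-disposition')` from the in-page function alongside the data and parse the name in Node. Below is a simplified parser for the `filename*=UTF-8''…` and `filename="…"` forms described in RFC 6266; `filenameFromContentDisposition` is a hypothetical name, and it deliberately skips edge cases such as escaped quotes or non-UTF-8 charsets.

```javascript
// Hypothetical, simplified Content-Disposition parser (RFC 6266 subset).
// Prefers the extended form filename*=charset'lang'percent-encoded
// (decoded as UTF-8), then falls back to filename="..." or filename=token.
// Returns null when no usable name is present.
function filenameFromContentDisposition(header) {
  if (!header) return null;
  let name = null;
  const extended = /filename\*\s*=\s*[^']*'[^']*'([^;\s]+)/i.exec(header);
  if (extended) {
    try {
      name = decodeURIComponent(extended[1]);
    } catch {
      name = null; // malformed percent-encoding: try the plain form instead
    }
  }
  if (!name) {
    const plain = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(header);
    if (plain) name = plain[1] ?? plain[2];
  }
  // Keep only the last path segment so the name can't point outside the target folder.
  const base = name ? name.split(/[\\/]/).pop() : '';
  return base && base !== '.' && base !== '..' ? base : null;
}

// Usage: in the page function return
//   { data, disposition: response.headers.get('content-disposition') }
// then in Node:
//   const fileName = filenameFromContentDisposition(result.disposition) ?? 'download.bin';
```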

But it works for my purposes and I hope it helps others.