puppeteer: [Bug]: new-headless mode downloads files

Bug expectation

In old-headless mode, if you page.goto() a binary file (e.g. docx, xlsx) the navigation is aborted and that’s the end of it. In new-headless mode the navigation is still reported as aborted, but a visible download window appears (i.e. it fails to be “headless”) and the file is downloaded to the Downloads folder.

Bug behavior

Flaky
PDF

Minimal, reproducible example

import puppeteer from 'puppeteer'

const browser = await puppeteer.launch({ headless: 'new' })
const context = await browser.createIncognitoBrowserContext()
const page = await context.newPage()
try {
  await page.goto('https://unequivocal.eu/dl/example.docx')
} catch (err) {
  console.log(err)
  // need a delay because otherwise the browser exits before the download completes
  await new Promise(resolve => setTimeout(resolve, 5000))
}
await browser.close()

Error string

no error

Puppeteer configuration

No response

Puppeteer version

20.1.2

Node version

16.14.0

Package manager

npm

Package manager version

9.4.0

Operating system

Windows

About this issue

State: closed
Created a year ago
Comments: 15

Most upvoted comments

I have been unable to get any variation of setDownloadBehavior to work in newer versions of Puppeteer. Previously working code using Page.setDownloadBehavior now fails, as does the page.goto() approach above. Based on https://github.com/puppeteer/puppeteer/issues/3722#issuecomment-679509973 I have a working function that downloads a URL to a local file by calling fetch in the page context and marshaling the result to the Node context.

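import { promises as fsPromises } from 'fs'

// Fetch the URL inside the page (so the page's cookies are sent), then hand the
// bytes back to Node as a binary string and write them to destinationPath.
page.downloadFile = async function (url, destinationPath) {
  const binaryString = await page.evaluate(async (url) => {
    const response = await fetch(url, { method: 'GET', credentials: 'include' });
    if (!response.ok) {
      throw new Error(`Download failed with HTTP status ${response.status}`);
    }
    const blob = await response.blob();
    // readAsBinaryString yields one character per byte
    return new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onloadend = () => resolve(reader.result);
      reader.onerror = reject;
      reader.readAsBinaryString(blob);
    });
  }, url);
  // The 'binary' (latin1) encoding turns each character back into one byte
  const fileBlob = Buffer.from(binaryString, 'binary');
  await fsPromises.writeFile(destinationPath, fileBlob);
};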

Usage:

await page.downloadFile('https://unequivocal.eu/dl/example.docx','./some/path/example.docx')
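
For anyone wondering whether the string round trip can corrupt the file, here is a minimal sketch in plain Node, with no browser involved. The two helper names are hypothetical and only mimic what happens on each side of page.evaluate(): the page side produces one character per byte, and Buffer.from(..., 'binary') on the Node side maps each character back to one byte.

```javascript
// Hypothetical helpers, not part of Puppeteer. They mimic what happens to the
// bytes on each side of page.evaluate().

// Page side: FileReader.readAsBinaryString yields one character per byte,
// with char codes in the range 0-255.
function bytesToBinaryString(bytes) {
  let out = '';
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Node side: the 'binary' (latin1) encoding maps each char code back to one byte.
function binaryStringToBuffer(binaryString) {
  return Buffer.from(binaryString, 'binary');
}

// Sample data: the ZIP signature that .docx files start with, plus a few
// bytes (0x00, 0xff, 0x80) that would not survive a UTF-8 text conversion.
const original = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0x00, 0xff, 0x80]);
const restored = binaryStringToBuffer(bytesToBinaryString(original));

console.log(Buffer.from(original).equals(restored)); // true
```

So the conversion is lossless as long as the string is decoded with 'binary' (latin1) rather than 'utf8'. The cost is the extra copy of the file held as a string.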

The main downsides I see are:

Translating the data to a string and then back to a file can’t be ideal, but as best I can tell it is the only way to marshal it from the page context to the Node context. A Blob or ArrayBuffer won’t make it across.
You have to know the URL. In cases where the page generates the URL on the fly when a button is clicked, it may be difficult to obtain.
You have to know the file name. In cases where the server generates the file name, this may be difficult.
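
On the file name point, one possible mitigation (a sketch, not something tested against this bug): if the server sends a Content-Disposition header, the page-side code could read it with response.headers.get('content-disposition') and return it next to the data. The helper below is hypothetical and only covers the two common forms of the header. For cross-origin requests the header is only readable if the server exposes it via CORS, so this will not work everywhere.

```javascript
// Hypothetical helper: extract a file name from a Content-Disposition header value.
// Handles the plain filename="..." form and the extended filename*=UTF-8''... form.
function fileNameFromContentDisposition(header) {
  if (!header) return null;

  // Prefer the extended form, whose value is percent-encoded UTF-8.
  const extended = /filename\*\s*=\s*UTF-8''([^;]+)/i.exec(header);
  if (extended) {
    try {
      return decodeURIComponent(extended[1].trim());
    } catch {
      // Malformed percent-encoding: fall through to the plain form.
    }
  }

  // Plain form, either quoted or unquoted.
  const plain = /filename\s*=\s*(?:"([^"]*)"|([^;]+))/i.exec(header);
  if (plain) return (plain[1] ?? plain[2]).trim();

  return null;
}

console.log(fileNameFromContentDisposition('attachment; filename="example.docx"'));
// example.docx
console.log(fileNameFromContentDisposition("attachment; filename*=UTF-8''na%C3%AFve.docx"));
// naïve.docx
console.log(fileNameFromContentDisposition('inline'));
// null
```

When the helper returns null, the caller still has to fall back to a name of its own, for example the last path segment of the URL.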

But it works for my purposes and I hope it helps others.

danroot on Jul 29, 2023