puppeteer: [Bug]: HTTPRequests are lost/missing when using Puppeteer with setRequestInterception enabled
Bug description
Here’s the scoop. I’m trying to use Puppeteer v18.0.5 with the bundled Chromium browser against a specific website, on Node v16.16.0. However, when I enable request interception via page.setRequestInterception(true), all of the HTTPRequests for image resources are lost. My handler is invoked far less often while intercepting than when not intercepting, and the page never fires any requests for images while intercepting. When I disable interception, the page loads normally. Yes, I know about invoking continue() on all requests; I’m already doing that in the page’s request handler.
I’ve also pored over the Puppeteer issue tracker and found similar symptoms reported against some earlier Puppeteer versions, but those were all different issues that have since been resolved. This one seems unique.
I’ve looked through the Puppeteer source code as well as the CDP events to try to find an explanation, but have found none.
As an important note for anyone trying to reproduce this, you must be proxied through a server in the London general area in order to successfully load this site. It has geographic restrictions.
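For anyone reproducing this, one way to route the bundled Chromium through such a proxy is Chromium’s --proxy-server launch argument. This is only a sketch; the proxy address and credentials are placeholders, not part of my setup, and it would slot into the async IIFE of the repro script below:

```js
// Sketch of routing the bundled Chromium through a London-based proxy.
// The proxy address and credentials are placeholders.
const browser = await puppeteer.launch({
  args: ['--proxy-server=http://my-london-proxy.example.com:3128'],
  headless: false
});
const page = (await browser.pages())[0];
// If the proxy requires auth, credentials can be supplied per page.
await page.authenticate({username: 'proxy-user', password: 'proxy-pass'});
```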
Here’s my code to reproduce:
```js
const puppeteer = require('puppeteer');

(async () => {
  const options = {
    browserWidth: 1366,
    browserHeight: 983,
    intercepting: false
  };
  const browser = await puppeteer.launch({
    args: [`--window-size=${options.browserWidth},${options.browserHeight}`],
    defaultViewport: {width: options.browserWidth, height: options.browserHeight},
    headless: false
  });
  const page = (await browser.pages())[0];
  page.on('request', async (request) => {
    console.log(`Request: ${request.method()} | ${request.url()} | ${request.resourceType()} | ${request._requestId}`);
    if (options.intercepting) await request.continue();
  });
  await page.setRequestInterception(options.intercepting);
  await page.goto('https://vegas.williamhill.com', {waitUntil: 'networkidle2', timeout: 65000});
  // Give a moment to view the page in headful mode before closing the browser.
  await new Promise(resolve => setTimeout(resolve, 5000));
  await browser.close();
})();
```
Here’s what the page looks like with intercepting disabled (screenshot: “Expected Page Load”).
Here’s what the page looks like with intercepting enabled and continuing all requests (screenshot: “Page load while intercepting and continuing all requests”).
With request interception disabled, my handler is invoked for 104 different requests; with interception enabled, it is only invoked 22 times. I’m not hitting a navigation timeout, since .goto() returns well before my timeout each time.
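Those totals are just tallies of 'request' events across two runs, one with intercepting set to false and one with it set to true. A variant of the repro’s handler with a hypothetical counter added would look roughly like this:

```js
// Hypothetical counter for comparing handler invocations with and without interception.
let requestCount = 0;
page.on('request', async (request) => {
  requestCount++;
  if (options.intercepting) await request.continue();
});
// Logged after page.goto() resolves:
// console.log(`'request' handler invoked ${requestCount} times (intercepting: ${options.intercepting})`);
```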
Puppeteer version
18.0.5
Node.js version
16.16.0
npm version
8.11.0
What operating system are you seeing the problem on?
macOS
Relevant log output
No response
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 18 (5 by maintainers)
The website does not seem to be available anymore, but I wonder if this could have been fixed? I see some related issues that are no longer reproducible with v20.2.1. Is there an up-to-date reproducible example for this issue?
I haven’t done the rollback testing yet; I’ve been a bit busy. But we did do a test where we bypassed the Puppeteer interception and went straight to the CDP session provided by the target. We sent Fetch.enable and registered a handler for the Fetch.requestPaused event, then continued the requests by calling Fetch.continueRequest. When we did this, the page appeared to function correctly, and we saw the same number of requests while intercepting as when not intercepting.
So, long story short, I’ll do the testing to see if it works in previous Puppeteer versions and get back to you.
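For reference, here’s a minimal sketch of what that raw-CDP test looked like, assuming the same page object as in the repro script above; it drives the Fetch domain directly instead of going through page.setRequestInterception:

```js
// Minimal sketch of the raw-CDP workaround (assumes `page` from the repro script).
const client = await page.target().createCDPSession();

// Enable the Fetch domain so requests pause at the request stage.
await client.send('Fetch.enable');

client.on('Fetch.requestPaused', async (event) => {
  console.log(`Paused: ${event.request.method} | ${event.request.url} | ${event.resourceType}`);
  // Let every request proceed unmodified.
  await client.send('Fetch.continueRequest', {requestId: event.requestId});
});
```

In this sketch page.setRequestInterception stays disabled, so only the Fetch.requestPaused handler is dealing with paused requests.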
Is it possible that the interruptions are leading to scripts loading in a bad order?