puppeteer: Page.goto returns null for some urls
This issue extracts the relevant information from #1056, which was not very clear.
When used with request interception, page.goto sometimes returns null.
But the documentation says:
NOTE page.goto either throw or return a main resource response. The only exception is navigation to about:blank, which would succeed and return null.
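The contract quoted in the note can be checked explicitly in calling code. The following is a minimal sketch, assuming a hypothetical helper named expectMainResponse (it is not part of Puppeteer) that receives whatever page.goto resolved to together with the requested URL:

```javascript
'use strict';

// Hypothetical helper, not part of Puppeteer: enforces the documented
// contract that page.goto only resolves to null for about:blank.
function expectMainResponse(response, requestedUrl) {
  if (response !== null && response !== undefined) {
    return response;
  }
  if (requestedUrl === 'about:blank') {
    // Documented exception: navigation succeeded, no response object.
    return null;
  }
  throw new Error(`page.goto resolved to null for ${requestedUrl}`);
}
```

It could be used as `const res = expectMainResponse(await page.goto(URL), URL);` so that an unexpected null surfaces as an error instead of propagating silently.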
Steps to reproduce
Tell us about your environment:
- Puppeteer version: 99103cbb97e6fe80de86c3001195355c3d8f51e0
- Platform / OS version: debian stretch
- URLs (if applicable):
- http://giffysk8s.blogspot.com/
- http://lemaninignesinden.blogspot.com/2015/02/vitaminenzimklorofilsifa.html
- http://henke-s.blogspot.com/
- http://www.cometogetherkids.com/2011/03/fleece-flower-petal-pillows.html
- http://www.wildstar-online.com/uk/drops/1/strain/
- https://www.vanityfair.com/culture/photos/2010/06/world-cup-portfolio-201006
- http://www.sportsnet.ca/baseball/mlb/blue-jays-exploring-trade-options-before-free-agency/
- http://www.fc-weisweil.de/
What steps will reproduce the problem?
- save the code below as test.js
- run nodejs test.js "http://giffysk8s.blogspot.com/"
- check for a line with result: null in the output. It is probably the last line.
- comment out the blockImages(page); call (line 40 of the original file)
- run nodejs test.js "http://giffysk8s.blogspot.com/"
- enjoy the correct result object
test.js:
'use strict';

const puppeteer = require('puppeteer');

const URL = process.argv[2];

// Enable request interception and abort every image request.
async function blockImages(page) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    // resourceType is read as a property here, matching the Request dump in the output.
    if (request.resourceType === "image") {
      request.abort();
    } else {
      request.continue();
    }
  });
}

(async () => {
  console.log("processing url", URL);

  process.on("uncaughtException", (e) => {
    console.error("Unhandled exception:", e);
    process.exit(1);
  });
  process.on("unhandledRejection", (reason, p) => {
    console.error("Unhandled Rejection at: Promise", p, "reason:", reason);
    process.exit(2);
  });

  const args = [
    "--disable-setuid-sandbox",
    "--no-sandbox",
  ];
  const options = {
    args,
    headless: true,
    ignoreHTTPSErrors: true,
    dumpio: true,
  };

  const browser = await puppeteer.launch(options);
  const page = await browser.newPage();
  // Commenting out the next line makes goto return a normal response.
  blockImages(page);
  const res = await page.goto(URL, { timeout: 30000, waitUntil: "load" });
  console.log("result:", res);

  await page.close();
  await browser.close();
})();
What is the expected result?
A normal result object should be printed.
$ nodejs test.js "http://giffysk8s.blogspot.com/"
processing url http://giffysk8s.blogspot.com/
[1115/065856.610648:ERROR:gpu_process_transport_factory.cc(1009)] Lost UI shared context.
[1115/065856.637350:ERROR:instance.cc(49)] Unable to locate service manifest for metrics
[1115/065856.637370:ERROR:service_manager.cc(889)] Failed to resolve service name: metrics
DevTools listening on ws://127.0.0.1:42869/devtools/browser/753f024d-a38a-409e-95aa-64165a012e80
[1115/065856.752637:ERROR:nss_util.cc(724)] After loading Root Certs, loaded==false: NSS error code: -8018
[1115/065858.367284:INFO:CONSOLE(26)] "Mixed Content: The page at 'https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=34501815' was loaded over a secure connection, but contains a form that targets an insecure endpoint 'http://giffysk8s.blogspot.com/search'. This endpoint should be made available over a secure connection.", source: https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=34501815 (26)
[1115/065858.447190:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065858.447639:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065858.871226:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
result: Response {
_client:
Session {
domain: null,
_events:
{ 'Page.frameAttached': [Function],
'Page.frameNavigated': [Function],
'Page.frameDetached': [Function],
'Runtime.executionContextCreated': [Function],
'Page.lifecycleEvent': [Function],
'Network.requestWillBeSent': [Function: bound _onRequestWillBeSent],
'Network.requestIntercepted': [Function: bound _onRequestIntercepted],
'Network.responseReceived': [Function: bound _onResponseReceived],
'Network.loadingFinished': [Function: bound _onLoadingFinished],
'Network.loadingFailed': [Function: bound _onLoadingFailed],
'Page.loadEventFired': [Function],
'Runtime.consoleAPICalled': [Function],
'Page.javascriptDialogOpening': [Function],
'Runtime.exceptionThrown': [Function],
'Security.certificateError': [Function],
'Inspector.targetCrashed': [Function],
'Performance.metrics': [Function] },
_eventsCount: 17,
_maxListeners: undefined,
_lastId: 11,
_callbacks: Map {},
_connection:
Connection {
domain: null,
_events: [Object],
_eventsCount: 3,
_maxListeners: undefined,
_url: 'ws://127.0.0.1:42869/devtools/browser/753f024d-a38a-409e-95aa-64165a012e80',
_lastId: 14,
_callbacks: Map {},
_delay: 0,
_ws: [Object],
_sessions: [Object],
_closeCallback: [Function] },
_targetId: '(63A63DF21210058BAD5DE73B52FAD0A8)',
_sessionId: '(63A63DF21210058BAD5DE73B52FAD0A8):1' },
_request:
Request {
_client:
Session {
domain: null,
_events: [Object],
_eventsCount: 17,
_maxListeners: undefined,
_lastId: 11,
_callbacks: Map {},
_connection: [Object],
_targetId: '(63A63DF21210058BAD5DE73B52FAD0A8)',
_sessionId: '(63A63DF21210058BAD5DE73B52FAD0A8):1' },
_requestId: '11529.1',
_interceptionId: null,
_allowInterception: false,
_interceptionHandled: false,
_response: [Circular],
_failureText: null,
_completePromiseFulfill: [Function],
_completePromise: Promise { undefined },
url: 'http://giffysk8s.blogspot.sg/',
resourceType: 'document',
method: 'GET',
postData: undefined,
headers:
{ 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3264.0 Safari/537.36' } },
_contentPromise: null,
status: 200,
ok: true,
url: 'http://giffysk8s.blogspot.sg/',
headers:
{ date: 'Wed, 15 Nov 2017 06:58:57 GMT',
'content-encoding': 'gzip',
'x-content-type-options': 'nosniff',
'last-modified': 'Thu, 08 Sep 2016 04:31:00 GMT',
server: 'GSE',
etag: 'W/"30bf6632614ccc69849f4627193b6855b9928f434964bcc926be27de576b7652"',
'content-type': 'text/html; charset=UTF-8',
'cache-control': 'private, max-age=0',
'content-length': '19943',
'x-xss-protection': '1; mode=block',
expires: 'Wed, 15 Nov 2017 06:58:57 GMT' } }
What happens instead?
$ nodejs test.js "http://giffysk8s.blogspot.com/"
processing url http://giffysk8s.blogspot.com/
[1115/065217.501797:ERROR:gpu_process_transport_factory.cc(1009)] Lost UI shared context.
[1115/065217.532286:ERROR:instance.cc(49)] Unable to locate service manifest for metrics
[1115/065217.532305:ERROR:service_manager.cc(889)] Failed to resolve service name: metrics
DevTools listening on ws://127.0.0.1:40731/devtools/browser/c7ec5251-d794-49a1-8fd9-f26c94aa9c39
[1115/065217.652862:ERROR:nss_util.cc(724)] After loading Root Certs, loaded==false: NSS error code: -8018
[1115/065220.088205:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065220.088888:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065220.275850:INFO:CONSOLE(26)] "Mixed Content: The page at 'https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=18836155' was loaded over a secure connection, but contains a form that targets an insecure endpoint 'http://giffysk8s.blogspot.com/search'. This endpoint should be made available over a secure connection.", source: https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=18836155 (26)
[1115/065220.546524:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
result: null
About this issue
- State: closed
- Created 7 years ago
- Comments: 19 (3 by maintainers)
Commits related to this issue
- fix: page.goto should resolve to response for self request (#1391) — committed to yujiosaka/puppeteer by deleted user 6 years ago
- fix: page.goto should support pages with self requests (#1391) (#1781) This patch fixes `page.goto` for websites that re-request document URL with javascript. Fixes #1391. — committed to puppeteer/puppeteer by deleted user 6 years ago
- fix: page.goto should support pages with self requests (#1391) (#1781) This patch fixes `page.goto` for websites that re-request document URL with javascript. Fixes #1391. — committed to WiserSolutions/puppeteer by deleted user 6 years ago
@ntzm please try this code instead
Update
So, after a thorough investigation, I can repro with the minimum test, which is https://github.com/yujiosaka/puppeteer/pull/2/files#diff-b1dc310bde327a785ea47c1b5b93a6e2R1
My mistake: it was not caused by the redirection. It was a Blogspot thing (the listed URLs are blog pages generated by Blogspot), and it had nothing to do with redirects.
Rather, it was caused by a service called feedjit. This service seems to add an iframe, which requests its parent frame’s URL again.
So the main frame’s URL ended up being requested multiple times. But anyway, the fix should be the same as https://github.com/yujiosaka/puppeteer/pull/2
I will make a PR soon.
Here is a POC patch: https://github.com/yujiosaka/puppeteer/pull/2
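To see why a repeated request for the main frame's URL could leave goto without a response, here is a toy model. It is only an illustration under assumptions: it supposes that the navigation code remembers requests in a map keyed by URL and looks up the main document by URL afterwards. The names trackRequests, lastWins and firstWins and the example.test and widget.test URLs are hypothetical, and this is not Puppeteer's actual code.

```javascript
'use strict';

// Toy model: each request is { url, response }, where response is null
// while the request has not received a response yet.
function trackRequests(requests, keepFirst) {
  const byUrl = new Map();
  for (const request of requests) {
    // keepFirst=true: ignore later requests that reuse an already seen URL.
    if (keepFirst && byUrl.has(request.url)) continue;
    byUrl.set(request.url, request);
  }
  return byUrl;
}

const mainUrl = 'http://example.test/';
const observed = [
  // The real navigation request for the main document.
  { url: mainUrl, response: { status: 200 } },
  // The iframe added by a third-party widget.
  { url: 'http://widget.test/frame', response: { status: 200 } },
  // The iframe asks for its parent's URL again; no response yet.
  { url: mainUrl, response: null },
];

const lastWins = trackRequests(observed, false).get(mainUrl).response;
const firstWins = trackRequests(observed, true).get(mainUrl).response;

console.log('last request wins :', lastWins);
console.log('first request wins:', firstWins);
```

In this model, letting the later request overwrite the earlier one yields null for the main URL, while keeping the first request seen for a URL yields the 200 response. Whether the linked patch takes exactly this approach is not stated in the comment.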
I can’t say I tested 100% of the occurrences, but when I printed the current URL of the page after goto returned null, it was about:blank instead of the URL passed to goto. Maybe a race condition of some kind… maybe fixed by 44d1e834a4525e3bb546988aa312d0d7cb6d1c4a?
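The check described in this comment can be reproduced with a small wrapper. This is a sketch, assuming a hypothetical helper named gotoWithDiagnostics; it relies only on page.goto and page.url, and makes no claim about the underlying cause.

```javascript
'use strict';

// Hypothetical helper: navigates and, when goto resolves to null,
// reports where the page actually ended up.
async function gotoWithDiagnostics(page, url, options) {
  const response = await page.goto(url, options);
  const finalUrl = page.url();
  if (response === null) {
    console.warn(`goto(${url}) resolved to null; page.url() is ${finalUrl}`);
  }
  return { response, finalUrl };
}
```

Calling `await gotoWithDiagnostics(page, URL, { timeout: 30000, waitUntil: "load" })` in place of the plain goto call in test.js would log the final URL whenever the response is null, which makes the about:blank observation easy to confirm or rule out for a given page.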