puppeteer: Page.goto returns null for some urls

This issue extracts the relevant information from #1056, which was not very clear.

When used with request interception, puppeteer sometimes returns null. But the documentation says:

NOTE page.goto either throw or return a main resource response. The only exception is navigation to about:blank, which would succeed and return null.
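
Given that contract, a caller can treat an unexpected null as an error instead of letting it propagate silently. The following is only a minimal sketch: `gotoOrThrow` is a hypothetical helper name, not part of Puppeteer, and it relies on nothing beyond `page.goto` behaving as the note describes.

```javascript
'use strict';

// Hypothetical helper (not a Puppeteer API): wraps page.goto and turns an
// unexpected null response into an explicit error.
async function gotoOrThrow(page, url, options = {}) {
  const response = await page.goto(url, options);
  // Per the documentation note, null is only expected for about:blank.
  if (response === null && url !== 'about:blank') {
    throw new Error(`page.goto returned null for ${url}`);
  }
  return response;
}
```

In test.js this would replace the direct call, e.g. `const res = await gotoOrThrow(page, URL, { timeout: 30000, waitUntil: "load" });`, so the failing case surfaces as a rejected promise.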

Steps to reproduce

Tell us about your environment:

What steps will reproduce the problem?

  1. save the code below as test.js
  2. run nodejs test.js "http://giffysk8s.blogspot.com/"
  3. check for a line with result: null in the output. It is probably the last line.
  4. comment out line 40, blockImages(page);
  5. run nodejs test.js "http://giffysk8s.blogspot.com/"
  6. enjoy the correct result object

test.js:

'use strict';

const puppeteer = require('puppeteer');

const URL = process.argv[2];

async function blockImages(page) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (request.resourceType === "image") {
      request.abort();
    } else {
      request.continue();
    }
  });
}

(async () => {
  console.log("processing url", URL);
  process.on("uncaughtException", (e) => {
    console.error("Unhandled exception:", e);
    process.exit(1);
  });
  process.on("unhandledRejection", (reason, p) => {
    console.error("Unhandled Rejection at: Promise", p, "reason:", reason);
    process.exit(2);
  });
  const args = [
    "--disable-setuid-sandbox",
    "--no-sandbox",
  ];
  const options = {
    args,
    headless: true,
    ignoreHTTPSErrors: true,
    dumpio: true,
  };
  const browser = await puppeteer.launch(options);
  const page = await browser.newPage();
  blockImages(page);
  const res = await page.goto(URL, { timeout: 30000, waitUntil: "load" });
  console.log("result:", res);
  await page.close();
  await browser.close();
})();

What is the expected result?

A normal result object should be printed.

$ nodejs test.js "http://giffysk8s.blogspot.com/"
processing url http://giffysk8s.blogspot.com/
[1115/065856.610648:ERROR:gpu_process_transport_factory.cc(1009)] Lost UI shared context.
[1115/065856.637350:ERROR:instance.cc(49)] Unable to locate service manifest for metrics
[1115/065856.637370:ERROR:service_manager.cc(889)] Failed to resolve service name: metrics

DevTools listening on ws://127.0.0.1:42869/devtools/browser/753f024d-a38a-409e-95aa-64165a012e80
[1115/065856.752637:ERROR:nss_util.cc(724)] After loading Root Certs, loaded==false: NSS error code: -8018
[1115/065858.367284:INFO:CONSOLE(26)] "Mixed Content: The page at 'https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=34501815' was loaded over a secure connection, but contains a form that targets an insecure endpoint 'http://giffysk8s.blogspot.com/search'. This endpoint should be made available over a secure connection.", source: https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=34501815 (26)
[1115/065858.447190:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065858.447639:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065858.871226:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
result: Response {
  _client: 
   Session {
     domain: null,
     _events: 
      { 'Page.frameAttached': [Function],
        'Page.frameNavigated': [Function],
        'Page.frameDetached': [Function],
        'Runtime.executionContextCreated': [Function],
        'Page.lifecycleEvent': [Function],
        'Network.requestWillBeSent': [Function: bound _onRequestWillBeSent],
        'Network.requestIntercepted': [Function: bound _onRequestIntercepted],
        'Network.responseReceived': [Function: bound _onResponseReceived],
        'Network.loadingFinished': [Function: bound _onLoadingFinished],
        'Network.loadingFailed': [Function: bound _onLoadingFailed],
        'Page.loadEventFired': [Function],
        'Runtime.consoleAPICalled': [Function],
        'Page.javascriptDialogOpening': [Function],
        'Runtime.exceptionThrown': [Function],
        'Security.certificateError': [Function],
        'Inspector.targetCrashed': [Function],
        'Performance.metrics': [Function] },
     _eventsCount: 17,
     _maxListeners: undefined,
     _lastId: 11,
     _callbacks: Map {},
     _connection: 
      Connection {
        domain: null,
        _events: [Object],
        _eventsCount: 3,
        _maxListeners: undefined,
        _url: 'ws://127.0.0.1:42869/devtools/browser/753f024d-a38a-409e-95aa-64165a012e80',
        _lastId: 14,
        _callbacks: Map {},
        _delay: 0,
        _ws: [Object],
        _sessions: [Object],
        _closeCallback: [Function] },
     _targetId: '(63A63DF21210058BAD5DE73B52FAD0A8)',
     _sessionId: '(63A63DF21210058BAD5DE73B52FAD0A8):1' },
  _request: 
   Request {
     _client: 
      Session {
        domain: null,
        _events: [Object],
        _eventsCount: 17,
        _maxListeners: undefined,
        _lastId: 11,
        _callbacks: Map {},
        _connection: [Object],
        _targetId: '(63A63DF21210058BAD5DE73B52FAD0A8)',
        _sessionId: '(63A63DF21210058BAD5DE73B52FAD0A8):1' },
     _requestId: '11529.1',
     _interceptionId: null,
     _allowInterception: false,
     _interceptionHandled: false,
     _response: [Circular],
     _failureText: null,
     _completePromiseFulfill: [Function],
     _completePromise: Promise { undefined },
     url: 'http://giffysk8s.blogspot.sg/',
     resourceType: 'document',
     method: 'GET',
     postData: undefined,
     headers: 
      { 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3264.0 Safari/537.36' } },
  _contentPromise: null,
  status: 200,
  ok: true,
  url: 'http://giffysk8s.blogspot.sg/',
  headers: 
   { date: 'Wed, 15 Nov 2017 06:58:57 GMT',
     'content-encoding': 'gzip',
     'x-content-type-options': 'nosniff',
     'last-modified': 'Thu, 08 Sep 2016 04:31:00 GMT',
     server: 'GSE',
     etag: 'W/"30bf6632614ccc69849f4627193b6855b9928f434964bcc926be27de576b7652"',
     'content-type': 'text/html; charset=UTF-8',
     'cache-control': 'private, max-age=0',
     'content-length': '19943',
     'x-xss-protection': '1; mode=block',
     expires: 'Wed, 15 Nov 2017 06:58:57 GMT' } }

What happens instead?

$ nodejs test.js "http://giffysk8s.blogspot.com/"
processing url http://giffysk8s.blogspot.com/
[1115/065217.501797:ERROR:gpu_process_transport_factory.cc(1009)] Lost UI shared context.
[1115/065217.532286:ERROR:instance.cc(49)] Unable to locate service manifest for metrics
[1115/065217.532305:ERROR:service_manager.cc(889)] Failed to resolve service name: metrics

DevTools listening on ws://127.0.0.1:40731/devtools/browser/c7ec5251-d794-49a1-8fd9-f26c94aa9c39
[1115/065217.652862:ERROR:nss_util.cc(724)] After loading Root Certs, loaded==false: NSS error code: -8018
[1115/065220.088205:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065220.088888:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
[1115/065220.275850:INFO:CONSOLE(26)] "Mixed Content: The page at 'https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=18836155' was loaded over a secure connection, but contains a form that targets an insecure endpoint 'http://giffysk8s.blogspot.com/search'. This endpoint should be made available over a secure connection.", source: https://www.blogger.com/navbar.g?targetBlogID=4652634595166455172&blogName=Canvasses+of+Poetry+and+Prose&publishMode=PUBLISH_MODE_BLOGSPOT&navbarType=BLUE&layoutType=LAYOUTS&searchRoot=http://giffysk8s.blogspot.com/search&blogLocale=en&v=2&homepageUrl=http://giffysk8s.blogspot.com/&vt=-6074931036815180963&usegapi=1&jsh=m%3B%2F_%2Fscs%2Fapps-static%2F_%2Fjs%2Fk%3Doz.gapi.en_GB.eAe10hJSHzc.O%2Fm%3D__features__%2Fam%3DAQ%2Frt%3Dj%2Fd%3D1%2Frs%3DAGLTcCOSrQAixLCyS0W7RP8OLBQKClcz2w#id=navbar-iframe&_gfid=navbar-iframe&parent=http%3A%2F%2Fgiffysk8s.blogspot.sg&pfname=&rpctoken=18836155 (26)
[1115/065220.546524:WARNING:render_frame_host_impl.cc(2679)] OnDidStopLoading was called twice.
result: null

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 19 (3 by maintainers)

Most upvoted comments

@ntzm

  1. you need async to use await
  2. you need to use r._status, not r.ok
  3. you need to add a delay between loop iterations

Please try this code instead:

const puppeteer = require('puppeteer');

puppeteer.launch({ headless: false, timeout: 0 })
  .then(async (browser) => {
    const page = await browser.newPage();
    for (let i = 0; i < 15; i++) {
      const response = await page.goto('https://www.microsoft.com/en-gb/store/d/xbox-one-s-1tb-console-playerunknowns-battlegrounds-bundle/908z9jn5cnh2/gz4w?cid=msft_web_collection', { waitUntil: 'domcontentloaded' });
      // Log the status through the underscore-prefixed field, as suggested above.
      console.log(response._status);
      // Pause between iterations.
      await page.waitFor(4000);
    }
  })
  .catch((err) => {
    console.log(err);
  });

Update

I’d like to take time to look into test cases.

So, after thorough investigation, I can repro with the minimal test, which is https://github.com/yujiosaka/puppeteer/pull/2/files#diff-b1dc310bde327a785ea47c1b5b93a6e2R1

I was mistaken: it was not caused by the redirect. It was a Blogspot thing (the listed URLs are blog pages generated by Blogspot), and it had nothing to do with the redirect.

Rather, it was caused by a service called feedjit. This service seems to add an iframe, which requests its parent frame’s URL again.

So the main frame’s URL ended up being requested multiple times. But anyway, the fix should be the same as https://github.com/yujiosaka/puppeteer/pull/2

I will make a PR soon.
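
To check whether a given page triggers the repeated main-frame request described above, one could count how often the navigated URL is requested. This is a rough diagnostic sketch, not the fix from the linked PR: `countRequestsTo` is a hypothetical helper, it uses only the `request` event already used in test.js, and it reads `request.url` either as a property (as in the output above) or as a method (as in later Puppeteer releases).

```javascript
'use strict';

// Hypothetical diagnostic helper (not a Puppeteer API).
// Counts 'request' events whose URL equals targetUrl and returns a function
// that reports the count seen so far.
function countRequestsTo(page, targetUrl) {
  let count = 0;
  page.on('request', (request) => {
    // request.url is a property in older releases and a method in newer ones.
    const url = typeof request.url === 'function' ? request.url() : request.url;
    if (url === targetUrl) {
      count += 1;
    }
  });
  return () => count;
}
```

Usage would look like `const seen = countRequestsTo(page, URL); await page.goto(URL); console.log(seen());`. A value above 1 would be consistent with the explanation above. Since the example blog redirects from the .com host to a .sg host, the URL worth counting may be the final one rather than the one passed on the command line.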

I can’t say I tested 100% of the occurrences, but when I printed the current URL of the page after goto returned null, it was about:blank instead of the URL passed to goto.

Maybe a race condition of some kind… maybe fixed by 44d1e834a4525e3bb546988aa312d0d7cb6d1c4a ?
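
The check of the page URL after a null result can be sketched roughly as follows. `gotoAndReport` is a hypothetical helper name introduced here for illustration; it assumes only `page.goto` and `page.url()`, and it does not attempt to fix or work around the null result.

```javascript
'use strict';

// Hypothetical diagnostic helper (not a Puppeteer API): navigates, then
// reports where the page actually is when goto resolves to null.
async function gotoAndReport(page, url, options = {}) {
  const response = await page.goto(url, options);
  const currentUrl = page.url();
  if (response === null) {
    console.warn(`goto(${url}) returned null; page is at ${currentUrl}`);
  }
  return { response, currentUrl };
}
```

If `currentUrl` is about:blank whenever `response` is null, that matches the observation above.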