puppeteer: Hash-only navigation doesn't work

Since a hash-only navigation doesn't trigger any network requests and doesn't fire a load event, the following script hangs:

const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.goto('https://example.com#ohh'); // <== stuck here
  await browser.close();
})();
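To make the "hash-only" condition concrete, here is a minimal sketch of a check for whether a target URL differs from the current one only in its fragment. It roughly follows the HTML rule for fragment navigations (same URL apart from the fragment, and the target carries a fragment), which is the case where no request is made and no load event fires. `isHashOnlyNavigation` is a hypothetical helper, not a Puppeteer API, and it assumes Node 10+ for the global `URL` class.

```javascript
// Hypothetical helper (not part of Puppeteer): true when `to` differs from
// `from` only in its fragment, i.e. a same-document navigation that makes
// no network request and fires no load event.
function isHashOnlyNavigation(from, to) {
  const current = new URL(from);
  const target = new URL(to, from); // also resolves relative targets such as '#ohh'
  // Drop the fragment from the serialized URL for comparison.
  const withoutHash = (u) => u.href.split('#')[0];
  return withoutHash(current) === withoutHash(target) && target.href.includes('#');
}
```

For the script above, `isHashOnlyNavigation('https://example.com', 'https://example.com#ohh')` is true, which is exactly the second `page.goto()` call that never sees a load event.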

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 41
  • Comments: 27 (8 by maintainers)

Most upvoted comments

@Means88 This works as a workaround:

await page.goto(url, {waitUntil: 'networkidle'})
const currentUrl = await page.evaluate('location.href'); // read the URL from the page itself

https://github.com/GoogleChromeLabs/puppeteer-examples/blob/master/hash_navigation.js shows how to listen for hashchange events and react accordingly. You might be able to extract ideas from that for a workaround.
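A minimal sketch of the `hashchange` idea (written independently of that example file, so details may differ): change the fragment from inside the page and resolve once the browser fires `hashchange`, so nothing depends on a load event. It relies on `page.evaluate()` awaiting a Promise returned by the page function. `navigateToHash` and the 5-second default are hypothetical choices.

```javascript
// Hypothetical helper: change the fragment from inside the page and resolve
// with the new location.href once the browser fires `hashchange`.
async function navigateToHash(page, hash, timeoutMs = 5000) {
  return page.evaluate((rawHash, ms) => new Promise((resolve, reject) => {
    const target = '#' + rawHash.replace(/^#/, '');
    if (location.hash === target) {
      // Assigning the same fragment fires no hashchange, so finish now.
      resolve(location.href);
      return;
    }
    const timer = setTimeout(() => reject(new Error('no hashchange within ' + ms + ' ms')), ms);
    window.addEventListener('hashchange', () => {
      clearTimeout(timer);
      resolve(location.href);
    }, { once: true });
    location.hash = target;
  }), hash, timeoutMs);
}
```

Usage would be something like `const href = await navigateToHash(page, 'ohh');` after an ordinary `page.goto('https://example.com')`.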

Switching between different versions of Puppeteer…

Result:

  • v0.12.0: page.goto('same.url#different_hash', {waitUntil: 'networkidle'}) → PASSED
  • v0.13.0: page.goto('same.url#different_hash', {waitUntil: 'networkidle0'}) → FAILED
  • 1.0.0rc: page.goto('same.url#different_hash', {waitUntil: 'networkidle0'}) → FAILED

So it looks like there is a regression in the latest versions, or I don't understand the new options…

The same happens with the History API:

// page
history.pushState(null, null, url);

// puppeteer
await page.goBack(); // <== stuck here
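One possible workaround for the History API case (a sketch, not something confirmed in this thread) is to call `history.back()` from inside the page and then poll for the expected URL with `page.waitForFunction()`, rather than relying on `page.goBack()` to observe a navigation event. `goBackWithinDocument`, `expectedUrl`, and the timeout default are hypothetical names and values.

```javascript
// Hypothetical helper: go back one History entry without page.goBack(),
// then wait until location.href reaches the URL you expect.
async function goBackWithinDocument(page, expectedUrl, timeoutMs = 5000) {
  // history.back() is asynchronous in the page, so we don't wait on it here.
  await page.evaluate(() => history.back());
  // waitForFunction polls the predicate inside the page until it is truthy.
  await page.waitForFunction(
    (url) => location.href === url,
    { timeout: timeoutMs },
    expectedUrl,
  );
  // Report the URL the page actually ended up on.
  return page.evaluate(() => location.href);
}
```

The caller has to know where "back" should land, which is usually the URL that was current before the `pushState()` call.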

BTW, page.url() returns the original URL. Is that a feature or a bug?

This is related to a problem in Chrome's DevTools protocol. In short, hash navigation doesn't trigger any of the page initialization events.
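On the protocol side, current DevTools protocol versions define a `Page.navigatedWithinDocument` event for same-document navigations (fragment changes and History API calls). It may not exist in the Chrome builds of this issue's era, so treat the following as a sketch under that assumption. `waitForSameDocumentNavigation` is a hypothetical helper that listens on a CDP session and resolves with the reported URL.

```javascript
// Hypothetical helper: resolve with the new URL when Chrome reports a
// same-document navigation via the Page.navigatedWithinDocument event.
function waitForSameDocumentNavigation(session, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const eventName = 'Page.navigatedWithinDocument';
    const onNavigated = (event) => {
      clearTimeout(timer);
      session.off(eventName, onNavigated);
      resolve(event.url);
    };
    const timer = setTimeout(() => {
      session.off(eventName, onNavigated);
      reject(new Error('no same-document navigation within ' + timeoutMs + ' ms'));
    }, timeoutMs);
    session.on(eventName, onNavigated);
  });
}
```

To use it, one would create a session with `page.target().createCDPSession()`, call `session.send('Page.enable')` so Page events are delivered, start waiting, and only then change `location.hash` through `page.evaluate()`.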

A Puppeteer problem is that page.goto FORCES you to wait for one of those events (unless there's some undocumented configuration option), timing out with an error if (when) they never come. The only reliable workaround is to set a low timeout, catch (and disregard) the resulting error, and then manually check whether the page has loaded using your own logic.
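The low-timeout approach described above can be sketched as follows. The "own logic" here is assumed to be a simple comparison of `location.href` with the requested URL; a real page may need a stronger readiness check. `gotoWithFallback` and the 2-second default are hypothetical.

```javascript
// Hypothetical helper: cap page.goto() with a short timeout, disregard the
// error it throws, then verify the resulting URL ourselves.
async function gotoWithFallback(page, url, timeoutMs = 2000) {
  try {
    await page.goto(url, { timeout: timeoutMs });
  } catch (err) {
    // Only ignore the failure if the page actually ended up at `url`
    // (e.g. a hash-only navigation that never fired a load event).
    const href = await page.evaluate(() => location.href);
    if (href !== new URL(url).href) throw err;
  }
  return page.evaluate(() => location.href);
}
```

Comparing against `new URL(url).href` normalizes inputs like `https://example.com#ohh` to the `https://example.com/#ohh` form that `location.href` reports.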

Is there some way around this?

Is there an upstream bug that we can link to?

The incompatibility with the History API means that sites using react-router will have issues with page.url(), page.waitForNavigation(), etc.

Here are some of my workarounds:

I use page.waitForSelector() instead of page.waitForNavigation() if possible.
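A sketch of that pattern for a client-side route change: click the link, then wait for an element that only the new view renders. `waitForSelector()` also resolves if the element is already present, so there is no race with a fast render. `clickAndWaitForView` and the selectors in the usage line are hypothetical.

```javascript
// Hypothetical helper: trigger a client-side route change and wait for an
// element that only the new view renders, instead of waitForNavigation().
async function clickAndWaitForView(page, linkSelector, viewSelector, timeoutMs = 5000) {
  await page.click(linkSelector);
  // Resolves with an element handle once the selector matches in the DOM.
  return page.waitForSelector(viewSelector, { timeout: timeoutMs });
}
```

For example: `await clickAndWaitForView(page, 'a[href="#/settings"]', '.settings-view');`.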

I use these two functions for dealing with the URL:

const getLocation = async (page) => page.evaluate(() => location)
const getLocationProp = async (page, prop) => (await getLocation(page))[prop]

and these for history:

// page._client is a private, undocumented property and may change between releases
const getHistory = async (page) => page._client.send('Page.getNavigationHistory')
const getHistoryEntry = async (page, index) => (await getHistory(page)).entries[index]
const getCurrentHistoryEntry = async (page) => {
  const { entries, currentIndex } = await getHistory(page)
  return entries[currentIndex]
}
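Since `page._client` is private, a possible alternative is to open a CDP session through the public Target API and send the same `Page.getNavigationHistory` command. This is a sketch assuming a Puppeteer version that provides `target.createCDPSession()`; `getNavigationHistory` and `getCurrentHistoryUrl` are hypothetical names.

```javascript
// Hypothetical helpers: same data as getHistory() above, but through a
// CDPSession from the public Target API instead of page._client.
async function getNavigationHistory(page) {
  const session = await page.target().createCDPSession();
  try {
    // Resolves to { currentIndex, entries: [{ id, url, title, ... }] }.
    return await session.send('Page.getNavigationHistory');
  } finally {
    // Release the extra session once we have the answer.
    await session.detach();
  }
}

async function getCurrentHistoryUrl(page) {
  const { entries, currentIndex } = await getNavigationHistory(page);
  return entries[currentIndex].url;
}
```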

Based on https://github.com/GoogleChrome/puppeteer/blob/7d18275fb981e01cec4a4fbac61a9c66e46947bc/lib/Page.js#L532-L533

@onamission thanks! It works for me.

await page.goto(tag.url, {waitUntil: 'networkidle'})

I'm not sure this is entirely relevant to this thread, but I just wrote a script that navigates to hashes on a page to take various screenshots. I question the relevance because we have some JS running in the background that assists our navigation, so I don't know whether this script would work without the page's JS. Anyway, here is what works for me (sorry, it's Node 6 style):

getScreenshotOfSlides(page, url, slides, buffer, options) {
        var self = this;
        var screenshotOptions = {}; // options passed to page.screenshot()
        return new Promise((resolve, reject) => {
            if (options && options.path) {
                screenshotOptions.path = options.path;
            }
            if (!slides || !slides.length) {
                return resolve(buffer);
            }
            var slide = slides.shift();
            var pageUrl = url + "/#/" + slide;   // our URLs put slashes around the hash
            return page.goto(pageUrl, { waitUntil: 'networkidle' })
                .then(res => {
                    return page.screenshot(screenshotOptions);
                })
                .then(res => {
                    buffer.push(res);
                    return self.getScreenshotOfSlides(page, url, slides, buffer, options);
                })
                .then(res => {
                    return resolve(buffer);
                })
                .catch(err => {
                    return reject(err);
                });
        });
}
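For newer Node versions, an async/await variant of getScreenshotOfSlides() could load the page once and then switch slides by assigning `location.hash` inside the page, so no hash-only `page.goto()` is needed. This is a sketch: `screenshotSlides` is a hypothetical name, it keeps the `/#/slide` URL shape from the snippet above, and an app that renders slides asynchronously will likely need an extra `waitForSelector()` before each screenshot.

```javascript
// Hypothetical async/await variant: one full page load, then hash changes
// performed inside the page instead of page.goto() per slide.
async function screenshotSlides(page, url, slides, screenshotOptions = {}) {
  const buffers = [];
  // Same "url + '/'" base that the "/#/" + slide URLs above imply.
  await page.goto(url + '/', { waitUntil: 'load' });
  for (const slide of slides) {
    const hash = '#/' + slide;
    await page.evaluate((h) => { location.hash = h; }, hash);
    // Wait until the fragment has actually changed in the page.
    await page.waitForFunction((h) => location.hash === h, {}, hash);
    buffers.push(await page.screenshot(screenshotOptions));
  }
  return buffers;
}
```

As in the original snippet, passing a `path` in `screenshotOptions` makes every slide write to the same file; the returned buffers are the more useful output.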

Using the browser inspector and Charles, it appears that the only network traffic this creates is the initial request to the server. After the page has rendered, the network goes quiet.

This is a solution for the issue I was trying to solve that led me to #491, which brought me here.

I hope this helps someone.