docusaurus: Docusaurus v2 doesn't allow for "mypagename.html" links

We migrated and now have a series of issues being raised because none of our 7-year-old links ending in .html resolve correctly.

This is the main issue tracking this: https://github.com/facebook/watchman/issues/798

A couple of kind souls have submitted PRs to change links elsewhere: https://github.com/facebook/watchman/pull/806 https://github.com/facebook/watchman/pull/801

but this is really a docusaurus issue. How can we get this fixed?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 30 (11 by maintainers)

Most upvoted comments

Hi all,

Here are a few comments, proposals and questions I have

Valid urls of Watchman

I understand that such url should work:

Does it mean that BOTH urls should work?

I don’t know how the Watchman site worked before. Is it still online somewhere to check?

Hosting on Github pages

Watchman is hosted on Github pages. As far as I know, it’s not possible to do any server-side redirect on this hosting solution.

Also, for Github pages to serve a non-404 answer, the file actually has to exist on the FS with the html extension.

On other platforms like Netlify, it would have been possible to drop in a simple _redirects file to handle this.

It might be a good idea to start using a custom domain, which would allow more flexibility to change the underlying hosting solution without too much pain.

Using .html extension in document id

If Watchman just needs /nodejs.html, and not the /nodejs page, it’s possible to use .html as a suffix in document ids


Using a file like filename.html.md also works


Duplicating the pages

Creating 2 html pages for the same document could be a portable solution

---
id: nodejs
extraPaths: 
  - nodejs.htm
  - nodejs.html
title: NodeJS page
---

The duplicate pages would have a canonical url to the main page so that SEO can know which page is the main one.

Is it worth redirecting in this case? or can the browser just stay on the non-canonical page if it serves the correct content?
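To make the duplicate-page idea above a bit more concrete, here is a minimal sketch of how one document could expand into several pages that all point at the same canonical URL. Note that `extraPaths` is only the front matter field proposed in this comment, not an existing Docusaurus option, and `pagesForDoc` and the `example.tld` base URL are hypothetical names used for illustration.

```javascript
// Sketch only: `extraPaths` is the proposal from this thread, not a real Docusaurus option.
// Given a doc's front matter, list every page to emit for it.
// Each page carries the same canonical tag, pointing at the main page.
function pagesForDoc(frontMatter, baseUrl) {
  const canonicalUrl = baseUrl + frontMatter.id;
  const paths = [frontMatter.id, ...(frontMatter.extraPaths || [])];
  return paths.map((p) => ({
    url: baseUrl + p,
    canonicalTag: `<link rel="canonical" href="${canonicalUrl}">`,
  }));
}

const pages = pagesForDoc(
  { id: 'nodejs', extraPaths: ['nodejs.htm', 'nodejs.html'] },
  'https://example.tld/docs/' // hypothetical site URL
);
console.log(pages.map((page) => page.url));
```

With this shape the browser could stay on whichever duplicate it landed on, while crawlers are told which URL is the main one.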

404 + redirecting

This looks like the solution @lex111 implemented here: https://github.com/facebook/docusaurus/pull/2704 (not merged, for SEO reasons related to serving a 404).

I found this note here: https://github.com/rafrex/spa-github-pages

A quick SEO note - while it’s never good to have a 404 response, it appears based on Search Engine Land’s testing that Google’s crawler will treat the JavaScript window.location redirect in the 404.html file the same as a 301 redirect for its indexing. From my testing I can confirm that Google will index all pages without issue, the only caveat is that the redirect query is what Google indexes as the url. For example, the url example.tld/about will get indexed as example.tld/?p=/about. When the user clicks on the search result, the url will change back to example.tld/about once the site loads.

I’d prefer not to do that as well but that remains an option.
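For readers unfamiliar with the 404 trick quoted above, the URL round trip it describes can be sketched roughly as follows. This is not the actual spa-github-pages script; `encodeFor404` and `decodeFromQuery` are hypothetical helpers, and the sketch assumes paths without query-string special characters.

```javascript
// Sketch only: illustrates the URL round trip, not the real spa-github-pages code.

// What the 404.html page would do: move the requested path into a `p` query
// parameter on the site root, e.g. /about -> /?p=/about
function encodeFor404(origin, pathname) {
  return origin + '/?p=' + pathname;
}

// What the index page would do once loaded: restore the original URL,
// or return null when there is no `p` parameter to restore.
function decodeFromQuery(url) {
  const parsed = new URL(url);
  const p = parsed.searchParams.get('p');
  return p === null ? null : parsed.origin + p;
}

const redirected = encodeFor404('https://example.tld', '/about');
console.log(redirected);
console.log(decodeFromQuery(redirected));
```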


What do you think?

Hi @wez , sorry for the delay.

Just wanted to let you know that the fixes you need are already merged on master and will be released soon in 2.0.0-alpha.57


What you will need to do on Watchman:

add a slug to each doc

With the extension you want, which will be used for the main/canonical/SEO url

---
id: nodejs
slug: nodejs.html
title: NodeJS
---

Use the client redirects plugin

Plugin doc

Your configuration should look like:

module.exports = {
  plugins: [
    [
      '@docusaurus/plugin-client-redirects',
      {
        toExtensions: ['html'],
      },
    ],
  ],
};

And if a /docs/nodejs.html page exists, then going to /docs/nodejs will redirect to /docs/nodejs.html
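As a rough illustration of the behavior described above (not the plugin's real implementation), the mapping from existing pages to redirects could be thought of like this. `extensionRedirects` is a hypothetical helper name.

```javascript
// Sketch only: approximates the described toExtensions behavior,
// it is not the code of @docusaurus/plugin-client-redirects.
// For every existing path ending in one of the extensions,
// produce a redirect from the extension-less path to the real one.
function extensionRedirects(existingPaths, toExtensions) {
  const redirects = [];
  for (const to of existingPaths) {
    for (const ext of toExtensions) {
      const suffix = '.' + ext;
      if (to.endsWith(suffix) && to.length > suffix.length) {
        redirects.push({ from: to.slice(0, -suffix.length), to });
      }
    }
  }
  return redirects;
}

console.log(extensionRedirects(['/docs/nodejs.html', '/docs/install'], ['html']));
```

Paths that do not end in a listed extension (such as /docs/install here) produce no redirect.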

I see. So it should work to build the site first without the legacy docs folder, then manually add the legacy docs afterwards in a subfolder of ‘build’…correct?

We’ll also make a docusaurus serve command and recommend a way to test a production build locally => https://github.com/facebook/docusaurus/issues/3062

Sorry that your experience wasn’t as great as it should have been, and for the time lost giving it a try 😞

I didn’t document that the plugin worked only for the production build on the initial alpha 58 release, sorry about that. It is currently documented in the master branch here, but I didn’t backport it to the alpha 58 docs.

Next time you give it a try, please reach out on Discord, I’ll be there to help.

I’d love to have a way to test redirects locally when we revisit docusaurus; it would help build confidence before we push to production!

It is possible to test locally, but still involves the production build.

You can run the docusaurus build command (normally via yarn build), and then serve the build folder locally with any HTTP server (I’d recommend serve, a very simple one; no need for Apache or whatever)

# creates the /build folder (production build including the redirects)
yarn build

# host it locally:
yarn add serve
yarn serve build

# then open http://localhost:5000


It is not so simple to make this work with yarn start, because the redirect files are lightweight and not part of the Docusaurus client-side routing system (an SPA based on React / React Router). We should be able to redirect to the correct page asap, without needing to wait for the React and Docusaurus JS infra to download.

It may be possible to generate those lightweight redirect files before spawning the webpack dev server, but would probably decrease the startup speed of the project in dev mode.

@wez, if we succeed in making nodejs.html the main page, is it ok to have a simple client-side redirect from “/nodejs” to “/nodejs.html”?

@slorber Yeah, that’s fine with me! Thanks for looking at this!

@slorber This plugin by @lex111 might be helpful - https://github.com/single-spa/single-spa.js.org/blob/master/website/src/plugins/docusaurus-plugin-redirects/src/index.js

Hi @wez @JoelMarcey

I understand that we are looking for a portable solution, and not really willing to leverage hosting platform configuration.

That means that we must write to disk these 2 files if we don’t want a 404 status code from github pages:

  • docs/nodejs.html
  • docs/nodejs/index.html

I’m looking at how to make this work and I see 2 solutions:

  • duplicate the page, and use canonical urls to tell google nodejs.html is the main one
  • have nodejs/index.html do a client-side redirect (empty/lightweight page, no content)
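Either option comes down to writing two files per document into the build output. A minimal sketch of that step, assuming a plain Node script run after the build; `outputFilesFor` and `writeDocWithAlias` are hypothetical helpers, not part of Docusaurus.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch only: hypothetical helpers, not Docusaurus internals.
// For 'docs/nodejs.html', the alias file is 'docs/nodejs/index.html',
// so both /docs/nodejs.html and /docs/nodejs are served without a 404.
function outputFilesFor(docPath) {
  const withoutExt = docPath.replace(/\.html$/, '');
  return { main: docPath, alias: path.posix.join(withoutExt, 'index.html') };
}

// Write the main page and the alias page (a duplicate or a redirect stub)
// under buildDir, creating intermediate folders as needed.
function writeDocWithAlias(buildDir, docPath, mainHtml, aliasHtml) {
  const { main, alias } = outputFilesFor(docPath);
  for (const [rel, html] of [[main, mainHtml], [alias, aliasHtml]]) {
    const abs = path.join(buildDir, rel);
    fs.mkdirSync(path.dirname(abs), { recursive: true });
    fs.writeFileSync(abs, html);
  }
  return [main, alias];
}
```

Whether `aliasHtml` is a full copy of the page or a lightweight redirect is exactly the choice between the two solutions listed above.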

Note: it also seems possible to trigger a client-side redirect with an HTML tag. Google seems to treat it as a redirect (although it is not recommended).

<meta http-equiv="refresh" content="0; url=https://facebook.github.io/watchman/docs/nodejs.html">
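A lightweight redirect page built around that tag could be generated along these lines. This is a sketch under assumptions: `redirectPageHtml` is a hypothetical helper, and adding a canonical link next to the meta refresh is my own addition rather than something decided in this thread.

```javascript
// Sketch only: hypothetical generator for an empty redirect page.
// The canonical link is an assumption added here, not something from the thread.
function redirectPageHtml(targetUrl) {
  // Escape characters that would break out of the HTML attribute value.
  const escaped = targetUrl.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return [
    '<!DOCTYPE html>',
    '<html>',
    '  <head>',
    '    <meta charset="UTF-8">',
    `    <meta http-equiv="refresh" content="0; url=${escaped}">`,
    `    <link rel="canonical" href="${escaped}">`,
    '  </head>',
    '</html>',
  ].join('\n');
}

console.log(redirectPageHtml('https://facebook.github.io/watchman/docs/nodejs.html'));
```

Because the page contains no scripts or content, the browser can follow the redirect without waiting for React or the rest of the site's JS to download.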

It should be possible to provide more advanced configuration, like:

---
id: nodejs
path: nodejs.html
redirectPaths: 
  - nodejs
duplicatePaths: 
  - nodejs
title: NodeJS page
---

@wez, if we succeed in making nodejs.html the main page, is it ok to have a simple client-side redirect from “/nodejs” to “/nodejs.html”?


I did try simply renaming files before I posted this issue and it didn’t change the behavior. If there have been changes since then I’m happy to work through and try your suggestions.

According to this comment: https://github.com/facebook/watchman/issues/798#issuecomment-619300064

It’s not totally clear to me what you have tried, but it looks like you tried this: nodejs/index.html.md. What I’m suggesting is nodejs.html.md as the filename. If you specify the document id in the frontmatter, you need to use id: nodejs.html. In the Watchman docs I can see the nodejs doc has a frontmatter id (so the filename actually has no effect on the pathname).

I’m able to get this working locally (should also work for GH pages). I opened an example PR for watchman website here: https://github.com/facebook/watchman/pull/812


If we validate that the workaround works:

  • is id: nodejs.html the api we want to recommend for this usecase? (it’s a bit weird to me, we should probably be able to customize the path of each doc completely?)
  • We should still implement something to redirect from /nodejs to /nodejs.html, due to existing unprefixed links actually deployed

We should decide whether we want a document pathname customization feature. If we start migrating the Watchman docs to the id: nodejs.html workaround and then ship a clean way to solve this use case a week later, we’d have to migrate the Watchman site again, from the workaround to the clean solution.