dvc.org: engine: return HTTP 404 when Page404 component is rendered

Extracted from https://github.com/iterative/dvc.org/pull/690#issuecomment-541958989

Example: the engine renders a huge number “404”, however it already responded with HTTP 200:

image

URL: https://dvc.org/doc/command-reference/bad

It would be ideal if the engine could check for the URL path validity first before responding by rendering the general layout (navigation, etc) with HTTP 200, and render it instead with HTTP 404 (and Page404 ).

This can aid in checking links by running script in #690. Once 404 is returned, we will be able to distinguish between existing pages and Not found ones.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 30 (30 by maintainers)

Commits related to this issue

Most upvoted comments

@iAdramelk it does not sound bad in terms of complexity, unless I’m missing something. What is your expectation?

Checked the doc’s one more time and it is not this simple it looks. At the time then we run getInitialProps the page has already returned 200 header. So if we want to set correct header we still need to do it in server.js. Basically what we need to do is to run getItemByPath before line 109 here:

https://github.com/iterative/dvc.org/blob/5af60ee1cf0ea0530d5d40b8d935730933f3f8d7/server.js#L108-L110

and return with 404 if there is no such page. But to do so we need to rewrite src/Documentation/SidebarMenu/helper.js in commonjs (and probably move it together with sidebar.json to some directory closer to the root).

@algomaster99 Sorry, somehow missed mentions.

Basically /doc/* pages now, for legacy reasons, are using non-standard routing mechanism (and we probably should replace it with the standard one to solve this issue:

  1. All /doc/* pages load pages/doc.js file by default. It is configured in the server.js.
  2. Then in shouldComponentUpdate we check for the URL and try to load corresponding md-file.
  3. If there is no such a file, then we render component for 404 error.

With current approach md-file loading happens after actual page load, so there is no way to solve http-headers problem there. Also this is why SSR is not working right now for docs pages.

Better way to render such pages would be by using getInitialProps this also will automatically render _error.js page if some error occurs inside getInitialProps and also by replacing current docs navigation logic with the Next.js <Link> component. I’m not totally sure which header will express return in such situation and can’t find info in the docs through.

Unfortunately I’m not familiar enough with the engine code yet to answer that question but I feel like it should be simple enough to answer. For this reason we’ve tried tagging @iAdramelk a couple times: I have the impression that he may be the most familiar with the Next.js engine at the moment.

take a look at what URLs exactly does it return 304 (200 if you click it once again, I think - it’s just a matter of caching on the browser

Your’e right @shcheklein. This seems to be standard Node server behavior when the browser request includes If-None-Match header (sent if any URL is reloaded, at least in Chrome).

  1. The point here is that it’s engine requesting specific *.md files, not a regular dvc.org/doc/something.

Right, there’s a routing mechanism. My suggestion is that perhaps this could be evaluated (to determine path validity) before responding 200 with empty layout.

or use Selenium to write tests…

Selenium/ Headless Chrome tests are cool. I wrote a script that loads full dynamic js contents to calculate real web page size some time ago: https://github.com/jorgeorpinel/site-page-size-scraper/. (See full article.) We can still do this separately (what other users do you have in mind Ivan?), but I’m pretty sure Next should be able to return 404 here.

I went through the next documentation. I found two possible techniques…

@algomaster99 I did some searching based on some of the concepts there and found this article. Find the following text, maybe the answer is here: “Next.js by default displays a 404 error page when someone requests a non existing page, but for dynamic routes we need to do it ourselves.”

We can create a URL /doc/404…

I don’t think redirecting after rendering /doc should be the answer because the user will see the website load the empty layout and then reload itself again into the 404.

so, even if change URL to /doc/404 when we render this wget won’t detect this. The only way to detect with wget is either add SSR support (@iAdramelk should have more context on this), or use Selenium to write tests (which might be a good idea anyway, since will be able to test more things, not just broken URLs).

@algomaster99 no, I think dvc.org/doc is mapped (by Next.js, implicitly) to the doc.js that is responsible for everything else after that. Unless I’m missing something.

@shcheklein can you tell me in which file are the components mapped to their URL. I don’t have much experience in next framework. But with simple react, I’m familiar. Usually the URLs are defined like this <Route path='/' component={App} />. I am looking for something corresponding to this here. Basically, how does Next map each url to a component?