dvc.org: engine: return HTTP 404 when Page404 component is rendered
Extracted from https://github.com/iterative/dvc.org/pull/690#issuecomment-541958989
Example: the engine renders a huge number “404”, however it already responded with HTTP 200:
It would be ideal if the engine could check for the URL path validity first before responding by rendering the general layout (navigation, etc) with HTTP 200, and render it instead with HTTP 404 (and Page404
).
This can aid in checking links by running script in #690. Once
404
is returned, we will be able to distinguish between existing pages andNot found
ones.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 30 (30 by maintainers)
Commits related to this issue
- Merge pull request #872 from iterative/gf-fix_#697_status-404 Fix #697: Make unexisting docs pages return 404 status — committed to iterative/dvc.org by jorgeorpinel 5 years ago
@iAdramelk it does not sound bad in terms of complexity, unless I’m missing something. What is your expectation?
Checked the doc’s one more time and it is not this simple it looks. At the time then we run
getInitialProps
the page has already returned 200 header. So if we want to set correct header we still need to do it inserver.js
. Basically what we need to do is to rungetItemByPath
before line 109 here:https://github.com/iterative/dvc.org/blob/5af60ee1cf0ea0530d5d40b8d935730933f3f8d7/server.js#L108-L110
and return with 404 if there is no such page. But to do so we need to rewrite
src/Documentation/SidebarMenu/helper.js
in commonjs (and probably move it together withsidebar.json
to some directory closer to the root).@algomaster99 Sorry, somehow missed mentions.
Basically
/doc/*
pages now, for legacy reasons, are using non-standard routing mechanism (and we probably should replace it with the standard one to solve this issue:/doc/*
pages loadpages/doc.js
file by default. It is configured in theserver.js
.shouldComponentUpdate
we check for the URL and try to load corresponding md-file.With current approach md-file loading happens after actual page load, so there is no way to solve http-headers problem there. Also this is why SSR is not working right now for docs pages.
Better way to render such pages would be by using
getInitialProps
this also will automatically render_error.js
page if some error occurs insidegetInitialProps
and also by replacing current docs navigation logic with the Next.js <Link> component. I’m not totally sure which header will express return in such situation and can’t find info in the docs through.Unfortunately I’m not familiar enough with the engine code yet to answer that question but I feel like it should be simple enough to answer. For this reason we’ve tried tagging @iAdramelk a couple times: I have the impression that he may be the most familiar with the Next.js engine at the moment.
Your’e right @shcheklein. This seems to be standard Node server behavior when the browser request includes
If-None-Match
header (sent if any URL is reloaded, at least in Chrome).Right, there’s a routing mechanism. My suggestion is that perhaps this could be evaluated (to determine path validity) before responding 200 with empty layout.
Selenium/ Headless Chrome tests are cool. I wrote a script that loads full dynamic js contents to calculate real web page size some time ago: https://github.com/jorgeorpinel/site-page-size-scraper/. (See full article.) We can still do this separately (what other users do you have in mind Ivan?), but I’m pretty sure Next should be able to return 404 here.
@algomaster99 I did some searching based on some of the concepts there and found this article. Find the following text, maybe the answer is here: “Next.js by default displays a 404 error page when someone requests a non existing page, but for dynamic routes we need to do it ourselves.”
I don’t think redirecting after rendering /doc should be the answer because the user will see the website load the empty layout and then reload itself again into the 404.
so, even if change URL to
/doc/404
when we render thiswget
won’t detect this. The only way to detect with wget is either add SSR support (@iAdramelk should have more context on this), or use Selenium to write tests (which might be a good idea anyway, since will be able to test more things, not just broken URLs).@algomaster99 no, I think
dvc.org/doc
is mapped (by Next.js, implicitly) to thedoc.js
that is responsible for everything else after that. Unless I’m missing something.@shcheklein can you tell me in which file are the components mapped to their URL. I don’t have much experience in next framework. But with simple react, I’m familiar. Usually the URLs are defined like this
<Route path='/' component={App} />
. I am looking for something corresponding to this here. Basically, how does Next map each url to a component?