almanac.httparchive.org: Investigate 404 errors
In the production server logs I’m seeing lots of ambiguous error messages like this:
werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
at match (/env/lib/python3.7/site-packages/werkzeug/routing.py:1799)
at match_request (/env/lib/python3.7/site-packages/flask/ctx.py:336)
at raise_routing_exception (/env/lib/python3.7/site-packages/flask/app.py:1774)
at dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1791)
at full_dispatch_request (/env/lib/python3.7/site-packages/flask/app.py:1813)
At times the server spikes to 200 404s per minute, which is suspiciously high.
Sometimes this happens when a site doesn’t have a favicon or something innocuous, but I can’t imagine why we’d be having this many 404s unless there’s a broken link somewhere.
Two things:
- Improve error logging so we know what the broken link is and where it’s coming from (cc @mikegeyser)
- Rerun the SEO-style audit of the website so that we can more easily/proactively find broken links (#286 cc @AymenLoukil @catalinred @rachellcostello)
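For the first item, here is a minimal sketch of what richer 404 logging could look like in Flask, assuming an app-wide error handler is acceptable. The logger name `almanac.404` and the plain-text response body are placeholders for illustration, not taken from the project's code.

```python
import logging

from flask import Flask, request

app = Flask(__name__)
logger = logging.getLogger("almanac.404")  # hypothetical logger name


@app.errorhandler(404)
def log_not_found(error):
    # Record the requested URL and the page that linked to it, so the log
    # entry points at the broken link rather than a bare stack trace.
    logger.warning(
        "404 for %s (referrer: %s)",
        request.url,
        request.referrer or "none",
    )
    # Placeholder body; a real site would render its 404 template here.
    return "Page not found", 404
```

The referrer is only present when the browser sends a `Referer` header, so direct hits and many bots would show up as "none".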
About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (15 by maintainers)
OK I got it.
We don’t have a working 404 page - except for the routes we have defined (i.e. /static/XXX or /lang/year/XXX). This reproduces the error: http://127.0.0.1:8080/en/ for example, as does https://127.0.0.1:8080/anythingrandom - because we have no routes matching those patterns.
It shows an error page instead of the 404 page and returns a 500 to the browser, though it did start life as a 404:
Adding a default route like this fixes it:
I know this fixes it because it returns our correct 404 page with that exact error message on it (barry was here), so the request is making it to this route. Other posts suggest this is how it should work, and I’ve tested that the other routes still work (home page, chapters, methodology, etc.), as well as static pages, sitemap.xml, etc.
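The original snippet is not preserved in this thread, so the following is only a sketch of what a catch-all route of this kind might look like in Flask. The handler bodies and the `/sitemap.xml` view are stand-ins, and the description string reuses the placeholder message mentioned above.

```python
from flask import Flask, abort

app = Flask(__name__)


@app.errorhandler(404)
def page_not_found(error):
    # Stand-in for rendering the site's real 404 template.
    return f"404: {error.description}", 404


@app.route("/")
def home():
    return "home page"


@app.route("/sitemap.xml")
def sitemap():
    return "sitemap"


# Catch-all: any path that no other rule matches lands here. Werkzeug
# prefers rules with static parts over a bare <path:...> converter, so
# the routes defined above keep working.
@app.route("/<path:path>")
def catch_all(path):
    abort(404, description="barry was here")
```

With this in place, a request such as /anythingrandom is matched by `catch_all`, which raises the 404 itself, and the message shown on the error page confirms the request went through that route.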
Will submit a PR, though I suppose I should change the 404 error message first 😀
However, I’m also going to add a route to handle that /en/ case and redirect to the default year:
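The code for this is also missing from the thread. A rough sketch of such a redirect in Flask, assuming a default year of 2019 (the only year that appears in the URLs here) and hypothetical view names `home` and `lang_only`:

```python
from flask import Flask, redirect, url_for

app = Flask(__name__)

DEFAULT_YEAR = "2019"  # assumption: the only year seen in this thread


@app.route("/<lang>/<year>/")
def home(lang, year):
    # Stand-in for the real home page view.
    return f"Home page for {lang}/{year}"


@app.route("/<lang>/")
def lang_only(lang):
    # /en/ has no year, so send the visitor to the default year's home page.
    return redirect(url_for("home", lang=lang, year=DEFAULT_YEAR))
```

As written, this redirects any single-segment path, so a real version would probably want to check `lang` against the supported languages and return a 404 otherwise.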
I’m still seeing vague 404 error messages in Stackdriver:
However, the actual App Engine server logs are no longer showing any meaningful errors on things like broken images or bad requests, so I’m comfortable closing this issue.
Good find!
I ran a crawl, and here are the links generating errors:
https://almanac.httparchive.org/en/2019/](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) from https://almanac.httparchive.org/en/2019/resource-hints
https://almanac.httparchive.org/static/images/2019/05_Third_Parties/fig7.png from https://almanac.httparchive.org/en/2019/third-parties
https://almanac.httparchive.org/static/images/2019/08_Security/fig1.png from https://almanac.httparchive.org/en/2019/security
https://www.ssllabs.com/ssl-pulse/) from https://almanac.httparchive.org/en/2019/security
https://almanac.httparchive.org/static/images/2019/08_Security/fig8.png from https://almanac.httparchive.org/en/2019/security
https://almanac.httparchive.org/static/images/2019/08_Security/fig3.png from https://almanac.httparchive.org/en/2019/security
https://almanac.httparchive.org/static/images/2019/08_Security/fig2.png from https://almanac.httparchive.org/en/2019/security
https://fonts.gstatic.com/ from https://almanac.httparchive.org/en/2019/fonts
https://rainy-periwinkle.glitch.me/permalink/bc8f154a95dfe06a6d0fdb099b6c8df61727b2289141a0ef16dc17b2b57d3068.html from https://almanac.httparchive.org/en/2019/markup
https://rainy-periwinkle.glitch.me/permalink/3214f840b6ae3ef1074291f60fa1be4b9d9df401fe0190bfaff4bb078c8614a5.html from https://almanac.httparchive.org/en/2019/markup
Change these links to HTTPS:
http://speedcurve.com/ from https://almanac.httparchive.org/en/2019/contributors
http://paulcalvano.com/ from https://almanac.httparchive.org/en/2019/contributors
http://www.filamentgroup.com/ from https://almanac.httparchive.org/en/2019/fonts
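Several of the failing links above end in `)` or contain `](`, which looks like markdown link syntax leaking into the rendered URL. A small heuristic along these lines could flag such links, plus plain-HTTP ones, before they ship. The function name and the problem labels are made up for illustration, and a trailing `)` can be legitimate in some URLs, so treat hits as candidates to review rather than definite errors.

```python
import re

# Markdown link syntax that sometimes leaks into a rendered href:
# "](" in the middle of the URL, or a stray ")" at the very end.
MARKDOWN_RESIDUE = re.compile(r"\]\(|\)$")


def classify_link(url):
    """Return a list of suspected problems for one URL (empty if none)."""
    problems = []
    if MARKDOWN_RESIDUE.search(url):
        problems.append("markdown residue")
    if url.startswith("http://"):
        problems.append("not https")
    return problems
```

For example, `classify_link("https://www.ssllabs.com/ssl-pulse/)")` reports markdown residue, while `classify_link("http://speedcurve.com/")` reports a non-HTTPS link.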