readthedocs.org: Requests by Google(bot) will be answered with 403 Forbidden by Cloudflare
Details
We noticed a recent drop in traffic and checked the Search Console for pointers. Apparently, Google is no longer allowed to fetch the various sitemaps and content of our RTD repositories. The requests result in 403 errors; the overview says “Couldn’t fetch”, and the “details” show the following (which is not very conclusive, especially since opening the sitemap works on all our machines):
Pages: (screenshot)
Sitemaps: (screenshot)
Only when you add -A "googlebot" or one of its derivatives to the curl request does it also return a 403 error (this might be unrelated, given how Googlebot and its IP addresses work, but I thought I would mention it). Like so:
$ curl -A "googlebot" --head https://crate.io/docs/crate/reference/en/latest/sitemap.xml
HTTP/2 403
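To make the comparison explicit, here is a minimal sketch that requests the same URL once with curl's default User-Agent and once with "googlebot", and prints both status codes. The function names status_for and compare_agents are hypothetical helpers, not part of any tool. A differing result only shows that the response depends on the User-Agent; it does not prove that the real Googlebot is being blocked.

```shell
#!/bin/sh
# Print only the HTTP status code for a URL, optionally with a custom User-Agent.
# (status_for is a hypothetical helper name used for this sketch.)
status_for() {
    url=$1
    agent=${2:-}
    if [ -n "$agent" ]; then
        curl -s -o /dev/null -w '%{http_code}' -A "$agent" --head "$url"
    else
        curl -s -o /dev/null -w '%{http_code}' --head "$url"
    fi
}

# Compare a default request with one that merely claims to be Googlebot.
compare_agents() {
    url=$1
    plain=$(status_for "$url")
    bot=$(status_for "$url" "googlebot")
    echo "default=$plain googlebot=$bot"
    if [ "$plain" != "$bot" ]; then
        echo "user-agent dependent response"
    fi
}
```

Usage: compare_agents https://crate.io/docs/crate/reference/en/latest/sitemap.xml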
Can anyone confirm a similar issue on their sitemaps?
- Read the Docs project URL: https://crate.readthedocs.io/en/latest/sitemap.xml / https://crate.io/docs/crate/reference/en/latest/sitemap.xml (reverse proxied)
- Build URL: https://readthedocs.org/projects/crate/builds/
Expected Result
Google fetches our sitemaps as it used to.
Actual Result
Fetching blocked by 403 errors.
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (9 by maintainers)
Commits related to this issue
- Add codespell to tox — committed to readthedocs/readthedocs.org by cclauss 3 years ago
All right, thank you!
Apparently, the curl request was only succeeding on a page which would yield a 302 redirect. On a regular page, we still get
Yeah, it definitely feels right, now that we know about the origin of the “problem”: it’s actually a feature, and it led us to wrong conclusions while trying to reproduce the issue. So, let us review our Nginx settings at crate.io together with @WalBeh; we will come back here and report afterwards. Thanks again for taking the time!
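Since a 302 can look like a successful check while the final page is still refused, it may help to label the status code explicitly when reproducing the issue. The following is only a sketch: classify_status and check_as_googlebot are hypothetical helper names, and the labels are an assumption about what is useful to distinguish here.

```shell
#!/bin/sh
# Map an HTTP status code to a coarse label.
# A "redirect" is not conclusive: the target page may still be blocked.
classify_status() {
    case "$1" in
        2??) echo "ok" ;;
        3??) echo "redirect" ;;
        403) echo "blocked" ;;
        *)   echo "other" ;;
    esac
}

# Send a HEAD request that claims to be Googlebot and report the labelled result.
check_as_googlebot() {
    code=$(curl -s -o /dev/null -w '%{http_code}' -A "googlebot" --head "$1")
    echo "$1: $code ($(classify_status "$code"))"
}
```

Usage: check_as_googlebot https://crate.io/docs/crate/reference/en/latest/sitemap.xml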
Right. Thank you so much!