astropy: find_api_page HTTPError 403

For me, find_api_page doesn’t work:

(base) hfm-1804a:scipy deil$ python
Python 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from astropy import find_api_page
>>> from astropy.units import Quantity
>>> find_api_page(Quantity)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/deil/software/anaconda3/lib/python3.7/site-packages/astropy/utils/misc.py", line 228, in find_api_page
    uf = urllib.request.urlopen(baseurl + 'objects.inv')
  File "/Users/deil/software/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/deil/software/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/deil/software/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/deil/software/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Users/deil/software/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Users/deil/software/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Following https://stackoverflow.com/questions/3336549, I printed the body of the error response, which suggests that the Cloudflare CDN is blocking these requests:

>>> try:find_api_page(Quantity)
... except Exception as e: print(e.fp.read())
... 
b'<!DOCTYPE html>\n<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->\n<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->\n<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->\n<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->\n<head>\n<title>Access denied | docs.astropy.org used Cloudflare to restrict access</title>\n<meta charset="UTF-8" />\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />\n<meta name="robots" content="noindex, nofollow" />\n<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />\n<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />\n<!--[if lt IE 9]><link rel="stylesheet" id=\'cf_styles-ie-css\' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->\n<style type="text/css">body{margin:0;padding:0}</style>\n\n\n<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->\n<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->\n\n\n\n</head>\n<body>\n  <div id="cf-wrapper">\n    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>\n    <div id="cf-error-details" class="cf-error-details-wrapper">\n      <div class="cf-wrapper cf-header cf-error-overview">\n        <h1>\n          <span class="cf-error-type" data-translate="error">Error</span>\n          <span class="cf-error-code">1010</span>\n          <small class="heading-ray-id">Ray ID: 4f5c36966fc6cc4a &bull; 2019-07-13 15:15:36 UTC</small>\n        </h1>\n        <h2 class="cf-subheadline">Access denied</h2>\n      </div><!-- /.header -->\n\n      <section></section><!-- spacer 
-->\n\n      <div class="cf-section cf-wrapper">\n        <div class="cf-columns two">\n          <div class="cf-column">\n            <h2 data-translate="what_happened">What happened?</h2>\n            <p>The owner of this website (docs.astropy.org) has banned your access based on your browser\'s signature (4f5c36966fc6cc4a-ua48).</p>\n          </div>\n\n          \n        </div>\n      </div><!-- /.section -->\n\n      <div class="cf-error-footer cf-wrapper">\n  <p>\n    <span class="cf-footer-item">Cloudflare Ray ID: <strong>4f5c36966fc6cc4a</strong></span>\n    <span class="cf-footer-separator">&bull;</span>\n    <span class="cf-footer-item"><span>Your IP</span>: 147.86.175.50</span>\n    <span class="cf-footer-separator">&bull;</span>\n    <span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>\n    \n  </p>\n</div><!-- /.error-footer -->\n\n\n    </div><!-- /#cf-error-details -->\n  </div><!-- /#cf-wrapper -->\n\n  <script type="text/javascript">\n  window._cf_translation = {};\n  \n  \n</script>\n\n</body>\n</html>\n'
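The response body above is long, but the two useful pieces are the HTTP status and the Cloudflare error code (1010 here). A minimal sketch of pulling those out instead of dumping the whole page; the helper names `cloudflare_error_code` and `describe_failure` are made up for illustration, and the regular expression assumes the `cf-error-code` markup shown above:

```python
import re
import urllib.error
import urllib.request

# Matches the <span class="cf-error-code">1010</span> element seen in the
# Cloudflare error page; the markup is an assumption based on that page only.
_CF_CODE = re.compile(rb'class="cf-error-code">(\d+)<')


def cloudflare_error_code(body):
    """Return the Cloudflare error code found in an error page, or None."""
    match = _CF_CODE.search(body)
    return int(match.group(1)) if match else None


def describe_failure(url):
    """Try to open `url` and summarise the outcome in one line."""
    try:
        with urllib.request.urlopen(url) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # HTTPError is file-like, so the error page can be read from it.
        body = err.read()
        return f"HTTP {err.code}, Cloudflare code {cloudflare_error_code(body)}"
```

Calling `describe_failure("http://docs.astropy.org/en/v3.1.2/objects.inv")` would then give a one-line summary rather than the full HTML.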

The problem is not with the URL. http://docs.astropy.org/en/v3.1.2/objects.inv exists; I can download it with my browser, or like this:

>>> import requests
>>> requests.get("http://docs.astropy.org/en/v3.1.2/objects.inv")
<Response [200]>

@eteq or anyone - Can you reproduce? What should we do?

I guess if we want to keep it, we should either use requests as a dependency for this, or find the right incantation to do this with urllib?
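One candidate for the urllib incantation: the Cloudflare page says access was banned based on the "browser's signature", which hints at the User-Agent header (urllib sends `Python-urllib/3.x` by default). A minimal sketch, assuming the block really does key on User-Agent, which is not confirmed in this thread; `build_request`, `fetch` and the agent string `astropy-docs-lookup` are hypothetical names:

```python
import urllib.request


def build_request(url, user_agent="astropy-docs-lookup"):
    """Build a Request that replaces urllib's default User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="astropy-docs-lookup", timeout=10.0):
    """Download `url` with a custom User-Agent and return the raw bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Something like `fetch("http://docs.astropy.org/en/v3.1.2/objects.inv")` would be the thing to try; if it still returns 403, the User-Agent assumption is wrong.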

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

We’ll presumably know for 3.2.2, no?

Congratulations on getting 403?!

I get the same error. This is probably because the file is hosted on Cloudflare, which seems to require accepting cookies (which requests does by default):

In [1]: import urllib.request                                                                  
In [3]: urllib.request.urlopen('http://docs.astropy.org/en/v3.1.2/objects.inv')                
....
HTTPError: HTTP Error 403: Forbidden

In [4]: import requests                                                                        

In [5]: requests.get('http://docs.astropy.org/en/v3.1.2/objects.inv')                          
Out[5]: <Response [200]>

In [6]: res = requests.get('http://docs.astropy.org/en/v3.1.2/objects.inv')                    

In [7]: res.cookies                                                                            
Out[7]: <RequestsCookieJar[Cookie(version=0, name='__cfduid', value='d835a268fe1cfcacf467a8fb09e3c999c1563571740', port=None, port_specified=False, domain='.docs.astropy.org', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=False, expires=1595107740, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)]>
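To check the cookie hypothesis with urllib alone, an opener can be given a cookie jar. A rough sketch using only the standard library; `make_cookie_opener` is a hypothetical helper, and whether accepting cookies is enough to avoid the 403 is exactly the open question here:

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Usage would be `opener, jar = make_cookie_opener()` followed by `opener.open('http://docs.astropy.org/en/v3.1.2/objects.inv')`; if that request is still rejected, cookies are probably not the deciding factor.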

For once, things work on Windows but not on other OSes…

My confirmation was on my Debian system (and more-or-less current master)