juriscraper: Blocked scraper: opinions.united_states.state.ky

The scraper works, though slowly, on my machine, but it has been timing out in production for months now. I suspect the court may be blocking our production IP.

Need to call the court.

********!! CRAWLER DOWN !!***********
*****scrape_court method failed!*****
********!! ACTION NEEDED !!**********
Traceback (most recent call last):
  File "/var/www/courtlistener/cl/scrapers/management/commands/cl_scrape_opinions.py", line 324, in handle
    self.parse_and_scrape_site(mod, options['full_crawl'])
  File "/var/www/courtlistener/cl/scrapers/management/commands/cl_scrape_opinions.py", line 289, in parse_and_scrape_site
    site = mod.Site().parse()
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/juriscraper/AbstractSite.py", line 112, in parse
    self.html = self._download()
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/juriscraper/opinions/united_states/state/ky.py", line 105, in _download
    html = super(Site, self)._download(request_dict)
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/juriscraper/AbstractSite.py", line 300, in _download
    self._request_url_post(self.url)
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/juriscraper/AbstractSite.py", line 345, in _request_url_post
    **self.request['parameters']
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/requests/sessions.py", line 572, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/requests/sessions.py", line 524, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/requests/sessions.py", line 637, in send
    r = adapter.send(request, **kwargs)
  File "/var/www/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/requests/adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
ConnectTimeout: HTTPConnectionPool(host='162.114.92.72', port=80): Max retries exceeded with url: /dtSearch/dtisapi6.dll (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f288e317ad0>, 'Connection to 162.114.92.72 timed out. (connect timeout=60)'))
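
The error above is a connect timeout, meaning the TCP handshake to 162.114.92.72:80 never completed, which is at least consistent with an IP-level block. One way to compare the laptop and production would be a bare TCP probe that takes requests and the scraper out of the picture. This is only a rough sketch: `probe_connect` is a hypothetical helper, not part of juriscraper, and it assumes Python 3 (the traceback above is from Python 2.7).

```python
import socket
import time


def probe_connect(host, port=80, timeout=10.0):
    """Attempt a bare TCP connect and report how it ended.

    Returns (outcome, elapsed_seconds), where outcome is "connected",
    "timeout", or "error: <ExceptionName>".
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No answer to the handshake within `timeout` seconds.
        outcome = "timeout"
    except OSError as exc:
        # Refused, unreachable, DNS failure, and so on.
        outcome = "error: %s" % exc.__class__.__name__
    else:
        sock.close()
        outcome = "connected"
    return outcome, time.monotonic() - start


# Example call, to be run on both the laptop and the production host:
# print(probe_connect("162.114.92.72", 80, timeout=60.0))
```

If the laptop reports "connected" while production reports "timeout", that would point toward filtering by source address rather than a slow server, though it would not prove it.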

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Heard back from Jamie Neal. She’s going to forward an email on my behalf. We’ll see.

I tried with the same UA on my laptop and on prod. I’ll definitely add her name to a file if I can get through to her.