scrapy: SSL website. `twisted.internet.error.ConnectionLost`
Hi everybody! I hit this error on both operating systems: this HTTPS site can't be downloaded via Scrapy (Twisted). I searched this issue board and didn't find a solution.
Both: Debian 9 / Mac OS
$ scrapy shell "https://wwwnet1.state.nj.us/"
2017-09-07 16:23:02 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-09-07 16:23:02 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2017-09-07 16:23:02 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2017-09-07 16:23:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-09-07 16:23:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-09-07 16:23:03 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-09-07 16:23:03 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-09-07 16:23:03 [scrapy.core.engine] INFO: Spider opened
2017-09-07 16:23:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-09-07 16:23:03 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wwwnet1.state.nj.us/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-09-07 16:23:04 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://wwwnet1.state.nj.us/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Traceback (most recent call last):
File "scrapy", line 11, in <module>
sys.exit(execute())
File "/lib/python3.5/site-packages/scrapy/cmdline.py", line 149, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/lib/python3.5/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
func(*a, **kw)
File "/lib/python3.5/site-packages/scrapy/cmdline.py", line 156, in _run_command
cmd.run(args, opts)
File "/lib/python3.5/site-packages/scrapy/commands/shell.py", line 73, in run
shell.start(url=url, redirect=not opts.no_redirect)
File "/lib/python3.5/site-packages/scrapy/shell.py", line 48, in start
self.fetch(url, spider, redirect=redirect)
File "/lib/python3.5/site-packages/scrapy/shell.py", line 115, in fetch
reactor, self._schedule, request, spider)
File "/lib/python3.5/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
result.raiseException()
File "/lib/python3.5/site-packages/twisted/python/failure.py", line 385, in raiseException
raise self.value.with_traceback(self.tb)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Mac OS X:
$ scrapy version -v
Scrapy : 1.4.0
lxml : 3.8.0.0
libxml2 : 2.9.4
cssselect : 1.0.1
parsel : 1.2.0
w3lib : 1.18.0
Twisted : 17.9.0rc1
Python : 3.5.1 (default, Jan 22 2016, 08:54:32) - [GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)]
pyOpenSSL : 17.2.0 (OpenSSL 1.1.0f 25 May 2017)
Platform : Darwin-16.7.0-x86_64-i386-64bit
Debian 9:
$ scrapy version -v
Scrapy : 1.4.0
lxml : 3.8.0.0
libxml2 : 2.9.3
cssselect : 1.0.1
parsel : 1.2.0
w3lib : 1.18.0
Twisted : 17.9.0rc1
Python : 3.4.2 (default, Oct 8 2014, 10:45:20) - [GCC 4.9.1]
pyOpenSSL : 17.2.0 (OpenSSL 1.1.0f 25 May 2017)
Platform : Linux-3.16.0-4-amd64-x86_64-with-debian-8.7
Mac OS X:
$ openssl s_client -connect wwwnet1.state.nj.us:443 -servername wwwnet1.state.nj.us
CONNECTED(00000003)
140736760988680:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 336 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : 0000
Session-ID:
Session-ID-ctx:
Master-Key:
Key-Arg : None
PSK identity: None
PSK identity hint: None
SRP username: None
Start Time: 1504790705
Timeout : 300 (sec)
Verify return code: 0 (ok)
---
Debian 9:
CONNECTED(00000003)
---
Certificate chain
0 s:/C=US/ST=New Jersey/L=Trenton/O=New Jersey State Government/OU=E-Gov Services - wwwnet1.state.nj.us/CN=wwwnet1.state.nj.us
i:/C=US/O=Symantec Corporation/OU=Symantec Trust Network/CN=Symantec Class 3 Secure Server SHA256 SSL CA
---
Server certificate
-----BEGIN CERTIFICATE-----
<cut out>
-----END CERTIFICATE-----
<cut out>
---
No client certificate CA names sent
---
SSL handshake has read 1724 bytes and written 635 bytes
---
New, TLSv1/SSLv3, Cipher is DES-CBC3-SHA
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
SSL-Session:
Protocol : TLSv1
Cipher : DES-CBC3-SHA
Session-ID: 930F00007F5944DC3C6010F96E95E7FA63656EF5EA35508B055078CEC249DC38
Session-ID-ctx:
Master-Key: 27B02D427F006A57B121CCEFEAA7F33B870DE262848BB6F851242F48F051ABB77BA4ED06706766EE8EE55F6643C9FF55
Key-Arg : None
PSK identity: None
PSK identity hint: None
SRP username: None
Start Time: 1504790821
Timeout : 300 (sec)
Verify return code: 21 (unable to verify the first certificate)
---
Thank you for your time.
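For anyone reproducing this outside Scrapy: the openssl s_client output above shows the server only completes a handshake with TLSv1 and the legacy DES-CBC3-SHA cipher, which newer OpenSSL builds refuse at their default security level. Below is a minimal stdlib sketch of a client context that re-enables those parameters; the helper name is mine, and it disables certificate verification, so treat it as a diagnostic tool only:

```python
import ssl

def legacy_tls_context():
    """Client SSLContext that still offers TLS 1.0 and old cipher suites
    (3DES, RC4, ...). Diagnostic helper only: it disables certificate
    verification, so do not reuse it for anything security-sensitive."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1  # allow TLS 1.0 again
    # "@SECLEVEL=0" re-enables ciphers that OpenSSL >= 1.1.1
    # rejects at its default security level (e.g. DES-CBC3-SHA)
    ctx.set_ciphers("DEFAULT:@SECLEVEL=0")
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Passing this context to e.g. `urllib.request.urlopen(url, context=legacy_tls_context())` should then approximate what the Debian openssl build negotiated above.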
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 18 (6 by maintainers)
This worked for me:

cryptography<2

(e.g. 1.9 in my case, before OpenSSL 1.1)

Using OpenSSL 1.1.0f (with cryptography==2.0.3) did not work for me, even when forcing TLS 1.0.

@anapaulagomes you have to use TLSv1.0 and the RC4-MD5 cipher. The following command should work in the scraper environment:

curl -v --tlsv1.0 --ciphers RC4-MD5 https://www.diariooficial.feiradesantana.ba.gov.br/

You can reach it by compiling OpenSSL with SSLv3 support.

I had the same issue; in my case the solution was to set the USER_AGENT in the settings.py file:

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
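Putting the suggestions in this thread together inside Scrapy itself: the client TLS method can be pinned with the documented DOWNLOADER_CLIENT_TLS_METHOD setting (available since Scrapy 1.1), and Scrapy 1.8+ also accepts DOWNLOADER_CLIENT_TLS_CIPHERS. A settings.py sketch, not a guaranteed fix (the cipher string is an assumption taken from the openssl output above, and your OpenSSL build may still refuse it at its default security level):

```python
# settings.py -- sketch; setting availability depends on the Scrapy
# version (DOWNLOADER_CLIENT_TLS_METHOD since 1.1,
# DOWNLOADER_CLIENT_TLS_CIPHERS since 1.8)

# Force the client down to TLS 1.0, which this server negotiates
DOWNLOADER_CLIENT_TLS_METHOD = 'TLSv1.0'

# Offer the legacy cipher seen in the openssl s_client output
DOWNLOADER_CLIENT_TLS_CIPHERS = 'DES-CBC3-SHA'

# Some servers drop connections from the default Scrapy user agent
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/70.0.3538.77 Safari/537.36')
```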