2015-09-09 16:51:16 [scrapy] INFO: Spider opened
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/commands/shell.py", line 63, in run
    shell.start(url=url)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/shell.py", line 44, in start
    self.fetch(url, spider)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/shell.py", line 87, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL3_READ_BYTES', 'ssl handshake failure')]>]
I have also checked that TLSv1 can negotiate this certificate with curl, and everything is fine.
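For reference, a TLSv1-only handshake can be forced from the command line like this (the exact command isn't given above, and www.example.com is a stand-in for the affected site, which the comment doesn't name):

$ curl -v --tlsv1 https://www.example.com/ -o /dev/null
$ openssl s_client -connect www.example.com:443 -tls1

If these succeed while Scrapy fails, the handshake problem lies in how the client negotiates, not in the server's certificate.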
➜ mencanta-spiders git:(master) ✗ scrapy version -v
2015-09-09 16:55:31 [scrapy] INFO: Scrapy 1.0.3 started (bot: mencanta)
2015-09-09 16:55:31 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-09-09 16:55:31 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'mencanta.spiders', 'LOG_LEVEL': 'INFO', 'HTTPCACHE_EXPIRATION_SECS': 604800, 'HTTPCACHE_IGNORE_HTTP_CODES': [301, 302, 307, 403, 404, 401, 400, 402, 407, 500], 'SPIDER_MODULES': ['mencanta.spiders'], 'HTTPCACHE_ENABLED': True, 'BOT_NAME': 'mencanta', 'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'}
Scrapy : 1.0.3
lxml : 3.4.4.0
libxml2 : 2.9.2
Twisted : 15.4.0
Python : 2.7.9 (default, Apr 2 2015, 15:33:21) - [GCC 4.9.2]
Platform: Linux-3.19.0-26-generic-x86_64-with-Ubuntu-15.04-vivid
Reproduced on current master branch:
$ scrapy shell https://www.notjustalabel.com/products/womens
2016-01-26 19:10:04 [scrapy] INFO: Scrapy 1.1.0dev1 started (bot: scrapybot)
2016-01-26 19:10:04 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-01-26 19:10:04 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-01-26 19:10:04 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-01-26 19:10:04 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-01-26 19:10:04 [scrapy] INFO: Enabled item pipelines:
2016-01-26 19:10:04 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-01-26 19:10:04 [scrapy] INFO: Spider opened
2016-01-26 19:10:04 [scrapy] DEBUG: Retrying <GET https://www.notjustalabel.com/products/womens> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
2016-01-26 19:10:04 [scrapy] DEBUG: Retrying <GET https://www.notjustalabel.com/products/womens> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
2016-01-26 19:10:05 [scrapy] DEBUG: Gave up retrying <GET https://www.notjustalabel.com/products/womens> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapydev/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy', 'console_scripts', 'scrapy')()
  File "/home/paul/src/scrapy/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/paul/src/scrapy/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/home/paul/src/scrapy/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/home/paul/src/scrapy/scrapy/commands/shell.py", line 70, in run
    shell.start(url=url)
  File "/home/paul/src/scrapy/scrapy/shell.py", line 47, in start
    self.fetch(url, spider)
  File "/home/paul/src/scrapy/scrapy/shell.py", line 112, in fetch
    reactor, self._schedule, request, spider)
  File "/home/paul/.virtualenvs/scrapydev/local/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
@VMRuiz, I don't know if you still have the issue, but the trick from #1429 (comment) (which I also tried successfully on #1764 (comment)) works for this site here. Would you mind testing it on your end?
Here's what I did:
Console:
$ scrapy crawl notjustlabel
2016-02-09 18:03:09 [scrapy] INFO: Scrapy 1.0.5 started (bot: sslissues)
2016-02-09 18:03:09 [scrapy] INFO: Optional features available: ssl, http11
2016-02-09 18:03:09 [scrapy] INFO: Overridden settings:
{'NEWSPIDER_MODULE': 'sslissues.spiders',
'SPIDER_MODULES': ['sslissues.spiders'],
'DOWNLOADER_CLIENTCONTEXTFACTORY': 'sslissues.contextfactory.TLSFlexibleContextFactory',
'BOT_NAME': 'sslissues'}
2016-02-09 18:03:10 [scrapy] DEBUG: Crawled (200) <GET https://www.notjustalabel.com/products/womens> (referer: None)
2016-02-09 18:03:10 [scrapy] INFO: Closing spider (finished)
2016-02-09 18:03:10 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 234,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 13911,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 2, 9, 17, 3, 10, 127490),
'log_count/DEBUG': 2,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2016, 2, 9, 17, 3, 9, 735698)}
2016-02-09 18:03:10 [scrapy] INFO: Spider closed (finished)
Project settings:
$ cat sslissues/settings.py
# -*- coding: utf-8 -*-
# Scrapy settings for sslissues project
BOT_NAME = 'sslissues'
SPIDER_MODULES = ['sslissues.spiders']
NEWSPIDER_MODULE = 'sslissues.spiders'
DOWNLOADER_CLIENTCONTEXTFACTORY = 'sslissues.contextfactory.TLSFlexibleContextFactory'
Custom context factory:
$ cat sslissues/contextfactory.py
from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory
class TLSFlexibleContextFactory(ScrapyClientContextFactory):
    """A more protocol flexible TLS/SSL context factory.

    A TLS/SSL connection established with these methods may understand
    the SSLv3, TLSv1, TLSv1.1 and TLSv1.2 protocols.

    See https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_new.html
    """

    def __init__(self):
        self.method = SSL.SSLv23_METHOD
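For context: Scrapy 1.0's default client context factory pins the handshake to SSL.TLSv1_METHOD, while SSL.SSLv23_METHOD lets OpenSSL negotiate whichever protocol version both peers support. A quick standalone probe (my own sketch, not project code; it assumes pyOpenSSL is installed and reuses the host from this issue) makes the difference visible:

# tlsprobe.py - standalone sketch, not part of the project.
# Compares a version-pinned handshake (Scrapy 1.0's default TLSv1_METHOD)
# with the flexible SSLv23_METHOD used by TLSFlexibleContextFactory.
import socket

from OpenSSL import SSL


def try_handshake(method, host='www.notjustalabel.com', port=443):
    ctx = SSL.Context(method)
    conn = SSL.Connection(ctx, socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    conn.set_tlsext_host_name(host.encode('ascii'))  # send SNI, as curl does
    conn.connect((host, port))  # also puts the connection in client mode
    try:
        conn.do_handshake()
        return 'handshake OK'
    except SSL.Error as e:
        return 'handshake failure: %r' % (e,)
    finally:
        conn.close()


print('SSLv23_METHOD: ' + try_handshake(SSL.SSLv23_METHOD))  # expected to succeed
print('TLSv1_METHOD:  ' + try_handshake(SSL.TLSv1_METHOD))   # may fail as in the logs above

If I remember correctly, Scrapy 1.1+ also exposes this choice through the DOWNLOADER_CLIENT_TLS_METHOD setting, so a custom factory like the one above should only be needed on 1.0.x.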
Spider code:
$ cat sslissues/spiders/notjustlabel.py
# -*- coding: utf-8 -*-
import scrapy
class NotjustlabelSpider(scrapy.Spider):
    name = "notjustlabel"
    allowed_domains = ["notjustalabel.com"]
    start_urls = (
        'https://www.notjustalabel.com/products/womens',
    )

    def parse(self, response):
        pass
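To try this without touching settings.py, the same setting can also be passed on the command line (-s works with any Scrapy command, as long as it runs inside the project so the module is importable):

$ scrapy shell -s DOWNLOADER_CLIENTCONTEXTFACTORY=sslissues.contextfactory.TLSFlexibleContextFactory https://www.notjustalabel.com/products/womens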