
Hello, when I try to load https://www.notjustalabel.com/products/womens I get this error:

2015-09-09 16:51:16 [scrapy] INFO: Spider opened
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/commands/shell.py", line 63, in run
    shell.start(url=url)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/shell.py", line 44, in start
    self.fetch(url, spider)
  File "/home/victor/.local/lib/python2.7/site-packages/scrapy/shell.py", line 87, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL3_READ_BYTES', 'ssl handshake failure')]>]

I have also checked that TLSv1 can negotiate this cert with curl, and everything is fine:

curl --tlsv1 -k https://www.notjustalabel.com/products/womens
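To cross-check the handshake without curl, a quick probe with Python's stdlib `ssl` module can report which protocol version the server actually negotiates. A minimal sketch (the helper names are mine; certificate verification is disabled to mirror curl's `-k`):

```python
import socket
import ssl

def make_permissive_context():
    # Disable verification, mirroring curl's -k flag; the client will
    # still negotiate the best TLS version both peers support.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx

def probe_tls(host, port=443):
    # Return the negotiated protocol string, e.g. 'TLSv1.2'.
    ctx = make_permissive_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

If this probe succeeds where Scrapy fails, the problem is in the client-side context Scrapy builds, not the server.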

This is my scrapy deployment:

➜  mencanta-spiders git:(master) ✗ scrapy version -v
2015-09-09 16:55:31 [scrapy] INFO: Scrapy 1.0.3 started (bot: mencanta)
2015-09-09 16:55:31 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-09-09 16:55:31 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'mencanta.spiders', 'LOG_LEVEL': 'INFO', 'HTTPCACHE_EXPIRATION_SECS': 604800, 'HTTPCACHE_IGNORE_HTTP_CODES': [301, 302, 307, 403, 404, 401, 400, 402, 407, 500], 'SPIDER_MODULES': ['mencanta.spiders'], 'HTTPCACHE_ENABLED': True, 'BOT_NAME': 'mencanta', 'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'}
Scrapy  : 1.0.3
lxml    : 3.4.4.0
libxml2 : 2.9.2
Twisted : 15.4.0
Python  : 2.7.9 (default, Apr  2 2015, 15:33:21) - [GCC 4.9.2]
Platform: Linux-3.19.0-26-generic-x86_64-with-Ubuntu-15.04-vivid
          

Got it on current master branch:

$ scrapy shell https://www.notjustalabel.com/products/womens
2016-01-26 19:10:04 [scrapy] INFO: Scrapy 1.1.0dev1 started (bot: scrapybot)
2016-01-26 19:10:04 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-01-26 19:10:04 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-01-26 19:10:04 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-01-26 19:10:04 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-01-26 19:10:04 [scrapy] INFO: Enabled item pipelines:
2016-01-26 19:10:04 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-01-26 19:10:04 [scrapy] INFO: Spider opened
2016-01-26 19:10:04 [scrapy] DEBUG: Retrying <GET https://www.notjustalabel.com/products/womens> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
2016-01-26 19:10:04 [scrapy] DEBUG: Retrying <GET https://www.notjustalabel.com/products/womens> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
2016-01-26 19:10:05 [scrapy] DEBUG: Gave up retrying <GET https://www.notjustalabel.com/products/womens> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapydev/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy', 'console_scripts', 'scrapy')()
  File "/home/paul/src/scrapy/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/paul/src/scrapy/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/home/paul/src/scrapy/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/home/paul/src/scrapy/scrapy/commands/shell.py", line 70, in run
    shell.start(url=url)
  File "/home/paul/src/scrapy/scrapy/shell.py", line 47, in start
    self.fetch(url, spider)
  File "/home/paul/src/scrapy/scrapy/shell.py", line 112, in fetch
    reactor, self._schedule, request, spider)
  File "/home/paul/.virtualenvs/scrapydev/local/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'ssl handshake failure')]>]
          

@VMRuiz , I don't know if you still have the issue, but the trick from #1429 (comment)
(which I also tried successfully on #1764 (comment)) works for me here.

Would you mind testing it on your end?

Here's what I did:

Console:

$ scrapy crawl notjustlabel 
2016-02-09 18:03:09 [scrapy] INFO: Scrapy 1.0.5 started (bot: sslissues)
2016-02-09 18:03:09 [scrapy] INFO: Optional features available: ssl, http11
2016-02-09 18:03:09 [scrapy] INFO: Overridden settings:
{'NEWSPIDER_MODULE': 'sslissues.spiders',
 'SPIDER_MODULES': ['sslissues.spiders'],
 'DOWNLOADER_CLIENTCONTEXTFACTORY': 'sslissues.contextfactory.TLSFlexibleContextFactory',
 'BOT_NAME': 'sslissues'}
2016-02-09 18:03:10 [scrapy] DEBUG: Crawled (200) <GET https://www.notjustalabel.com/products/womens> (referer: None)
2016-02-09 18:03:10 [scrapy] INFO: Closing spider (finished)
2016-02-09 18:03:10 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 234,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 13911,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 2, 9, 17, 3, 10, 127490),
 'log_count/DEBUG': 2,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2016, 2, 9, 17, 3, 9, 735698)}
2016-02-09 18:03:10 [scrapy] INFO: Spider closed (finished)

Project settings:

$ cat sslissues/settings.py
# -*- coding: utf-8 -*-
# Scrapy settings for sslissues project
BOT_NAME = 'sslissues'
SPIDER_MODULES = ['sslissues.spiders']
NEWSPIDER_MODULE = 'sslissues.spiders'
DOWNLOADER_CLIENTCONTEXTFACTORY = 'sslissues.contextfactory.TLSFlexibleContextFactory'

Custom context factory:

$ cat sslissues/contextfactory.py
from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory

class TLSFlexibleContextFactory(ScrapyClientContextFactory):
    """A more protocol-flexible TLS/SSL context factory.

    A TLS/SSL connection established with this method may understand
    the SSLv3, TLSv1, TLSv1.1 and TLSv1.2 protocols.
    See https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_new.html
    """
    def __init__(self):
        self.method = SSL.SSLv23_METHOD
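The stdlib `ssl` module illustrates the same distinction the factory relies on: OpenSSL's `SSLv23_METHOD` negotiates the highest protocol version both peers support, whereas a version-pinned configuration accepts exactly one. A hedged stdlib sketch (not part of the project above):

```python
import ssl

# A flexible client context: like SSLv23_METHOD, it negotiates the
# best protocol version both sides support.
flexible = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

# Pinning minimum and maximum to the same version emulates a
# single-protocol method such as OpenSSL's TLSv1_2_METHOD.
pinned = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
pinned.minimum_version = ssl.TLSVersion.TLSv1_2
pinned.maximum_version = ssl.TLSVersion.TLSv1_2
```

A flexible context is the safer default for crawling arbitrary sites, since servers vary widely in which protocol versions they accept.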

Spider code:

$ cat sslissues/spiders/notjustlabel.py
# -*- coding: utf-8 -*-
import scrapy

class NotjustlabelSpider(scrapy.Spider):
    name = "notjustlabel"
    allowed_domains = ["notjustalabel.com"]
    start_urls = (
        'https://www.notjustalabel.com/products/womens',
    )

    def parse(self, response):
        pass