Twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion:
2015-12-10 17:47:57 [scrapy] INFO: Spider opened
2015-12-10 17:47:58 [scrapy] DEBUG: Retrying <GET https://www.modelmayhem.com/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2015-12-10 17:47:58 [scrapy] DEBUG: Retrying <GET https://www.modelmayhem.com/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2015-12-10 17:47:59 [scrapy] DEBUG: Gave up retrying <GET https://www.modelmayhem.com/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Traceback (most recent call last):
  File "/Users/billyfung/anaconda/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Users/billyfung/anaconda/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Users/billyfung/anaconda/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Users/billyfung/anaconda/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Users/billyfung/anaconda/lib/python2.7/site-packages/scrapy/commands/shell.py", line 63, in run
    shell.start(url=url)
  File "/Users/billyfung/anaconda/lib/python2.7/site-packages/scrapy/shell.py", line 44, in start
    self.fetch(url, spider)
  File "/Users/billyfung/anaconda/lib/python2.7/site-packages/scrapy/shell.py", line 87, in fetch
    reactor, self._schedule, request, spider)
  File "/Users/billyfung/anaconda/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
I'm not sure what is causing this, except that the site is also pretty slow to load in my browser.
The fix for the problem you are facing is to pass a specific User-Agent as a setting when you run scrapy shell against the website.
So you have to do something like scrapy shell -s USER_AGENT="custom-user-agent" website
For example:
scrapy shell -s USER_AGENT="Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36" 'http://www.modelmayhem.com'
This works well. You can try other user agents as well.
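If you want to check whether the User-Agent really is what makes the server drop the connection, you can compare the two cases outside Scrapy. Below is a minimal sketch using only the Python 3 standard library (the traceback above is from Python 2.7, so this assumes a newer interpreter). The names build_request, probe and BROWSER_UA are made up for illustration, and the result depends on how the site behaves today.

```python
import urllib.error
import urllib.request

# Browser-like User-Agent string, the same one used in the command above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Build a GET request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def probe(url, user_agent=None, timeout=10):
    """Fetch url once and describe the outcome as a short string."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "HTTP %d" % response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return "HTTP %d" % exc.code
    except OSError as exc:
        # Covers URLError, timeouts and dropped or reset connections.
        return "failed: %r" % (exc,)
```

Calling probe("https://www.modelmayhem.com/") and then probe("https://www.modelmayhem.com/", BROWSER_UA) lets you compare the two outcomes. If only the first call fails, the server is most likely filtering on the User-Agent, and the -s USER_AGENT option above is the right fix for scrapy shell.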
I get the same error, even with the user agent set in settings.py:
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1'