When I try to scrape a certain website (with both a spider and the shell), I get the following error:
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
I found out that this can happen when no user agent is set, but after setting one manually I still get the same error.
You can see the whole output of scrapy shell here: http://pastebin.com/ZFJZ2UXe
Notes:
I am not behind a proxy, and I can access other sites via scrapy shell without problems. I am also able to access the site with Chrome, so it is not a network or connection issue.
Can someone give me a hint on how to solve this problem?
Here is 100% working code.
You need to send request headers along with the request, not just a user agent.
Also set
ROBOTSTXT_OBEY = False
in settings.py
EDIT:
You can see which headers to send by inspecting the request to the URL in your browser's Dev Tools (Network tab).
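If you copy the raw request headers out of Dev Tools as text, a small helper along these lines could turn them into a dict that you can pass as `headers=`. This is only a sketch: `parse_raw_headers` is a name I made up, and it assumes one `Name: value` pair per line, skipping the request line and HTTP/2 pseudo-headers such as `:authority`.

```python
def parse_raw_headers(raw):
    """Hypothetical helper: turn 'Name: value' lines into a headers dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers like ":authority: ...".
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        # Skip anything that is not a header, e.g. "GET / HTTP/1.1".
        if not sep or " " in name:
            continue
        headers[name.strip()] = value.strip()
    return headers


# Made-up sample text in the shape Dev Tools typically produces.
raw_text = """GET / HTTP/1.1
:authority: example.com
User-Agent: Mozilla/5.0
Referer: https://example.com/
"""
headers = parse_raw_headers(raw_text)
```

You will probably want to drop headers such as Cookie or Content-Length before reusing the result, since those are specific to one browser session or request.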