I'm very new to this. I am working with Scrapy on a site that uses cookies, and this is a problem for me: I can scrape data from a site without cookies, but scraping data from a site that requires cookies is difficult for me. I have this code structure:
class mySpider(BaseSpider):
    name = 'data'
    allowed_domains = []
    start_urls = ["http://...."]

    def parse(self, response):
        sel = HtmlXPathSelector(response)
        items = sel.xpath('//*[@id=..............')
        vlrs = []
        for item in items:
            myItem['img'] = item.xpath('....').extract()
            yield myItem
This works fine: I can get the data without cookies using this code structure. I found how to work with cookies at this URL, but I do not understand where I should put that code so that I can then get the data using XPath.
I'm testing this code:
request_with_cookies = Request(url="http://...",cookies={'country': 'UY'})
but I don't know how to use it or where to put it. I put this code into the parse function to obtain the data:
def parse(self, response):
    request_with_cookies = Request(url="http://.....", cookies={'country': 'UY'})
    sel = HtmlXPathSelector(request_with_cookies)
    print request_with_cookies
I am trying to use XPath on this new URL with cookies, so that I can later print the newly scraped data. I thought it would be like working with a URL without cookies, but when I run this I get an error: 'Request' object has no attribute 'body_as_unicode'. What would be the proper way to work with these cookies? I'm a little lost. Thank you very much.
You are very close! The contract for the parse() method is that it yields (or returns an iterable of) Items, Requests, or a mix of both. In your case, all you should have to do is yield request_with_cookies, and your parse() method will be run again with a Response object produced from requesting that URL with those cookies.
http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=parse#scrapy.spider.Spider.parse
http://doc.scrapy.org/en/latest/topics/request-response.html