Scrapy not working with return and yield together

Posted 2019-04-16 01:07

This is my code

def parse(self, response):
    soup = BeautifulSoup(response.body)
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class="row"]')
    items = []

    for site in sites[:5]:
        item = TestItem()
        item['username'] = "test5"
        request =  Request("http://www.example.org/profile.php",  callback = self.parseUserProfile)
        request.meta['item'] = item
        yield item

    mylinks= soup.find_all("a", text="Next")
    if mylinks:
        nextlink = mylinks[0].get('href')
        yield Request(urljoin(response.url, nextlink), callback=self.parse)

def parseUserProfile(self, response):
    item = response.meta['item']
    item['image_urls'] = "test3"
    return item

The code above runs, but the item never gets the value set by item['image_urls'] = "test3". The field comes back as null.

If I use return request instead of yield item, I get an error saying that return cannot be used in a generator.

If I remove this line:

yield Request(urljoin(response.url, nextlink), callback=self.parse)

then my code works and I get image_urls, but I can no longer follow the links.

Is there any way to use return request and yield together so that I get image_urls?

Tags: python scrapy
2 answers
萌系小妹纸
Reply 2 · 2019-04-16 01:29

I don't really understand your issue, but I see one problem in your code:

def parseUserProfile(self, response):
    item = response.meta['item']
    item['image_urls'] = "test3"
    return item

The return value of a parse callback should be a sequence, so you should either do return [item] or convert your callback into a generator:

def parseUserProfile(self, response):
    item = response.meta['item']
    item['image_urls'] = "test3"
    yield item
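To illustrate why the two forms are interchangeable for whatever iterates over the callback's result, here is a minimal sketch in plain Python. It does not use Scrapy; the function names and the dict used as an item are assumptions made for the example.

```python
def parse_profile_as_list(meta):
    """Hypothetical callback that returns the finished item wrapped in a list."""
    item = dict(meta["item"])
    item["image_urls"] = "test3"
    return [item]


def parse_profile_as_generator(meta):
    """Hypothetical callback that yields the finished item instead."""
    item = dict(meta["item"])
    item["image_urls"] = "test3"
    yield item


# A plain dict stands in for response.meta here.
meta = {"item": {"username": "test5"}}

# A consumer that simply iterates sees the same items either way.
from_list = list(parse_profile_as_list(meta))
from_generator = list(parse_profile_as_generator(meta))
print(from_list == from_generator)
```

The generator form has the practical advantage that more yield statements (further items or requests) can be added later without changing how the callback is consumed.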
贼婆χ
Reply 3 · 2019-04-16 01:29

Looks like you have a mechanical error. Instead of:

for site in sites[:5]:
    item = TestItem()
    item['username'] = "test5"
    request =  Request("http://www.example.org/profile.php",  callback = self.parseUserProfile)
    request.meta['item'] = item
    yield item

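Yielding the item hands it off right away, and the request you built is never scheduled, so parseUserProfile never runs. You need to yield the request instead, and let parseUserProfile return the item it receives through request.meta: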

for site in sites[:5]:
    item = TestItem()
    item['username'] = "test5"
    request =  Request("http://www.example.org/profile.php",  callback = self.parseUserProfile)
    request.meta['item'] = item
    yield request
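The flow can be checked without Scrapy using a small simulation. This is only a sketch under stated assumptions: FakeRequest, PAGES, and crawl are hypothetical stand-ins for Scrapy's Request, the site being crawled, and the engine's scheduling loop, and none of them are part of Scrapy's API. It shows one generator yielding both the profile request (carrying the item in meta) and the next-page request, with no return needed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request: URL, callback, and meta dict."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


# Hypothetical two-page site: each listing page maps to its "Next" link.
PAGES = {"page1": "page2", "page2": None}


def parse(request):
    # Build a partial item and pass it along inside the profile request.
    item = {"username": "test5", "page": request.url}
    yield FakeRequest("profile.php", parse_user_profile, meta={"item": item})
    # The same generator also yields the pagination request.
    next_url = PAGES[request.url]
    if next_url is not None:
        yield FakeRequest(next_url, parse)


def parse_user_profile(request):
    # Finish the item that travelled in meta, then emit it.
    item = request.meta["item"]
    item["image_urls"] = "test3"
    yield item


def crawl(start):
    """Toy scheduler: queue yielded requests, collect everything else as items."""
    queue = deque([start])
    items = []
    while queue:
        request = queue.popleft()
        for result in request.callback(request):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl(FakeRequest("page1", parse))
for item in items:
    print(item)
```

Every collected item carries image_urls, and both listing pages are visited, because the item is only emitted by the second callback after the request yielded from parse has been processed.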