This is my code:

    def parse(self, response):
        soup = BeautifulSoup(response.body)
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="row"]')
        items = []
        for site in sites[:5]:
            item = TestItem()
            item['username'] = "test5"
            request = Request("http://www.example.org/profile.php", callback=self.parseUserProfile)
            request.meta['item'] = item
            yield item  # <-- the line in question
        mylinks = soup.find_all("a", text="Next")
        if mylinks:
            nextlink = mylinks[0].get('href')
            yield Request(urljoin(response.url, nextlink), callback=self.parse)

    def parseUserProfile(self, response):
        item = response.meta['item']
        item['image_urls'] = "test3"
        return item
The code above runs, but the assignment item['image_urls'] = "test3" never takes effect: the field comes out as null.
If I use

    return request

instead of

    yield item

I get an error saying that return cannot be used with a generator.
If I remove this line:

    yield Request(urljoin(response.url, nextlink), callback=self.parse)

then my code works fine and I get image_urls, but then I cannot follow the links.
Is there any way to use return request and yield together, so that I get image_urls?
I don't really understand your issue, but I see one problem in your code:
A parse callback's return value should be a sequence, so you should do

    return [item]

or convert your callback into a generator:

Looks like you have a mechanical error. Instead of:
You need:
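
In the question's parse, the request is built but never yielded, so parseUserProfile is never scheduled and image_urls stays empty. Yielding the request in place of the item, and emitting the item from parseUserProfile, keeps parse a pure generator, so it can still yield the "Next" request as well. Below is a minimal sketch of that flow in plain Python. StubRequest, StubResponse, run, the usernames, and the list URLs are hypothetical stand-ins introduced for illustration; they are not Scrapy's classes or API, and the real scheduler does considerably more.

```python
from collections import deque


class StubRequest:
    """Hypothetical stand-in for a crawler request (not Scrapy's Request)."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback
        self.meta = {}


class StubResponse:
    """Hypothetical stand-in for a response; carries the request's meta."""

    def __init__(self, url, meta):
        self.url = url
        self.meta = meta


def parse(response):
    for username in ("alice", "bob"):  # assumed sample data
        item = {"username": username}
        request = StubRequest("http://www.example.org/profile.php",
                              callback=parse_user_profile)
        request.meta["item"] = item
        # Yield the request, not the item: the item travels in meta.
        yield request
    # Pagination is yielded from the same generator (one extra page here).
    if response.url.endswith("page=1"):
        yield StubRequest("http://www.example.org/list?page=2", callback=parse)


def parse_user_profile(response):
    item = response.meta["item"]
    item["image_urls"] = "test3"
    # The finished item is emitted only here, after the second callback ran.
    yield item


def run(start_url):
    """Toy driver: follow yielded requests, collect everything else as items."""
    queue = deque([StubRequest(start_url, callback=parse)])
    items = []
    while queue:
        request = queue.popleft()
        response = StubResponse(request.url, request.meta)
        for result in request.callback(response):
            if isinstance(result, StubRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = run("http://www.example.org/list?page=1")
```

Every callback here uses yield only, so the "return with generator" error from the question does not come up, and each collected item carries both username and image_urls.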