I want to scrape the link for each product inside a `for` loop. Inside the loop I fill in an item and pass it to the callback function through `meta`. But why does the item in the callback function have the same values every time? This is my code:
```python
import scrapy
import re
from scraper.product_items import Product

class ProductSpider(scrapy.Spider):
    name = "productspider"
    start_urls = [
        'http://www.website.com/category-page/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield scrapy.Request(url=link, callback=self.parse_product_page, meta={'item': item})

    def parse_product_page(self, response):
        item = response.meta['item']
        item['image'] = response.css("div.productImage::attr(data-big)").extract_first()
        return item
```
The result is:

```json
[
    {"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image1.jpg"},
    {"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image2.jpg"},
    {"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image3.jpg"}
]
```
As you can see, `sku` and `price` have the same value in every item, but I want each item to have its own `sku` and `price`. If I yield the item directly from `parse` instead, changing the code like this:
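The symptom can be reproduced without Scrapy. In the sketch below, a plain dict stands in for `Product`, a list of queued `(image, meta)` pairs stands in for the scheduled requests, and the image URLs are hypothetical. It is a simplification under those assumptions: it runs every "callback" after the loop has finished, whereas Scrapy runs each callback whenever its download completes, which is usually after the loop has already moved on. The point it illustrates is that `meta={'item': item}` stores a reference to one shared object, not a copy.

```python
# Hypothetical product cards: (sku, price, image URL found on the product page).
CARDS = [
    ("CA199FA31FKAANID", "299", "http://website.com/image1.jpg"),
    ("SW437OTAA31QO3ANID", "200", "http://website.com/image2.jpg"),
    ("SW437OTAM1RAANID", "235", "http://website.com/image3.jpg"),
]


def parse(cards, fresh_item_per_card):
    """Stand-in for ProductSpider.parse: queue one fake request per card."""
    queue = []
    item = {}  # mirrors `item = Product()` placed before the loop
    for sku, price, image in cards:
        if fresh_item_per_card:
            item = {}  # mirrors moving `item = Product()` inside the loop
        item["sku"] = sku
        item["price"] = price
        # Like meta={'item': item}: the queue holds a reference, not a copy.
        queue.append((image, {"item": item}))
    return queue


def parse_product_page(image, meta):
    """Stand-in for the callback; here it runs only after parse() has finished."""
    item = meta["item"]
    item["image"] = image
    return dict(item)  # snapshot, roughly like an exporter writing the item out


def crawl(fresh_item_per_card):
    return [parse_product_page(image, meta)
            for image, meta in parse(CARDS, fresh_item_per_card)]


for row in crawl(fresh_item_per_card=False):
    print(row)
# Every row has sku "SW437OTAM1RAANID" and price "235"; only "image" differs.
```

Calling `crawl(fresh_item_per_card=True)` instead gives each row its own `sku` and `price`, because every queued entry then refers to a different dict.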
```python
import scrapy
import re
from scraper.product_items import Product

class LazadaSpider(scrapy.Spider):
    name = "lazada"
    start_urls = [
        'http://www.lazada.co.id/beli-jam-tangan-kasual-pria/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield item
```
then `sku` and `price` are correct for every item:

```json
[
    {"sku": "CA199FA31FKAANID", "price": "299"},
    {"sku": "SW437OTAA31QO3ANID", "price": "200"},
    {"sku": "SW437OTAM1RAANID", "price": "235"}
]
```
You should create `item` inside the `for` loop. Otherwise every request's `meta` refers to the same item object, and each iteration just overwrites its values, so the callbacks see whatever was written last. So the correct code is: