Scrapy - Why Item Inside For Loop Has The Same Val

Posted 2019-09-14 06:35

I want to scrape the link found for each product inside a for loop. Each iteration fills in an item, which I pass to the callback function through meta. But why does the item in the callback function always have the same values? This is my code.

import scrapy
import re
from scraper.product_items import Product

class ProductSpider(scrapy.Spider):
    name = "productspider"

    start_urls = [
        'http://www.website.com/category-page/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield scrapy.Request(url = link, callback=self.parse_product_page, meta={'item': item})

    def parse_product_page(self, response):
        item = response.meta['item']
        item['image'] = response.css("div.productImage::attr(data-big)").extract_first()
        return item

The result is this.

[
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image1.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image2.jpg"},
{"sku": "DI684OTAA55INNANID", "price": "725", "image": "http://website.com/image3.jpg"},
]

As you can see, sku and price have the same value in every result, and only image differs. I want sku and price to be different for each product. If I instead yield the item directly from parse, changing the code like this:

import scrapy
import re
from scraper.product_items import Product

class LazadaSpider(scrapy.Spider):
    name = "lazada"

    start_urls = [
        'http://www.lazada.co.id/beli-jam-tangan-kasual-pria/',
    ]

    def parse(self, response):
        item = Product()
        for products in response.css("div.product-card"):
            link = products.css("a::attr(href)").extract_first()
            item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
            item['price'] = products.css("div.product-card__old-price::text").extract_first()
            yield item

Then the values of sku and price are correct for each iteration.

[
{"sku": "CA199FA31FKAANID", "price": "299"},
{"sku": "SW437OTAA31QO3ANID", "price": "200"},
{"sku": "SW437OTAM1RAANID", "price": "235"},
]

Tags: scrapy
1 answer
爷、活的狠高调
#2 · 2019-09-14 07:07

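You should create the item inside the for loop. Otherwise a single item object is shared between all iterations, and each iteration only overwrites its values. Since every request carries a reference to that same object, each callback sees whatever values were written last. So the correct code is: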

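def parse(self, response):
    for products in response.css("div.product-card"):
        # Create a new item for each product so requests do not share one object.
        item = Product()
        link = products.css("a::attr(href)").extract_first()
        item['sku'] = products.css("div.product-card::attr(data-sku)").extract_first()
        item['price'] = products.css("div.product-card__old-price::text").extract_first()
        # Pass this product's own item on to the product page callback, as in the question.
        yield scrapy.Request(url=link, callback=self.parse_product_page, meta={'item': item})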
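
The sharing effect can be reproduced without Scrapy. The following is only a minimal sketch in plain Python, under these assumptions: the function names (schedule_shared, schedule_fresh, run_callbacks) are hypothetical, plain dicts stand in for Product items, a list of meta dicts stands in for the queued requests, and the image names are made up. It does not model how Scrapy schedules requests; it only shows that storing the same object several times is different from storing a new object each time.

```python
def schedule_shared(rows):
    """Buggy pattern: one dict is created before the loop and reused."""
    item = {}
    pending = []
    for row in rows:
        item["sku"] = row["sku"]
        item["price"] = row["price"]
        pending.append({"item": item})  # stores a reference, not a copy
    return pending


def schedule_fresh(rows):
    """Fixed pattern: a new dict is created in every iteration."""
    pending = []
    for row in rows:
        item = {"sku": row["sku"], "price": row["price"]}
        pending.append({"item": item})
    return pending


def run_callbacks(pending):
    """Stand-in for callbacks that run later, after the loop has finished."""
    results = []
    for n, meta in enumerate(pending, start=1):
        item = meta["item"]
        item["image"] = f"image{n}.jpg"  # hypothetical image name
        results.append(dict(item))  # snapshot of the item at this moment
    return results


# Sample rows reusing the sku/price values shown in the question.
ROWS = [
    {"sku": "CA199FA31FKAANID", "price": "299"},
    {"sku": "SW437OTAA31QO3ANID", "price": "200"},
    {"sku": "SW437OTAM1RAANID", "price": "235"},
]

shared = run_callbacks(schedule_shared(ROWS))
fresh = run_callbacks(schedule_fresh(ROWS))

for row in shared:
    print("shared:", row)
for row in fresh:
    print("fresh: ", row)
```

With the shared dict, every result carries the sku and price of the last row and only image differs, which matches the pattern in the question. With a fresh dict per iteration, each result keeps its own values.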