Scrapy pipeline error: cannot import name

Posted 2019-08-28 03:00

I am new to Python programming and to Scrapy. I have set up my crawler, and it was working until I got to the point where I wanted to figure out how to download images. The error I am getting is "cannot import name NsiscrapePipeline". I don't know what I am doing wrong, and I don't understand some of the documentation because I am new. Please help.

Items File

from scrapy.item import Item, Field

class NsiscrapeItem(Item):
    # define the fields for your item here like:
    # name = Field()
    location = Field()
    stock_number = Field()
    year = Field()
    manufacturer = Field()
    model = Field()
    length = Field()
    price = Field()
    status = Field()
    url = Field()

    pass

Spider

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from NSIscrape.items import NsiscrapeItem
from scrapy.http import Request
from scrapy.contrib.pipeline.images import NsiscrapePipeline
import Image

class NsiscrapeSpider(BaseSpider):
    name = "Nsiscrape"
    # Scrapy reads the plural attribute name "allowed_domains"
    allowed_domains = ["yachtauctions.com"]
    start_urls = [
        "http://www.yachtauctions.com/inventory/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//tr')
        items = []
        for site in sites:
            item = NsiscrapeItem()
            item['location'] = site.select('td[2]/text()').extract()
            item['stock_number'] = site.select('td[3]/a/text()').extract()
            item['year'] = site.select('td[4]/text()').extract()
            item['manufacturer'] = site.select('td[5]/text()').extract()
            item['model'] = site.select('td[6]/text()').extract()
            item['length'] = site.select('td[7]/text()').extract()
            item['price'] = site.select('td[8]/text()').extract()
            item['status'] = site.select('td[10]/img/@src').extract()
            item['url'] = site.select('td[1]/a/@href').extract()
            item['image_urls'] = site.select('td/a[3]/img/@data-original').extract()
            item['images'] = item['image_urls']
            yield Request(item['url'][0], meta={'item': item}, callback=self.product_detail_page)


    def product_detail_page(self, response):
        hxs = HtmlXPathSelector(response)
        item = response.request.meta['item']
        # add all image URLs to item['image_urls']
        yield item

settings

ITEM_PIPELINES = ['scrapy.contrib.pipeline.image.NsiscrapePipeline']
IMAGES_STORE = r'c:\Python27\NSIscrape\IMG'  # raw string so the backslashes are kept literally
IMAGES_EXPIRES = 90

Pipelines (this is where I am unsure whether I am missing something)

from scrapy.item import Item 

class NsiscrapePipeline(Item):
    image_urls = Field()
    images = Field()

    def process_item(self, item, spider):
        return item

error

File "NSIscrape\spiders\NSI_Spider.py", line 9, in <module>
from scrapy.contrib.pipeline.images import NsiscrapePipeline
ImportError: cannot import name NsiscrapePipeline

Tags: python scrapy
3 answers
【Aperson】
#2 · 2019-08-28 03:30

That isn't part of the library :) (at least judging by their current master branch).

I think you're looking for ImagesPipeline.

Their example may help!

P.S. I don't think you give the class a custom name, at least not by how Scrapy is designed; I'm reasonably sure you use their class ;)
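
For reference, here is a minimal sketch of what the settings might look like when the built-in ImagesPipeline is enabled. It assumes a recent Scrapy release, where the class lives at scrapy.pipelines.images.ImagesPipeline and ITEM_PIPELINES is a dict; older releases, like the one in the question, used the path scrapy.contrib.pipeline.images.ImagesPipeline. The priority value 1 is an arbitrary choice, and the store path is simply the one from the question.

```python
# settings.py (sketch, not the asker's final configuration)

# Enable the pipeline that ships with Scrapy instead of importing a
# project-specific name from the library.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,  # priority 1 is an arbitrary pick
}

# Directory where downloaded images are written (path taken from the question).
IMAGES_STORE = r"c:\Python27\NSIscrape\IMG"

# Skip re-downloading images fetched within the last 90 days.
IMAGES_EXPIRES = 90
```

With this in place, the image_urls and images fields belong on the item class rather than on a pipeline class, and Pillow has to be installed for the pipeline to run.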


唯我独甜
#3 · 2019-08-28 03:34

Here's my final code that's working. There were two issues:

1: I was missing the second slash that needed to be in the XPath --> //td[1]/a[3]/img/@data-original

2: I had to check the full URL at which the image is displayed and join the two parts together: the main (allowed) URL and the image URL.

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    images = hxs.select('//tr')
    url = []
    for image in images:
        urls = NsiscrapeItem()
        urls['image_urls'] = ["http://www.yachtauctions.com" +  x for x in image.select('//td[1]/a[3]/img/@data-original').extract()]
        url.append(urls)
    return url
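
The joining described in point 2 can also be sketched with urljoin from the standard library, which avoids hard-coding the host and leaves URLs that are already absolute untouched. This is only an illustration: absolute_image_urls is a hypothetical helper name, the image path is made up, and the import shown is the Python 3 one (on Python 2.7 the same function lives in the urlparse module).

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, relative_urls):
    """Resolve each extracted image path against the page it came from."""
    return [urljoin(page_url, rel) for rel in relative_urls]


# Hypothetical values, for illustration only.
page = "http://www.yachtauctions.com/inventory/"
print(absolute_image_urls(page, ["/images/boat1.jpg"]))
```

Inside a spider, response.url could be passed as page_url so the base always matches the page being parsed.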
唯我独甜
#4 · 2019-08-28 03:38

You tried to pass a list, but this function accepts only a string. Pass a single element from the list (for example, list[0]).
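
A small sketch of that idea: extract() returns a list of strings, so taking one element before passing it on is the usual fix, and guarding against an empty list avoids an IndexError on rows with no link. The helper name first_or_default and the sample URL are hypothetical.

```python
def first_or_default(values, default=None):
    """Return the first element of a list, or a default if the list is empty."""
    return values[0] if values else default


# Hypothetical result of site.select('td[1]/a/@href').extract()
extracted = ["http://www.yachtauctions.com/inventory/boat-1"]

url = first_or_default(extracted)
if url is not None:
    print("would request:", url)
```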
