I have seen all the related questions here, but I don't understand them yet.
Actually, with the code below I do what I need, except renaming the image, so I tried to change the name in the items.py
file; please check the comments inside.
settings.py
SPIDER_MODULES = ['xxx.spiders']
NEWSPIDER_MODULE = 'xxx.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '/home/magicnt/xxx/images'
items.py
import scrapy

class XxxItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    image_urls = scrapy.Field()
    # images = scrapy.Field()  <--- with this line it works, using the default image names
    images = title  # <--- I tried to rename here, but it does not work
spider.py
from xxx.items import XxxItem
import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

class CoverSpider(scrapy.Spider):
    name = "pyimagesearch-cover-spider"
    start_urls = ['https://xxx.com.br/product']

    def parse(self, response):
        for bimb in response.css('#mod_imoveis_result'):
            imageURL = bimb.xpath('./div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src').extract_first()
            title = bimb.css('#titulo_imovel::text').extract_first()
            yield {
                'image_urls': [response.urljoin(imageURL)],
                'title': title
            }

        # Follow the pagination link only when one exists (extract_first() returns None on the last page)
        next_page = response.xpath('//a[contains(@class, "num_pages") and contains(@class, "pg_number_next")]/@href').extract_first()
        if next_page:
            yield response.follow(next_page, self.parse)
My goal is to rename the downloaded images with the title from the item. Any tips for achieving this are welcome.
I'm totally new to Python and OO. I usually scrape with procedural PHP, but I realize how good Scrapy can be, so I ask for a little patience and help.
My code is based on "Scrapy Image Pipeline: How to rename images?". I tested it a week ago and it works on my own spiders.
Here is how the ImagesPipeline works:
The pipeline will execute image_downloaded -> get_images -> file_path in order. ("->" means invokes)
image_downloaded: saves the images that get_images returns by invoking persist_file
get_images: converts the images to JPEG
file_path: returns the relative path of the image
I scanned through the source code of ImagesPipeline and found no special field for renaming an image. Scrapy names the file this way: full/<SHA-1 hash of the image URL>.jpg.
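To make the default naming concrete, here is a rough stdlib-only sketch of what it amounts to. It is not Scrapy's actual source; the function name default_image_path and the URL are made up for illustration.

```python
import hashlib


def default_image_path(url):
    """Mimic the default scheme: full/<SHA-1 hex digest of the URL>.jpg.

    Hypothetical helper for illustration, not part of Scrapy.
    """
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


# Illustrative URL only; any image URL gives a 40-character hex name.
print(default_image_path("https://xxx.com.br/product/cover.jpg"))
```

Since the name depends only on the URL, nothing from the item (such as the title) can influence it unless file_path is overridden.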
Therefore we should override the file_path method. According to the source code of FilesPipeline, which ImagesPipeline inherits from, we only need to return a relative path, and persist_file will get things done.