For my scrapy project I'm currently using the ImagesPipeline. The downloaded images are stored with a SHA1 hash of their URLs as the file names.
How can I store the files using my own custom file names instead?
What if my custom file name needs to contain another scraped field from the same item — e.g. using item['desc'] as the filename for the image downloaded from item['image_url']? If I understand correctly, that would involve somehow accessing the other item fields from the images pipeline.
Any help will be appreciated.
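For context, the default ImagesPipeline derives the stored path from a SHA1 hash of the image URL — roughly like this sketch (the real implementation also normalizes the URL bytes and the file extension):

```python
import hashlib

def default_image_path(url):
    # Roughly what Scrapy's ImagesPipeline does by default: the stored
    # filename is the SHA1 hex digest of the request URL, under full/.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid

print(default_image_path("http://example.com/some/image.jpg"))
```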
This is just an update of the answer for Scrapy 0.24 (EDITED), where image_key() is deprecated in favor of file_path().

This was the way I solved the problem in Scrapy 0.10: check the persist_image method of FSImagesStoreChangeableDirectory. The filename of the downloaded image is key.
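In Scrapy 0.24+ the hook to override is file_path() on an ImagesPipeline subclass. A hedged sketch — the helper below is standalone so it runs without Scrapy installed, and the "keep the URL basename" policy is just an illustration, not the only option:

```python
import os
from urllib.parse import urlparse

def custom_file_path(url):
    # Illustrative policy: keep the original basename from the URL
    # instead of the default SHA1 hash.
    return "full/%s" % os.path.basename(urlparse(url).path)

# In a real project this logic would live in a pipeline subclass
# (method name and signature are the Scrapy 0.24+ API):
#
# class MyImagesPipeline(ImagesPipeline):
#     def file_path(self, request, response=None, info=None):
#         return custom_file_path(request.url)

print(custom_file_path("http://example.com/pics/cat.jpg"))  # full/cat.jpg
```

Note that URL basenames are not guaranteed unique, so in practice you may still want to mix in a hash or an item field.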
In Scrapy 0.12 I solved it with something like this:
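The exact code isn't shown, but in those old versions the hook was image_key(); a hedged sketch of the kind of override the answer refers to (the folder prefix is my own illustration):

```python
import hashlib

# In Scrapy ~0.12 the per-image filename came from image_key(url), and
# overriding it in an ImagesPipeline subclass changed the stored name.
# This standalone function mirrors that idea without requiring Scrapy.
def image_key(url):
    # Illustrative: keep the SHA1 name but put it under our own folder.
    return "my_images/%s.jpg" % hashlib.sha1(url.encode("utf-8")).hexdigest()
```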
I found my way in 2017, with Scrapy 1.1.3. Like the code above, you can add the name you want to the Request meta in get_media_requests(), and get it back in file_path() via request.meta.get('yourname', '').
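A sketch of that round trip. The real pipeline methods are shown in the comments with Scrapy 1.x names; the standalone versions below use a stand-in request object so the logic runs without Scrapy, and safe_name() plus the 'image_name' meta key are my own illustrative choices:

```python
import re
from types import SimpleNamespace

def safe_name(value):
    # Illustrative: make an item field safe to use as a filename.
    return re.sub(r"[^\w.-]+", "_", value).strip("_")

# In the real pipeline subclass (Scrapy 1.x):
#
# class CustomImagesPipeline(ImagesPipeline):
#     def get_media_requests(self, item, info):
#         for url in item["image_urls"]:
#             yield scrapy.Request(url, meta={"image_name": safe_name(item["desc"])})
#
#     def file_path(self, request, response=None, info=None):
#         return "full/%s.jpg" % request.meta.get("image_name", "unnamed")

def file_path(request, response=None, info=None):
    # Standalone twin of the override sketched above.
    return "full/%s.jpg" % request.meta.get("image_name", "unnamed")

request = SimpleNamespace(meta={"image_name": safe_name("A nice photo!")})
print(file_path(request))  # full/A_nice_photo.jpg
```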
I did a nasty quick hack for that. In my case, I stored the title of the image in my feeds, and I had only one image_urls entry per item, so I wrote the following script. It basically renames the image files in the /images/full/ directory with the corresponding title from the item feed that I had stored as JSON. It's nasty and not recommended, but it is a naive alternative approach.
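The script itself wasn't shown; it could look something like this sketch. The feed layout (an "images" list with a "path", plus a "title" field) follows Scrapy's default images output, but the field names and extension are assumptions:

```python
import json
import os

def rename_images_from_feed(feed_path, images_dir):
    # Post-processing step, run after the crawl: rename each downloaded
    # image (stored under its SHA1 name) to the item's title.
    with open(feed_path) as f:
        items = json.load(f)
    for item in items:
        # assumes exactly one image per item, as in the answer above
        sha1_name = os.path.basename(item["images"][0]["path"])
        src = os.path.join(images_dir, sha1_name)
        dst = os.path.join(images_dir, item["title"] + ".jpg")
        if os.path.exists(src):
            os.rename(src, dst)
```

As the answer says, this breaks as soon as two items share a title or an item has several images, which is part of why it's not recommended.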
I rewrote the code, changing "response." to "request." in the thumb_path def. Otherwise it won't work, because "response" is set to None.
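In other words, thumb_path() receives the request (while response may arrive as None), so any custom name must be derived from the request. A standalone sketch, with the 'image_name' meta key and the class name assumed:

```python
from types import SimpleNamespace

class ThumbPathMixin:
    # Mirrors Scrapy's thumb_path signature; note the name is built from
    # `request`, since `response` can be None when this hook is called.
    def thumb_path(self, request, thumb_id, response=None, info=None):
        name = request.meta.get("image_name", "unnamed")
        return "thumbs/%s/%s.jpg" % (thumb_id, name)

req = SimpleNamespace(meta={"image_name": "cat"})
print(ThumbPathMixin().thumb_path(req, "small"))  # thumbs/small/cat.jpg
```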