I'm trying to override the default image path full/hash.jpg
to <dynamic>/hash.jpg
. I've tried the approach from "How to download scrapy images in a dynamic folder" using the following code:
def item_completed(self, results, item, info):
    for result in [x for ok, x in results if ok]:
        path = result['path']
        # here we create the session-path where the files should be in the end
        # you'll have to change this path creation depending on your needs
        slug = slugify(item['category'])
        target_path = os.path.join(slug, os.path.basename(path))
        # try to move the file and raise exception if not possible
        if not os.rename(path, target_path):
            raise DropItem("Could not move image to target folder")
    if self.IMAGES_RESULT_FIELD in item.fields:
        item[self.IMAGES_RESULT_FIELD] = [x for ok, x in results if ok]
    return item
but I get:
Traceback (most recent call last):
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 839, in _cbDeferred
self.callback(self.resultList)
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 382, in callback
self._startRunCallbacks(result)
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/user/Projects/sepid/scraper/scraper/pipelines.py", line 44, in item_completed
if not os.rename(path, target_path):
exceptions.OSError: [Errno 2] No such file or directory
I don't know what's wrong. Also, is there any other way to change the path? Thanks.
To dynamically set the path for images downloaded by a Scrapy spider prior to downloading, rather than moving them afterward, I created a custom pipeline overriding the get_media_requests and file_path methods.

This approach assumes you define a scrapy.Item in your spider; replace, e.g., "field1" with your particular field name. Setting Request.meta in get_media_requests allows item field values to be used when setting the download directory for each item, as shown in the return statement of file_path. Scrapy will create the directories automatically if they don't exist.

Custom pipeline class definitions are saved in my project's pipelines.py. The methods here are adapted directly from the default Scrapy pipeline images.py, which on my Mac is stored in ~/anaconda3/pkgs/scrapy-1.5.0-py36_0/lib/python3.6/site-packages/scrapy/pipelines/. Imports and additional methods can be copied from that file as needed.

I have created a pipeline inheriting from ImagesPipeline, overridden the file_path method, and used it instead of the standard ImagesPipeline.
The problem arises because the destination folder doesn't exist, and a quick solution is:
The solution that @neelix gave is the best one, but I tried to use it and found some strange results: some documents were moved, but not all of them. So I replaced the rename call, imported the shutil library, and then my code is:
I hope that it will work for you guys too :)