I am new to Scrapy and my task is simple:
For a given e-commerce website:

- crawl all website pages
- look for product pages
- if a URL points to a product page, create an Item
- process the item to store it in a database
I created the spider, but products are just printed to a simple file.

My question is about the project structure: how do I use items in the spider, and how do I send items to pipelines?
I can't find a simple example of a project using items and pipelines.
Well, the main purpose of items is to store the data you crawled.
Scrapy Items are basically dictionaries. To declare your items, you create a class and add scrapy.Field attributes to it:
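For example, a minimal items.py sketch. Product is the name used later in this answer; the three fields are assumptions for illustration, so adapt them to your data:

```python
# items.py
import scrapy


class Product(scrapy.Item):
    # Each attribute declared as a Field becomes a key of the item.
    name = scrapy.Field()
    price = scrapy.Field()
    url = scrapy.Field()
```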
You can now use it in your spider by importing your Product.
For more advanced information, check the documentation here: https://docs.scrapy.org/en/latest/topics/items.html
First, you need to tell your spider to use your custom pipeline. In the settings.py file:
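The snippet for this step is missing from the answer; here is a minimal sketch, assuming the project is named myproject and the pipeline class (written below) is called StoreProductPipeline:

```python
# settings.py
# Register the custom pipeline. The integer (0-1000) sets the order
# in which pipelines run when several are chained.
ITEM_PIPELINES = {
    "myproject.pipelines.StoreProductPipeline": 300,
}
```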
You can now write your pipeline and play with your item. In the pipelines.py file:
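A minimal sketch of a pipeline that stores each Product in SQLite; the database file, table, and column names are assumptions, so swap in whatever database you actually use:

```python
# pipelines.py
import sqlite3


class StoreProductPipeline:
    def open_spider(self, spider):
        # Called once when the spider opens: set up the database.
        self.conn = sqlite3.connect("products.db")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS products (name TEXT, price TEXT, url TEXT)"
        )

    def close_spider(self, spider):
        # Called once when the spider closes: commit and clean up.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        self.conn.execute(
            "INSERT INTO products VALUES (?, ?, ?)",
            (item.get("name"), item.get("price"), item.get("url")),
        )
        return item  # return the item so later pipelines can process it too
```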
Finally, in your spider, you need to yield your item once it is filled. A spider.py example:
Hope this helps. Here is the doc for pipelines: Item Pipeline (https://docs.scrapy.org/en/latest/topics/item-pipeline.html)