I need to continuously get the data through the next button (<1 2 3 ... 5>), but there's no href link in the page source, and the pagination also contains an ellipsis. Any ideas please? Here's my code:
def start_requests(self):
    urls = (
        (self.parse_2, 'https://www.forever21.com/us/shop/catalog/category/f21/sale'),
    )
    for cb, url in urls:
        yield scrapy.Request(url, callback=cb)

def parse_2(self, response):
    for product_item_forever in response.css('div.pi_container'):
        forever_item = {
            'forever-title': product_item_forever.css('p.p_name::text').extract_first(),
            'forever-regular-price': product_item_forever.css('span.p_old_price::text').extract_first(),
            'forever-sale-price': product_item_forever.css('span.p_sale.t_pink::text').extract_first(),
            'forever-photo-url': product_item_forever.css('img::attr(data-original)').extract_first(),
            'forever-description-url': product_item_forever.css('a.item_slider.product_link::attr(href)').extract_first(),
        }
        yield forever_item
Please help me, thank you.
It seems this pagination makes an additional request to an API, so there are two ways to handle it.

First, you can request the API directly:
https://www.forever21.com/us/shop/Catalog/GetProducts
with all the proper params (they are too long, so I will not post the full list here). The URL changes, so you can specify the page number and the number of results per page in the URL.
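As a rough sketch of this first option: the endpoint path above is the real one from the site, but the query parameter names below (`pageno`, `pagesize`) are placeholders I'm assuming; you'd need to copy the real (much longer) parameter list from the XHR request in your browser's network tab.

```python
from urllib.parse import urlencode

# Real endpoint from the answer above; the parameter names are ASSUMED
# placeholders -- replace them with the actual params the site sends.
API_URL = 'https://www.forever21.com/us/shop/Catalog/GetProducts'

def build_page_urls(last_page, page_size=60):
    """Build one API URL per page, for pages 1..last_page."""
    urls = []
    for page in range(1, last_page + 1):
        # urlencode preserves insertion order on Python 3.7+
        params = urlencode({'pageno': page, 'pagesize': page_size})
        urls.append(f'{API_URL}?{params}')
    return urls
```

In `start_requests` you would then yield a `scrapy.Request(url, callback=...)` for each of these URLs and parse the API's response in the callback.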
Second, as mentioned by @vezunchik and the OP's feedback, this approach requires Selenium/Splash to allow the JavaScript to run on the page. If you go down that route, you could just click the next button (.p_next) until you reach the last page, since it is easy to grab the last page number (.dot + .pageno) from the document. I appreciate that you are trying with scrapy.

Here's a demo of the idea with Selenium, in case it helps.
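A minimal sketch of that click-through loop, using the two selectors mentioned above (`.p_next` and `.dot + .pageno`). It assumes the last-page number and the next button are present once the page loads; real code would add `WebDriverWait` for robustness, and the item extraction inside the loop is left as a placeholder.

```python
# Selectors taken from the answer above.
NEXT_SELECTOR = '.p_next'              # next-page button
LAST_PAGE_SELECTOR = '.dot + .pageno'  # last page number after the ellipsis
START_URL = 'https://www.forever21.com/us/shop/catalog/category/f21/sale'

def scrape_all_pages():
    # Imported inside the function so the module loads without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(START_URL)
        # Read the last page number from the pagination widget.
        last_page = int(
            driver.find_element(By.CSS_SELECTOR, LAST_PAGE_SELECTOR).text
        )
        for page in range(1, last_page + 1):
            # ... collect the div.pi_container items here, as in parse_2 ...
            if page < last_page:
                driver.find_element(By.CSS_SELECTOR, NEXT_SELECTOR).click()
    finally:
        driver.quit()
```

The same selectors would work with Splash or with scrapy driving a headless browser; only the click/extract loop is Selenium-specific.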