How to scrape data using a next button with an ellipsis

Posted 2019-08-27 16:54

I need to keep fetching data through the next button <1 2 3 ... 5>, but there is no href link provided in the page source, and the pagination shows an ellipsis. Any ideas? Here's my code:

def start_requests(self):
    urls = (
        (self.parse_2, 'https://www.forever21.com/us/shop/catalog/category/f21/sale'),
    )
    for cb, url in urls:
        yield scrapy.Request(url, callback=cb)


def parse_2(self, response):
    for product_item_forever in response.css('div.pi_container'):
        forever_item = {
            'forever-title': product_item_forever.css('p.p_name::text').extract_first(),
            'forever-regular-price': product_item_forever.css('span.p_old_price::text').extract_first(),
            'forever-sale-price': product_item_forever.css('span.p_sale.t_pink::text').extract_first(),
            'forever-photo-url': product_item_forever.css('img::attr(data-original)').extract_first(),
            'forever-description-url': product_item_forever.css('a.item_slider.product_link::attr(href)').extract_first(),
        }
        yield forever_item

Please help me, thank you.

2 Answers
我命由我不由天
#2 · 2019-08-27 17:27

It seems this pagination makes an additional request to an API, so there are two ways:

  1. Use Splash/Selenium to render the pages, following the url pattern QHarr describes below;
  2. Make the same calls to the API directly. Check the developer tools and you will find a POST request to https://www.forever21.com/us/shop/Catalog/GetProducts with all the proper params (they are too long, so I will not post the full list here); a rough sketch follows below.
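
For example, here is a minimal sketch of option 2 as a scrapy spider. The form fields (pageno, pagesize) and the CatalogProducts response key are assumptions for illustration only; copy the real params and field names from the developer tools before using it:

import json

import scrapy

class Forever21ApiSpider(scrapy.Spider):
    name = 'forever21_api'
    api_url = 'https://www.forever21.com/us/shop/Catalog/GetProducts'

    def start_requests(self):
        yield self.page_request(1)

    def page_request(self, page):
        # Hypothetical form fields: the real request carries many more
        # params; copy the full list from the browser's developer tools.
        return scrapy.FormRequest(
            self.api_url,
            formdata={'pageno': str(page), 'pagesize': '120'},
            meta={'page': page},
            callback=self.parse_api,
        )

    def parse_api(self, response):
        data = json.loads(response.text)
        # 'CatalogProducts' is a guess at the response field; check the real JSON.
        products = data.get('CatalogProducts') or []
        for product in products:
            yield product
        # Keep paging while the API keeps returning products.
        if products:
            yield self.page_request(response.meta['page'] + 1)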
三岁会撩人
#3 · 2019-08-27 17:44

The url changes, so you can specify the page number and the results per page in the url, e.g.

https://www.forever21.com/uk/shop/catalog/category/f21/sale/#pageno=2&pageSize=120&filter=price:0,250

As mentioned by @vezunchik and confirmed by OP feedback, this approach requires selenium/splash so that js can run on the page. If you were going down that route you could simply click the next button (.p_next) until you reach the end page, as it is easy to grab the last page number (.dot + .pageno) from the document; a short sketch of that variant follows.
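
A minimal sketch of that click-through variant, assuming the selectors above behave as described (the cookie/offer popups handled in the full demo further down are skipped here):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

d = webdriver.Chrome()
d.get('https://www.forever21.com/uk/shop/catalog/category/f21/sale')
last_page = int(d.find_element_by_css_selector('.dot + .pageno').text) # last page number from the pagination bar
for _ in range(last_page - 1):
    # wait for the next button to become clickable, then advance one page
    WebDriverWait(d, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.p_next'))).click()
    # scrape the newly rendered page here
d.quit()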


I appreciate that you are trying to do this with scrapy.

Here is a demo of the url-based idea with selenium, in case it helps.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url_loop = 'https://www.forever21.com/uk/shop/catalog/category/f21/sale/#pageno={}&pageSize=120&filter=price:0,250'
url = 'https://www.forever21.com/uk/shop/catalog/category/f21/sale'
d = webdriver.Chrome()
d.get(url)

d.find_element_by_css_selector('[onclick="fnAcceptCookieUse()"]').click() # get rid of the cookie banner
items = WebDriverWait(d, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#products .p_item"))) # wait for product tiles to load
d.find_element_by_css_selector('.selectedpagesize').click()
d.find_elements_by_css_selector('.pagesize')[-1].click() # set page result count to 120
last_page = int(d.find_element_by_css_selector('.dot + .pageno').text) # get last page

if last_page > 1:
    for page in range(2, last_page + 1):
        url = url_loop.format(page)
        d.get(url)
        try:
            d.find_element_by_css_selector('[type=reset]').click() # reject the promo offer if it appears
        except Exception: # narrow from a bare except; the popup may simply not be there
            pass
        # do something with page
        break #delete later