I've been trying to scrape some data from an ASP.NET website. The start page should be the following one: http://www.e3050.com/Items.aspx?cat=SON
First, I want to display 50 items per page (using the select element). Second, I want to paginate through the pages.
I tried the following code to get 50 items per page, but it didn't work:
start_urls = ["http://www.e3050.com/Items.aspx?cat=SON"]

def parse(self, response):
    requests = []
    hxs = HtmlXPathSelector(response)
    # Check if there's more than 1 page
    if len(hxs.select('//span[@id="ctl00_ctl00_ContentPlaceHolder1_ItemListPlaceHolder_lbl_PageSize"]/text()').extract()) > 0:
        # Get last page number
        last_page = hxs.select('//span[@id="ctl00_ctl00_ContentPlaceHolder1_ItemListPlaceHolder_lbl_PageSize"]/text()').extract()[0]
        i = 1
        # Prepare a request for each page
        while i < (int(last_page) / 5) + 1:
            requests.append(Request("http://www.e3050.com/Items.aspx?cat=SON", callback=self.parse_product))
            i += 1
    # Post the form data (50 items and next page button)
    requests.append(FormRequest.from_response(
        response,
        formdata={'ctl00$ctl00$ContentPlaceHolder1$ItemListPlaceHolder$pagesddl': '50',
                  '__EVENTTARGET': 'ctl00$ctl00$ContentPlaceHolder1$ItemListPlaceHolder$pager1$ctl00$ctl01'},
        callback=self.parse_product,
        dont_click=True
    ))
    for request in requests:
        yield request
Check this out, here is an exact solution.
The parse method selects 50 products per page, and page_rs_50 handles the pagination.
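The two steps described here could be sketched roughly as follows. This is not the answer's original code, only an illustration under assumptions: the helper names page_size_formdata and next_page_formdata are hypothetical, the control names are copied from the question, and the idea that the page-size dropdown posts back when named as __EVENTTARGET has not been verified against the site.

```python
# Control names copied from the question; everything else is illustrative.
PREFIX = "ctl00$ctl00$ContentPlaceHolder1$ItemListPlaceHolder$"
PAGE_SIZE_FIELD = PREFIX + "pagesddl"
NEXT_PAGE_TARGET = PREFIX + "pager1$ctl00$ctl01"


def page_size_formdata(size=50):
    """Form fields for the first postback: change the page-size dropdown.

    Assumption: the dropdown posts back on change, so it is named as the
    event target. Check the real page before relying on this.
    """
    return {
        "__EVENTTARGET": PAGE_SIZE_FIELD,
        "__EVENTARGUMENT": "",
        PAGE_SIZE_FIELD: str(size),
    }


def next_page_formdata(size=50):
    """Form fields for later postbacks: click the pager link, keep the size."""
    return {
        "__EVENTTARGET": NEXT_PAGE_TARGET,
        "__EVENTARGUMENT": "",
        PAGE_SIZE_FIELD: str(size),
    }
```

In a spider, parse would pass page_size_formdata() as the formdata argument of FormRequest.from_response (with dont_click=True and callback=self.page_rs_50). page_rs_50 would then extract the items and issue another FormRequest.from_response with next_page_formdata() against its own response. Building each request from the latest response matters because from_response copies the hidden form fields of that response, such as __VIEWSTATE, into the new request.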
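I did not study your code extensively, but I see something strange.
First, instead of these manipulations with i, you can iterate over a range with a for loop. Then, inside the loop, you append a request for the start URL on every iteration.
Are you creating many requests to the same URL?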
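A minimal sketch of this point, assuming Python 3 (the // operator stands in for the integer division the question's Python 2 code relied on). The function names page_numbers and urls_requested are hypothetical and only serve the illustration. It shows that the counter is just a range, and that the loop body never uses i, so every iteration produces an identical URL.

```python
URL = "http://www.e3050.com/Items.aspx?cat=SON"


def page_numbers(last_page, divisor=5):
    """Return the same values of i that the question's while loop visits."""
    return list(range(1, int(last_page) // divisor + 1))


def urls_requested(last_page):
    """URLs the question's loop would request: one per i, all identical."""
    return [URL for _ in page_numbers(last_page)]
```

With last_page set to "23", page_numbers gives four values, but urls_requested contains only one distinct URL. Scrapy's default duplicate filter usually discards repeated requests for an identical URL unless dont_filter=True is set, so most of these requests would likely never be sent.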