Scraping multiple pages with Python repeats only t

I am trying to scrape this page https://www.anesishome.gr/%CE%B2%CF%81%CE%B5%CF%86%CE%B9%CE%BA%CE%AC-159#!/ I need the name and price of each product for the first 5 pages. The problem is that my code returns the results of the first page 5 times, as if I don't change the URL for the next pages. What am I doing wrong? Thank you!

from urllib.request import urlopen
from bs4 import BeautifulSoup
for i in range(5):
    page="https://www.anesishome.gr/%CE%B2%CF%81%CE%B5%CF%86%CE%B9%CE%BA%CE%AC-159#!/page-{}".format(i)
    html = urlopen(page)
    soup=BeautifulSoup(html, "html.parser")
    pin=[None]*240
    puk=[None]*240
    k=soup.find("ul", class_="product-grid row")
    titles=k.find_all("a", class_="product_image")
    i=0
    for title in titles:
        pin[i]=title.get("title")
        i=i+1  
    t=soup.find("ul", class_="product-grid row")
    prices=t.find_all("span", class_="price")
    i=0
    for price in prices:
        puk[i]=price.get_text()
        i=i+1
    x=0
    with open('vrefika.txt', 'w') as f:
        for x in range(0,i):
            print(pin[x])
            print("price=",puk[x])
            string=pin[x]
            f.write(string+"\n")
            string=puk[x]
            f.write(string+"\n")

Tags: python beautifulsoup urllib scrape

2 Answers

戒情不戒烟

Reply #2 · 2019-07-14 06:18

This page uses JavaScript/AJAX to load the next pages, and the AJAX request uses this URL:

https://www.anesishome.gr/modules/blocklayered/blocklayered-ajax.php?id_category_layered=159&layered_price_slider=2_350&orderby=price&orderway=asc&n=48&p=2&_=1486296430052

p=2 is the page number.
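
For context on why the question's loop keeps returning page 1: everything after # in a URL is a fragment, which the client keeps to itself and does not send to the server, so every #!/page-N URL requests the same document. Below is a minimal sketch, standard library only, that checks this and builds the AJAX URL for pages 1 to 5. The helper name ajax_url is hypothetical, the parameter values are copied from the URL above, and the trailing _ parameter is left out on the assumption that it is only a cache-busting timestamp.

```python
from urllib.parse import urldefrag, urlencode

BASE = "https://www.anesishome.gr/%CE%B2%CF%81%CE%B5%CF%86%CE%B9%CE%BA%CE%AC-159"
AJAX = "https://www.anesishome.gr/modules/blocklayered/blocklayered-ajax.php"

# Strip the fragment from each URL the question builds. What remains is
# what actually gets requested, and it is identical for every page number.
requested = {urldefrag("{}#!/page-{}".format(BASE, i)).url for i in range(5)}
print(len(requested))  # one distinct URL for five "pages"


def ajax_url(page):
    """Hypothetical helper: AJAX URL for one page, values copied from the answer."""
    params = {
        "id_category_layered": 159,
        "layered_price_slider": "2_350",
        "orderby": "price",
        "orderway": "asc",
        "n": 48,      # products per page, as in the URL above
        "p": page,    # page number
    }
    return AJAX + "?" + urlencode(params)


urls = [ajax_url(p) for p in range(1, 6)]
for url in urls:
    print(url)
```

Whether the server still accepts the same price-slider range and page size today is not something this sketch can confirm; copy the current values from the browser's network tab if they differ.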

The result is a JSON string:

{"filtersBlock":"<script type=\"text\/javascript\">\ncurrent_friendly_url = '#!\/page-2';\nparam_product_url

You could parse it with the JSON module to get the HTML, or maybe get the HTML out with plain string slicing.
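
Here is a rough sketch of the JSON route using Python's json module. Only the filtersBlock key is visible in the response above; the productList key and the shortened sample payload are assumptions for illustration, so check the real response for the key that holds the product HTML.

```python
import json

# Shortened stand-in for the AJAX response. "filtersBlock" appears in the
# real response; "productList" is an ASSUMED key name for illustration.
sample = (
    r'{"filtersBlock": "<script type=\"text\/javascript\">", '
    r'"productList": "<li class=\"ajax_block_product\"><h5>Example<\/h5><\/li>"}'
)


def extract_html(payload, key="productList"):
    """Hypothetical helper: decode the JSON text and return one HTML fragment."""
    data = json.loads(payload)   # also undoes the \" and \/ escapes
    return data.get(key, "")     # empty string if the key is absent


html = extract_html(sample)
print(html)
```

The returned string is ordinary HTML, so it can be handed to BeautifulSoup(html, "html.parser") just like the page source in the question.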


霸刀☆藐视天下

Reply #3 · 2019-07-14 06:19

I made a demo for you, hope this will help:

import requests
from bs4 import BeautifulSoup
for i in range(1, 6):  # pages are numbered from 1; this covers pages 1 to 5
    page="https://www.anesishome.gr/%CE%B2%CF%81%CE%B5%CF%86%CE%B9%CE%BA%CE%AC-159?p={}".format(i)
    html = requests.get(page)
    soup = BeautifulSoup(html.text, 'lxml')
    product_list = soup.find('div', id="product_list")  # get product_list

    for item in product_list('li', class_='ajax_block_product'): # iterate over each product
        title = item.find('h5').text
        price = item.find(class_="price").text
        print(title, price)

Output:

ΠΕΤΣΕΤΑ ΒΡΕΦΙΚΗ NEF - NEF HAPPY DAY MINT 40X60 2,66 €
ΠΑΙΔΙΚΗ ΠΕΤΣΕΤΑ ΧΕΡΙΩΝ NEF - NEF SNOOPY FRIENDS 3,50 €
Πετσετάκια ώμου σε σετ 3 τεμαχίων Pierre Cardin 4,80 €
Σαλιάρα Pierre Cardin 74 5,60 €
Σαλιάρα Pierre Cardin 135 5,60 €
ΒΡΕΦΙΚΟ ΜΑΞΙΛΑΡΙ ΜΑΛΑΚΟ NEF-NEF BALLFIBER 6,75 €
ΣΑΛΙΑΡΑ NEF-NEF TINY FRIENDS 7,20 €
ΣΑΛΙΑΡΑ NEF-NEF PLAY NOW 7,20 €
Pierre Cardin baby design 035 Πάνες Φανελένιες 7,20 €
Pierre Cardin baby design 036 Πάνες Φανελένιες 7,20 €
Baby Oliver design 212 Πάνες χασέ 7,20 €
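
Since the question also writes the results to vrefika.txt: opening the file with 'w' inside the page loop truncates it on every iteration, so only the last page would survive. Here is a small sketch that collects (title, price) pairs first and writes them once. save_products is a hypothetical helper, the sample rows are copied from the output above, and the temporary directory is only there to keep the demo self-contained.

```python
import os
import tempfile


def save_products(products, path):
    """Hypothetical helper: write each title and price on its own line."""
    with open(path, "w", encoding="utf-8") as f:  # opened once, after scraping
        for title, price in products:
            f.write("{}\n{}\n".format(title.strip(), price.strip()))
    return len(products)


# In the real script this list would be filled inside the page loop,
# e.g. rows.append((title, price)) instead of print(title, price).
rows = [
    ("Σαλιάρα Pierre Cardin 74", "5,60 €"),
    ("ΣΑΛΙΑΡΑ NEF-NEF PLAY NOW", "7,20 €"),
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "vrefika.txt")
    written = save_products(rows, out_path)
    with open(out_path, encoding="utf-8") as f:
        saved_lines = f.read().splitlines()

print(written, saved_lines)
```

Passing encoding="utf-8" explicitly avoids depending on the platform default when the titles contain Greek text.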
