BeautifulSoup Div Class returns empty

Posted 2019-03-01 17:53

Question:

I checked similar questions, but could not find a solution...

I'm trying to scrape the number of minutes of extra travel time (46) from the following page: https://www.tomtom.com/en_gb/trafficindex/city/istanbul

I've tried two methods (XPath and finding the div by class), but both return an empty result.

import requests
from bs4 import BeautifulSoup
from lxml.html import fromstring

page = requests.get("https://www.tomtom.com/en_gb/trafficindex/city/istanbul")
tree = fromstring(page.content)

soup = BeautifulSoup(page.content, 'html.parser')



#print([type(item) for item in list(soup.children)])

html = list(soup.children)[2]

g_data = soup.find_all("div", {"class_": "big.ng-binding"})

congestion = tree.xpath("/html/body/div/div[2]/div[2]/div[2]/section[2]/div/div[2]/div/div[2]/div/div[2]/div[1]/div[1]/text()")
print(congestion)
print(len(g_data))

Am I missing something obvious?

Many thanks for helping out!

Answer 1:

Unfortunately, BeautifulSoup alone is not enough here. The website generates this content with JavaScript, so the HTML that requests downloads does not contain it. You will need an additional tool that runs a real browser, for example Selenium.
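You can check this before reaching for a browser: parse the HTML that requests downloaded and see whether the target element is there at all. The sketch below assumes the selector `div.text-big.ng-binding`, built from the class names used in this answer; `has_element` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def has_element(html: str, css_selector: str) -> bool:
    """Return True if the (static) HTML contains an element matching css_selector."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns the first matching element, or None if there is none
    return soup.select_one(css_selector) is not None
```

For example, `has_element(requests.get(url).text, "div.text-big.ng-binding")`: if this returns False for the raw response while the browser's inspector shows the element, the content is most likely being added client-side.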

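import re

import bs4 as bs
from selenium import webdriver

url = 'https://www.tomtom.com/en_gb/trafficindex/city/istanbul'

driver = webdriver.Firefox()  # launches a real Firefox browser
driver.get(url)
html = driver.page_source     # serialized DOM as the browser currently sees it
driver.quit()
soup = bs.BeautifulSoup(html, 'html.parser')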
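`driver.get()` waits for the page load event, but a JavaScript framework may still be filling in values afterwards, so reading `page_source` straight away can occasionally miss them. Selenium's `WebDriverWait` is the usual tool for this; the sketch below is a dependency-free version of the same polling idea. The helper name `wait_for_source` and the marker string are assumptions for illustration, not part of the answer.

```python
import time


def wait_for_source(driver, marker: str, timeout: float = 15.0, poll: float = 0.5) -> str:
    """Poll driver.page_source until `marker` appears in it, then return the HTML.

    `driver` is any object with a `page_source` attribute (e.g. a Selenium WebDriver).
    Raises TimeoutError if the marker does not appear within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found in page source after {timeout} s")
        time.sleep(poll)
```

After `driver.get(url)`, a call such as `html = wait_for_source(driver, 'text-big ng-binding')` would take the place of reading `driver.page_source` directly.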

I can see two ways to extract the extra travel time:

1. Looking for the div with class="text-big ng-binding".

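# matches the exact class string "text-big ng-binding" (class order matters)
div = soup.find_all('div', attrs={'class': 'text-big ng-binding'})
result = div[0].text  # find_all returns a list; take the first match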
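Matching the whole class string only works when the classes appear in exactly that order. A CSS selector via `select_one` matches elements that carry both classes in any order. The sketch below runs on a small hard-coded snippet standing in for `driver.page_source`; the function name `extra_time` is hypothetical.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extra_time(html: str) -> Optional[str]:
    """Return the text of the first div having both classes text-big and ng-binding."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.text-big.ng-binding" requires both classes, regardless of their order
    div = soup.select_one("div.text-big.ng-binding")
    return div.get_text(strip=True) if div is not None else None


print(extra_time('<div class="ng-binding text-big"> 46 </div>'))  # prints 46
```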

2. Finding the "Per day" text first and then moving back two divs from it with find_previous.

# find() returns a single string; find_all() would return a list, which has no find_previous
per_day = soup.find(string=re.compile('Per day'))
result = per_day.find_previous('div').find_previous('div').text
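Note that the first `find_previous('div')` call returns the div that wraps "Per day" itself, because a tag comes before its own text in document order; only the second call moves to an earlier div. The snippet below illustrates this on made-up markup; the real TomTom page structure is an assumption here and may differ.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the value sits in the div just before the "Per day" label
html = """
<div class="text-big ng-binding">46</div>
<div class="label">Per day</div>
"""

soup = BeautifulSoup(html, "html.parser")
label = soup.find(string=re.compile("Per day"))  # a NavigableString

first = label.find_previous("div")   # the div containing "Per day"
second = first.find_previous("div")  # the div before it in document order
print(first.get_text(), "|", second.get_text())  # prints: Per day | 46
```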