How can I parse table data from a website using Selenium?

Posted 2019-08-26 00:05

I'm trying to parse the table on [this website][1] using Selenium. I'm a beginner and struggling to do it. Here is my code:

[1]: http://www.espncricinfo.com/rankings/content/page/211270.html

from bs4 import BeautifulSoup
import time
from selenium import webdriver

url = "http://www.espncricinfo.com/rankings/content/page/211270.html"
browser = webdriver.Chrome()

browser.get(url)
time.sleep(3)
html = browser.page_source
soup = BeautifulSoup(html, "lxml")

print(len(soup.find_all("table")))
print(soup.find("table", {"class": "expanded_standings"}))

browser.close()
browser.quit()

That is what I tried, but I'm unable to fetch anything with it. Any suggestions would be really helpful. Thanks!

2 Answers
Rolldiameter
Answered 2019-08-26 00:36

It looks like the tables on that page are inside iframes. If there is a specific table you want to scrape, inspect it with your browser's dev tools (in Chrome: right-click, Inspect) and find the iframe element that wraps it. The iframe will have a src attribute holding the URL of the page that actually contains the table. You can then use the same method you tried, but against that src URL instead.
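As a sketch of that idea, here is how you could pull an iframe's src out of the page source with BeautifulSoup and then request that URL directly. The HTML snippet below is a made-up stand-in for the real page source, and the URL in it is an assumption, not the real one:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for browser.page_source; on the real page you
# would locate the exact iframe with dev tools first.
page_html = """
<html><body>
  <iframe name="testbat" src="http://www.example.com/rankings/table.html"></iframe>
</body></html>
"""

soup = BeautifulSoup(page_html, "html.parser")
iframe = soup.find("iframe", {"name": "testbat"})
table_url = iframe["src"]  # the page that actually contains the table
print(table_url)  # -> http://www.example.com/rankings/table.html
# then: requests.get(table_url) and parse the table from that response
```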

Selenium can also "jump into" an iframe if you can locate the iframe in the page source:

frame = browser.find_element_by_id("the_iframe_id")  # substitute the actual iframe's id
browser.switch_to.frame(frame)
html = browser.page_source

霸刀☆藐视天下
Answered 2019-08-26 00:45

The table you are after is inside an iframe, so to get the data from it you need to switch to that iframe first and then do the rest. Here is one way you could do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
wait = WebDriverWait(driver, 10)
# If you expect a different table, change the index number within nth-of-type()
# and the name in the selector accordingly.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[name='testbat']:nth-of-type(1)")))
for row in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tr")))[1:]:
    data = [cell.text for cell in row.find_elements_by_css_selector("th,td")]
    print(data)
driver.quit()

In this particular case, though, the best approach is the one below: no browser simulator at all, only requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

res = requests.get("http://www.espncricinfo.com/rankings/content/page/211270.html")
soup = BeautifulSoup(res.text,"lxml")
# If you expect a different table, change the index number
# and the name in the selector accordingly.
iframe_src = soup.select("iframe[name='testbat']")[0]['src']
req = requests.get(iframe_src)
sauce = BeautifulSoup(req.text, "lxml")
for row in sauce.select("table tr"):
    data = [cell.text for cell in row.select("th,td")]
    print(data)

Partial results:

['Rank', 'Name', 'Country', 'Rating']
['1', 'S.P.D. Smith', 'AUS', '947']
['2', 'V. Kohli', 'IND', '912']
['3', 'J.E. Root', 'ENG', '881']