Beautiful Soup fetch dynamic table data

Posted 2019-07-19 14:24

I have the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2017_standings.html#all_expanded_standings'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')

print(len(soup.find_all('table')))
print(soup.find_all('table'))

There are 6 tables on the webpage, but this only returns 4. I also tried 'html.parser' and 'html5lib' as parsers, but neither worked.

Any idea how I can get the "expanded standings" table from the webpage?

Thanks!

Tags: python parsing web-scraping beautifulsoup lxml

1 answer

放荡不羁爱自由

Reply #2 · 2019-07-19 15:02

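Plain HTTP clients such as urlopen or requests only download the HTML the server sends. They don't run JavaScript, so they can't see content that the page's JS adds afterwards. Selenium drives a real browser, so you get the rendered page instead. First install Selenium with pip install selenium, then download ChromeDriver and put the executable in your working directory (or anywhere on your PATH). With Selenium 4.6 or later, the bundled Selenium Manager can usually fetch a matching driver for you, so the manual download may not be needed. Then try the following code: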

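from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.basketball-reference.com/leagues/NBA_2017_standings.html"
browser = webdriver.Chrome()

try:
    browser.get(url)
    # Wait up to 10 s for the JS-inserted table instead of a fixed sleep;
    # raises TimeoutException if it never appears
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.ID, "expanded_standings"))
    )
    # Hand the rendered DOM to Beautiful Soup
    soup = BeautifulSoup(browser.page_source, "lxml")

    print(len(soup.find_all("table")))
    print(soup.find("table", {"id": "expanded_standings"}))
finally:
    # quit() closes all windows and ends the driver session
    browser.quit()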

See the Selenium documentation for details.

If you are on Linux and get an error saying the ChromeDriver executable needs to be in PATH, try the fixes described here: link-1, link-2
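If you'd rather not run a browser, there may be a lighter option. Sports Reference pages commonly ship their secondary tables inside HTML comments (<!-- ... -->) and un-comment them with JavaScript, which would explain seeing 4 tables instead of 6. Assuming that is the case for this page, Beautiful Soup can pull the table straight out of the comment text. The helper names find_table and fetch_table are hypothetical, and this is a sketch rather than a verified scraper for the live site; it uses the built-in "html.parser" so lxml is optional.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup, Comment


def find_table(html, table_id):
    """Return the <table> with the given id, or None.

    Looks in the live markup first, then inside HTML comments
    (assumption: the page hides extra tables in comments).
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Comment is a str subclass, so each one can be re-parsed as HTML
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if table_id in comment:
            table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
            if table is not None:
                return table
    return None


def fetch_table(url, table_id):
    # urlopen returns the raw server HTML; no JavaScript is executed
    with urlopen(url) as response:
        html = response.read()
    return find_table(html, table_id)
```

For example, fetch_table("https://www.basketball-reference.com/leagues/NBA_2017_standings.html", "expanded_standings"). If it returns None, the table really is built by JavaScript and the Selenium approach is the way to go.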
