How to extract the player's information from t

I am trying to scrape some information from a website using Selenium. The link to the website is http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742 and the information I am trying to get is under the player's 'Statistics' tab. My code currently opens the player's profile and then opens the player's statistics page. I am trying to find a way to extract the information on that statistics page. Below is my code so far:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")
soup = BeautifulSoup(driver.page_source,"lxml")
try:
    dropdown = driver.find_element_by_xpath('//*[@id="playerPills"]/li[9]/a')
    dropdown.click()

    bm = driver.find_element_by_id('statisticsPill')
    bm.click()

    for i in soup.select('#statistics table.table tr'):
        print(i)
        data1 = [x.get_text(strip=True) for x in i.select("th,td")]
        print(data1)

except ValueError:
    print("error")

I Serve

                            <th class="pct-data text-right"><i class="fa fa-percent"></i></th>
                            <th class="raw-data text-right" style="display: none;"><i class="fa fa-hashtag"></i></th>
                        </tr>
                        </thead>
                        <tbody>
                        <tr>
                            <td>Ace %</td>



                            <th class="text-right pct-data">23.4%</th>
                            <th class="raw-data text-right" style="display: none;">12942 / 55377</th>


                        </tr>
                        <tr>
                            <td>Double Fault %</td>



                            <th class="text-right pct-data">4.2%</th>
                            <th class="raw-data text-right" style="display:

Tags: python selenium selenium-webdriver web-scraping webdriverwait

2 Answers

Ridiculous、

#2 · 2019-02-26 20:18

To extract the player's information from the Statistics page, you can use the following solution:

Code Block:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//ul[@id='playerPills']//a[@class='dropdown-toggle'][normalize-space()='Statistics']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//ul[@class='dropdown-menu']//a[@id='statisticsPill'][normalize-space()='Statistics']"))).click()
statistics_items = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//table[@class='table table-condensed table-hover table-striped']//tbody//tr/td")))
statistics_value = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//table[@class='table table-condensed table-hover table-striped']//tbody//tr//following::th[1]")))
for item, value in zip(statistics_items, statistics_value):
    print('{} {}'.format(item.text, value.text))

Console Output:

Ace % 4.0%
Double Fault % 2.1%
1st Serve % 68.7%
1st Serve Won % 71.8%
2nd Serve Won % 57.3%
Break Points Saved % 66.3%
Service Points Won % 67.2%
Service Games Won % 85.6%
Ace Against % Return
Double Fault Against % 7.2%
1st Srv. Return Won % 3.4%
2nd Srv. Return Won % 34.2%
Break Points Won % 55.3%
Return Points Won % 44.9%
Return Games Won % 42.4%
Points Dominance 33.3%
Games Dominance Total
Break Points Ratio 1.29
Total Points Won % 2.31
Games Won % 1.33
Sets Won % 54.4%
Matches Won % 59.7%
Match Time 77.2%
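
In the output above some labels end up next to section headings (for example `Ace Against % Return`), which is what happens when labels and values are collected as two separate lists and then zipped. A minimal sketch of pairing them row by row instead, assuming every data row is a `td` label followed by `th` value cells as in the HTML from the question. The sample HTML, its `Serve` heading and the class name `RowPairs` are illustrative only, and the standard library's html.parser stands in for Selenium so the sketch runs without a browser:

```python
from html.parser import HTMLParser

# Illustrative sample modeled on the HTML in the question, not a live page.
SAMPLE = """
<table>
  <thead>
    <tr><th>Serve</th><th class="pct-data text-right">%</th></tr>
  </thead>
  <tbody>
    <tr><td>Ace %</td><th class="text-right pct-data">23.4%</th>
        <th class="raw-data text-right">12942 / 55377</th></tr>
    <tr><td>Double Fault %</td><th class="text-right pct-data">4.2%</th></tr>
  </tbody>
</table>
"""


class RowPairs(HTMLParser):
    """Collect (label, value) from rows shaped <td>label</td><th>value</th>."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._cells = None  # cells of the row being read, None outside a row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th") and self._cells is not None:
            self._in_cell = True
            self._cells.append([tag, ""])

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1][1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            cells = [(name, text.strip()) for name, text in self._cells]
            # Heading rows start with <th>, so they are skipped here.
            if len(cells) >= 2 and cells[0][0] == "td" and cells[1][0] == "th":
                self.pairs.append((cells[0][1], cells[1][1]))
            self._cells = None


parser = RowPairs()
parser.feed(SAMPLE)
pairs = parser.pairs
for label, value in pairs:
    print(label, value)
```

Because each pair is built inside a single `tr`, a heading row can no longer shift the values that follow it. The same idea should carry over to Selenium by locating the rows first and reading the cells of each row.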


姐就是有狂的资本

#3 · 2019-02-26 20:20

The problem is with the location of this line -

soup = BeautifulSoup(driver.page_source,"lxml")

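It should come after you have clicked on the "Statistics" tab, because the table only loads at that point. driver.page_source is a snapshot of the page at the moment it is read, so a soup built before the click has no table to parse.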

Final code -

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path=r'//path/chromedriver.exe')
driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")

try:
    # Open the dropdown in the player pills, then its "Statistics" entry
    dropdown = driver.find_element_by_xpath('//*[@id="playerPills"]/li[9]/a')
    dropdown.click()
    bm = driver.find_element_by_id('statisticsPill')
    bm.click()
    driver.maximize_window()
    # Take the HTML snapshot only after the click
    soup = BeautifulSoup(driver.page_source, "lxml")
    for i in soup.select('#statisticsOverview table tr'):
        print(i.text)
        data1 = [x.get_text(strip=True) for x in i.select("th,td")]
        print(data1)

except NoSuchElementException:
    # Raised by find_element_by_* when a locator matches nothing
    print("error")
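
The code above reads `driver.page_source` right after the click, so it can still run before the table has been rendered. A hedged sketch of waiting for a table row first, assuming Selenium 4 style locators and reusing the ids and the `#statisticsOverview` selector from this thread. The names `statistics_html`, `count_rows`, `BEFORE` and `AFTER` are hypothetical, and the two HTML strings are made-up stand-ins for the page before and after the tab loads:

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> tags in an HTML snapshot."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_rows(html):
    counter = RowCounter()
    counter.feed(html)
    return counter.rows


def statistics_html(url, timeout=20):
    """Open the Statistics tab, wait for a table row, return the page HTML."""
    # Imported here so count_rows can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        wait.until(EC.element_to_be_clickable(
            (By.XPATH, '//*[@id="playerPills"]/li[9]/a'))).click()
        wait.until(EC.element_to_be_clickable(
            (By.ID, "statisticsPill"))).click()
        # Block until at least one row exists, then take the snapshot.
        wait.until(EC.presence_of_element_located(
            (By.CSS_SELECTOR, "#statisticsOverview table tr")))
        return driver.page_source
    finally:
        driver.quit()


# Made-up snapshots: the container before and after the table is filled in.
BEFORE = '<div id="statisticsOverview"></div>'
AFTER = ('<div id="statisticsOverview"><table>'
         '<tr><td>Ace %</td><th>23.4%</th></tr></table></div>')

print(count_rows(BEFORE), count_rows(AFTER))
```

Calling `statistics_html("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")` needs a local Chrome and driver, and the returned HTML can then be passed to BeautifulSoup exactly as in the code above.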
