Scraping table of NBA stats with rvest

I'd like to scrape a table of NBA team stats with rvest. I've tried using:

the table element:

library(rvest)

url_nba <- "http://stats.nba.com/teams/advanced/#!?sort=TEAM_NAME&dir=-1"

team_stats <- url_nba %>% read_html %>% html_nodes('table') %>% html_table

the XPath (via Google Chrome inspect):

team_stats <- url_nba %>% 
      read_html %>%
      html_nodes(xpath="/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div[1]/table") %>%
      html_table

the CSS selector (via Mozilla inspect):

team_stats <- url_nba %>% 
      read_html %>%
      html_nodes(".nba-stat-table__overflow > table:nth-child(1)") %>%
      html_table

but with no luck. Any help would be greatly appreciated.

Tags: css r web-scraping rvest

1 Answer

不美不萌又怎样

Post #2 · 2019-04-16 12:10

This question is very similar to this one: How to select a particular section of JSON Data in R?

The data you are requesting is not in the page's HTML source, which is why the rvest attempts fail. The page loads it separately through an XHR request that returns JSON, and that request can be called directly:

library(httr)
library(jsonlite)

nba<-GET('http://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=' )
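The long query string can also be assembled from a named list, which makes it easier to change a single parameter such as the season. This is only a sketch: `params` and `stats_url` are illustrative names, only a handful of the parameters from the URL above are shown, and the remaining ones would need to be added the same way before the endpoint is likely to respond. `modify_url()` builds the URL without sending a request.

```r
library(httr)

# Illustrative subset of the query parameters from the URL above
params <- list(
  LeagueID    = "00",
  MeasureType = "Advanced",
  PerMode     = "PerGame",
  Season      = "2016-17",
  SeasonType  = "Regular Season"
)

# Build the URL offline; no request is sent here
stats_url <- modify_url(
  "http://stats.nba.com/stats/leaguedashteamstats",
  query = params
)
print(stats_url)

# The same list can be passed straight to GET():
# nba <- GET("http://stats.nba.com/stats/leaguedashteamstats", query = params)
```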

Once the response is stored in the nba variable, use httr and jsonlite to clean up the data:

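# Parse the JSON body of the response
# (nested call instead of %>%, which neither httr nor jsonlite provides)
out <- fromJSON(content(nba, as = "text", encoding = "UTF-8"), flatten = FALSE)

# Convert into a data frame.
# Use str(out) to inspect the structure.
df <- data.frame(out$resultSets$rowSet)
names(df) <- out$resultSets$headers[[1]]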
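To see what the conversion does without calling the API, here is a small offline sketch. The JSON string is made up; it only mimics the headers/rowSet layout used above, and `txt` and `demo` are illustrative names. Because each row mixes numbers and strings, jsonlite returns rowSet as a character matrix, so the last step converts the columns that look numeric.

```r
library(jsonlite)

# Made-up, shortened payload in the same headers/rowSet layout
txt <- '{"resultSets":[{"name":"Demo",
  "headers":["TEAM_ID","TEAM_NAME","W"],
  "rowSet":[[1,"Team A",10],[2,"Team B",7]]}]}'

out <- fromJSON(txt, flatten = FALSE)

# Same conversion as in the answer: rows become data frame rows,
# and the column names come from the headers
demo <- data.frame(out$resultSets$rowSet, stringsAsFactors = FALSE)
names(demo) <- out$resultSets$headers[[1]]

# Convert columns that look numeric; the rest stay as text
demo[] <- lapply(demo, type.convert, as.is = TRUE)
str(demo)
```

The same `type.convert()` step could be applied to `df` from the real response if numeric columns are needed.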

I highly recommend reading the answer to the question which I linked above.
