I'm interested in extracting the player tables from basketball-reference.com. I have successfully extracted the per-game statistics table for a specific player (LeBron James, for example), which is the first table on the page. However, the page has 10+ other tables that I can't seem to extract. I've been able to get the per-game table into R a couple of different ways. First, using the rvest package:
```r
library(rvest)

lebron <- "https://www.basketball-reference.com/players/j/jamesle01.html"
lebron_webpage <- read_html(lebron)

# html_table() returns a list with one element per table that rvest finds
lebron_table <- html_table(lebron_webpage, fill = TRUE)
lebron_pergame <- data.frame(lebron_table[[1]])
```
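`html_table()` applied to a whole document returns a list with one data frame per `<table>` element rvest can see, so a specific table can be picked by position or, more robustly, by a CSS selector on its `id`. A minimal sketch, run offline against a small inline HTML string; the ids `per_game` and `totals` and the numbers are placeholders for illustration, so check the real ids in the page source.

```r
library(rvest)

# Hypothetical page with two tables, each identified by an id attribute
page <- read_html('
<html><body>
  <table id="per_game">
    <tr><th>Season</th><th>PTS</th></tr>
    <tr><td>2003-04</td><td>20.9</td></tr>
  </table>
  <table id="totals">
    <tr><th>Season</th><th>PTS</th></tr>
    <tr><td>2003-04</td><td>1654</td></tr>
  </table>
</body></html>')

# html_table() on the whole document: a list with one data frame per table
all_tables <- html_table(page)
length(all_tables)  # 2

# Select one table by CSS selector instead of relying on its position
totals <- html_table(html_element(page, "table#totals"))
totals
```

Selecting by `id` keeps working if the site reorders or adds tables, whereas `[[2]]` silently picks whatever happens to come second.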
Now I have LeBron's career per-game statistics in a nice data frame. I'm also able to read the same table using a combination of the XML and RCurl packages:
```r
library(RCurl)
library(XML)

# Download the raw HTML as a string, then parse the first table on the page
lebron_html <- getURL(lebron)
lebron_table <- readHTMLTable(lebron_html, which = 1)
```
The problem comes when I want to read in another table on the page. For example, the next table on the page is Totals. I've tried using a CSS selector to select the specific table I want, but I can't get that to work. I've also tried right-clicking the table, choosing Inspect Element, and copying its XPath, but that doesn't work either. I've spent a lot of time researching this issue on Google but can't find anything that solves it. Any help would be greatly appreciated! Thanks in advance!
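One common reason a CSS selector or a copied XPath matches nothing is that the table is not in the parsed HTML at all. Some sites ship secondary tables inside HTML comments (`<!-- ... -->`) and reveal them with JavaScript, so the browser's inspector shows the table while `read_html()` only sees comment text. Whether this applies to the Totals table here is an assumption to verify with View Source rather than Inspect. A minimal sketch reproducing that situation offline; the `div#all_totals` wrapper, the `totals` id, and the numbers are hypothetical.

```r
library(rvest)

# Hypothetical markup: per-game table is live, totals table is commented out
page <- read_html('
<html><body>
  <table id="per_game">
    <tr><th>Season</th><th>PTS</th></tr>
    <tr><td>2003-04</td><td>20.9</td></tr>
  </table>
  <div id="all_totals"><!--
    <table id="totals">
      <tr><th>Season</th><th>PTS</th></tr>
      <tr><td>2003-04</td><td>1654</td></tr>
    </table>
  --></div>
</body></html>')

# The CSS selector finds nothing: the table exists only as comment text
not_found <- html_element(page, "table#totals")
inherits(not_found, "xml_missing")  # TRUE

# Pull the text of every comment node with XPath and re-parse it as HTML
comment_text <- html_text(html_elements(page, xpath = "//comment()"))
hidden <- read_html(paste(comment_text, collapse = "\n"))

# The same CSS selector now matches inside the re-parsed comment content
totals <- html_table(html_element(hidden, "table#totals"))
totals
```

Re-parsing all comments at once is the simplest approach; on a real page with many comments, narrowing the XPath (for example to comments inside a specific wrapper `div`) keeps unrelated markup out of the second parse.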