Hi, I am trying to extract the table from the Premier League website.
The package I am using is rvest, and the code I am using in the initial phase is as follows:
library(rvest)
library(magrittr)
premierleague <- read_html("https://fantasy.premierleague.com/a/entry/767830/history")
premierleague %>% html_nodes("ism-table")
I couldn't find an HTML tag or selector that works with html_nodes() from the rvest package.
I used a similar approach to extract data from "http://admissions.calpoly.edu/prospective/profile.html", and there I was able to extract the data. The code I used for Cal Poly is as follows:
library(rvest)
library(magrittr)
CPadmissions <- read_html("http://admissions.calpoly.edu/prospective/profile.html")
CPadmissions %>% html_nodes("table") %>%
.[[1]] %>%
html_table()
I got the code above from YouTube through this link: https://www.youtube.com/watch?v=gSbuwYdNYLM&ab_channel=EvanO%27Brien
Any help on getting data from fantasy.premierleague.com is highly appreciated. Do I need to use some kind of API?
Since the data is loaded with JavaScript, grabbing the HTML with rvest will not get you what you want, but if you use PhantomJS as a headless browser within RSelenium, it's not all that complicated (by RSelenium standards):
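The sketch below is not the RSelenium code itself; it only illustrates the first point, that rvest sees the HTML source and never runs the page's JavaScript. Both pages are tiny made-up HTML strings (not the real Premier League markup), and the numbers in the table are invented for illustration.

```r
library(rvest)

# Static page: the table is part of the HTML source, as on the Cal Poly page.
static_page <- read_html(
  "<html><body>
     <table class='ism-table'>
       <tr><th>GW</th><th>Points</th></tr>
       <tr><td>1</td><td>55</td></tr>
     </table>
   </body></html>"
)

# JavaScript-driven page: the source holds only an empty container plus a
# script that would build the table in a real browser.
js_page <- read_html(
  "<html><body>
     <div id='app'></div>
     <script>
       var t = document.createElement('table');
       t.className = 'ism-table';
       document.getElementById('app').appendChild(t);
     </script>
   </body></html>"
)

static_tables <- html_nodes(static_page, "table.ism-table")
js_tables     <- html_nodes(js_page, "table.ism-table")

length(static_tables)  # one table found in the static source
length(js_tables)      # none found: the script was never executed

scores <- html_table(static_tables[[1]])
```

A headless browser closes this gap by executing the script first and handing back the rendered page source, which can then be parsed in the usual way.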
As always, more cleaning is necessary, but overall, it's in pretty good shape without too much work. (If you're using the tidyverse, df %>% mutate_if(is.character, parse_number) will do pretty well.) The arrows are images, which is why the last column is all NA, but you can calculate those anyway.
This solution uses RSelenium along with the package XML. It also assumes that you have a working installation of RSelenium that can properly work with firefox. Just make sure you have the firefox starter script path added to your PATH.
If you are using OS X, you will need to add /Applications/Firefox.app/Contents/MacOS/ to your PATH. Or, if you're on an Ubuntu machine, it's likely /usr/lib/firefox/. Once you're sure this is working, you can move on to R with the following:
This should yield:
Please note that the column CPW (change from previous week) is a vector of empty strings.
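Since the scraped columns arrive as character strings and the change column arrives empty, a cleaning step along the following lines may help. This is only a sketch: the data frame is made up, and the column names GW, OP and OR (taken here to mean gameweek, overall points and overall rank) are assumptions rather than the site's confirmed headers. It relies on mutate_if() from dplyr and parse_number() from readr, and it recomputes CPW as the direction of the rank change.

```r
library(dplyr)
library(readr)

# Made-up rows shaped like a scraped history table (hypothetical columns).
df <- data.frame(
  GW  = c("GW1", "GW2", "GW3"),
  OP  = c("55", "120", "171"),
  OR  = c("1,234,567", "987,654", "1,000,001"),
  CPW = c("", "", ""),
  stringsAsFactors = FALSE
)

cleaned <- df %>%
  select(-CPW) %>%                              # drop the empty arrow column
  mutate_if(is.character, parse_number) %>%     # "1,234,567" becomes 1234567
  mutate(CPW = sign(dplyr::lag(OR) - OR))       # 1 = rank improved, -1 = rank got worse

cleaned
```

The first gameweek has no previous week to compare against, so its CPW stays NA.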
I hope this helps.