R: LinkedIn scraping using rvest

Posted 2019-06-08 01:05

Question:

Using the rvest package, I am trying to scrape data from my LinkedIn profile.

These attempts:

library(rvest)
url = "https://www.linkedin.com/profile/view?id=AAIAAAFqgUsBB2262LNIUKpTcr0cF_ekoX9ZJh0&trk=nav_responsive_tab_profile"
li = read_html(url)
html_nodes(li, "#experience-316254584-view span.field-text")
html_nodes(li, xpath='//*[@id="experience-610617015-view"]/p/span/text()')

don't find any nodes:

#> {xml_nodeset (0)}

Q: How to return just the text?

#> "Quantitative hedge fund manager selection for $650m portfolio of alternative investments"
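Once a selector actually matches something, `html_text()` returns just the text of the matched nodes. Below is a minimal sketch run against a hypothetical HTML snippet that mimics the markup the selectors above target. It is not LinkedIn's real page. A plain, unauthenticated `read_html()` request is most likely served a sign-in page without these elements, which would explain the empty node set. For simplicity, both the CSS and XPath selectors here point at the same hypothetical element id.

```r
library(rvest)

# Hypothetical markup mimicking the structure the selectors expect;
# this is NOT what LinkedIn actually serves.
page <- read_html('<div id="experience-316254584-view">
  <p><span class="field-text">
    Quantitative hedge fund manager selection for $650m portfolio of alternative investments
  </span></p>
</div>')

# CSS route: select the <span> nodes, then keep only their text
nodes <- html_nodes(page, "#experience-316254584-view span.field-text")
desc <- html_text(nodes, trim = TRUE)  # trim = TRUE strips leading/trailing whitespace

# XPath route: a path ending in text() selects the text node itself
desc_xpath <- html_text(
  html_nodes(page, xpath = '//*[@id="experience-316254584-view"]/p/span/text()'),
  trim = TRUE
)

print(desc)
```

On an empty `{xml_nodeset (0)}`, the same `html_text()` call simply returns `character(0)`. The calls themselves are fine. The problem is the HTML that `read_html()` receives.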

EDIT:

LinkedIn has an API; however, for some reason the code below returns only the first two positions of experience and no other items (such as education or projects). Hence the scraping approach.

library("Rlinkedin")
auth = inOAuth(application_name, consumer_key, consumer_secret)
getProfile(auth, connections = FALSE, id = NULL) # returns very limited data

Answer 1:

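You are making things unnecessarily difficult... All you need to do is issue a GET request to https://api.linkedin.com/v1/people/~?format=json after obtaining an OAuth 2.0 token from LinkedIn. In R, you can do this using httr and jsonlite, sending the token in the Authorization header:

library(httr)
library(jsonlite)
# `token` holds the OAuth 2.0 access token string obtained from LinkedIn
res <- GET("https://api.linkedin.com/v1/people/~?format=json", add_headers(Authorization = paste("Bearer", token)))
linkedin <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
position <- linkedin$headline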

You must have the 'r_basicprofile' member permission on your OAuth token.
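The answer does not show how the OAuth 2.0 token is obtained. Below is a sketch using httr's interactive OAuth 2.0 flow, assuming you have registered a LinkedIn application (client ID and secret) with httr's local callback as its redirect URL. `linkedin_token()` and `linkedin_get()` are hypothetical helper names. `oauth_endpoints("linkedin")` is httr's built-in endpoint preset, and its URLs may lag behind LinkedIn's current ones.

```r
library(httr)

# Hypothetical helper: run httr's browser-based OAuth 2.0 flow for LinkedIn.
# `key`/`secret` are assumed to be your app's client ID and secret; the app's
# registered redirect URL must match httr's oauth_callback() (http://localhost:1410/).
linkedin_token <- function(key, secret, scope = "r_basicprofile") {
  app <- oauth_app("linkedin", key = key, secret = secret)
  oauth2.0_token(oauth_endpoints("linkedin"), app, scope = scope)
}

# Hypothetical helper: GET a v1 resource; config(token = ...) makes httr
# attach the access token as an "Authorization: Bearer ..." header.
linkedin_get <- function(token,
                         url = "https://api.linkedin.com/v1/people/~?format=json") {
  res <- GET(url, config(token = token))
  stop_for_status(res)  # fail on 401/403 instead of parsing an error body
  content(res, as = "text", encoding = "UTF-8")
}

# Interactive usage (opens a browser, so it is left commented out):
# tok <- linkedin_token("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
# profile <- jsonlite::fromJSON(linkedin_get(tok))
# profile$headline
```

Caching the token (httr's default `cache = getOption("httr_oauth_cache")`) avoids repeating the browser step on every run.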