Since it is easy in R, I am using the rvest package to parse HTML and extract information from a website.
I am wondering what my User-Agent is during the request (if there is one), since a User-Agent is normally associated with a web browser. Is there a way to set it?
My code that opens a session and extracts information from the HTML is below:
library(rvest)
se <- html_session("http://www.wp.pl") %>%
  html_nodes("[data-st-area=Glonews-mozaika] li:nth-child(7) a") %>%
  html_attr(name = "href")
I used https://httpbin.org/user-agent to find out:
library(rvest)
se <- html_session("https://httpbin.org/user-agent")
se$response$request$options$useragent
Result:
[1] "libcurl/7.37.1 r-curl/0.9.1 httr/1.0.0"
See this bug report for a way to override it.
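The default string above looks like it is assembled from the versions of libcurl, the curl package, and httr. As a rough sketch under that assumption (the variable name `default_ua_guess` is only for illustration, and the exact format used internally may differ between httr versions), something similar can be rebuilt locally without making a request:

```r
library(curl)

# Rebuild a string in the same shape as the default User-Agent shown above.
# This is a guess at the format based on that output, not httr's own code.
default_ua_guess <- paste0(
  "libcurl/", curl::curl_version()$version,
  " r-curl/", as.character(utils::packageVersion("curl")),
  " httr/", as.character(utils::packageVersion("httr"))
)

print(default_ua_guess)
```

The version numbers will reflect whatever is installed on your machine, so they will most likely differ from the output above.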
I found this in a tutorial; it looks like an easier and faster way to do it:
library(rvest)
library(httr)  # user_agent() comes from httr
uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
session <- html_session("https://www.linkedin.com/job/", user_agent(uastring))
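To check that the override is what you expect before sending anything, you can inspect the object that `user_agent()` returns. This is a small sketch that assumes a reasonably recent httr, where the result is a request config with the curl options stored under `$options`; the name `ua_config` is only for illustration:

```r
library(httr)

uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"

# user_agent() only builds a config object; no request is made here
ua_config <- user_agent(uastring)

# The custom string should be stored as the curl `useragent` option
print(ua_config$options$useragent)

# With network access, the same config could be passed to a request against
# https://httpbin.org/user-agent to see what the server actually receives:
# content(GET("https://httpbin.org/user-agent", ua_config))
```

Passing the same object to `html_session()` as in the snippet above should then send that string instead of the default one.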