What's my user agent when I parse website with

Posted 2019-02-07 09:56

Question:

Since it is easy in R, I am using the rvest package to parse HTML and extract information from a website.

I am wondering what my User-Agent is (if there is one) during the request, since a User-Agent is normally associated with a web browser. Is there a way to set it?

My code, which opens a session and extracts information from the HTML, is below:

library(rvest)
se <- html_session("http://www.wp.pl") %>%
  html_nodes("[data-st-area=Glonews-mozaika] li:nth-child(7) a") %>%
  html_attr(name = "href")

Answer 1:

I used https://httpbin.org/user-agent to find out:

library(rvest)
se <- html_session( "https://httpbin.org/user-agent" )
se$response$request$options$useragent

Output:

[1] "libcurl/7.37.1 r-curl/0.9.1 httr/1.0.0"

See this bug report for a way to override it.



Answer 2:

I found this in a tutorial; it looks like an easier and faster way to set it:

library(rvest)
library(httr)  # user_agent() comes from httr
uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
session <- html_session("https://www.linkedin.com/job/", user_agent(uastring))
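
To check that a custom User-Agent is actually sent, the two answers can be combined: set the string with user_agent() and ask httpbin to echo it back. The following is only a minimal sketch. The string "my-scraper/0.1" is a made-up value for illustration, the echo step needs network access, and it assumes httpbin still returns its JSON under the "user-agent" key.

```r
library(httr)

# Hypothetical User-Agent string, used only for illustration.
uastring <- "my-scraper/0.1"

# user_agent() builds an httr config object; the string is stored
# in its curl options under "useragent".
ua <- user_agent(uastring)
print(ua$options$useragent)

# Ask httpbin to echo back the User-Agent it received.
# Wrapped in tryCatch so the script still runs without network access.
echoed <- tryCatch(
  content(GET("https://httpbin.org/user-agent", ua))[["user-agent"]],
  error = function(e) NULL
)
print(echoed)
```

If the request succeeds, the echoed value should match uastring; the same ua object can be passed to html_session() as in the answer above.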