Obtaining Twitter screen names from a Twitter list

Posted 2019-01-20 14:16

I am keen to get the usernames and full names of the members of a specific Twitter list using R. I could not find a function for this in any package, but this code works:

library(XML)
library(httr)


url.name <- "https://twitter.com/TwitterUK/lists/premier-league-players/members"
url.get=GET(url.name)
url.content=content(url.get, as="text")
pagehtml <- htmlParse(url.content)

screenNames <-xpathSApply(pagehtml, '//*/span[@class="username js-action-profile-name"]',xmlValue)
realName <- xpathSApply(pagehtml, '//*/strong[@class="fullname js-action-profile-name"]',xmlValue)

However, it only returns the first 20 values (presumably what appears on screen when the page first loads), whereas the list is much longer.

If there is an rvest solution, that would also be welcome.
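For reference, the two XPath queries above map onto CSS selectors in rvest. The snippet below is only a sketch: the HTML string is a made-up stand-in that reuses the class names from the question, not real Twitter markup, and it does nothing about the 20-row limit, which comes from the page itself rather than from the parser.

```r
library(rvest)

# Hypothetical HTML fragment using the class names from the question.
# The real list page is fetched over the network and may look different.
html <- '<div>
  <strong class="fullname js-action-profile-name">Peter Crouch</strong>
  <span class="username js-action-profile-name">@petercrouch</span>
  <strong class="fullname js-action-profile-name">barry bannan</strong>
  <span class="username js-action-profile-name">@bazzabannan25</span>
</div>'

page <- read_html(html)

# Dots chain the classes, so each selector requires both classes on the node.
screen_names <- html_text(html_nodes(page, "span.username.js-action-profile-name"))
real_names   <- html_text(html_nodes(page, "strong.fullname.js-action-profile-name"))
```

With a live page, read_html() would take the URL instead of the string; everything after that line stays the same.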

cheers

Tags: r twitter
2 answers
ゆ 、 Hurt°
#2 · 2019-01-20 15:06

The solution from Molx does not seem to work any more. The problem seems to lie in this line:

api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
           twlist, "&owner_screen_name=", twowner, "&count=5000")

This URL does not seem valid for any twlist or twowner that I tried. EDIT: I think the problem comes from authentication, because I get

{"errors":[{"code":215,"message":"Bad Authentication data."}]}
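An error body like this can be checked in code before trying to read any users from the response. The sketch below assumes the body has already been parsed into a nested list (which is what rjson::fromJSON() produces for it); api_errors() is a hypothetical helper name, not part of twitteR or httr.

```r
# Parsed form of the error body shown above, written out by hand so the
# example runs without a network call or a JSON package.
parsed <- list(errors = list(list(code = 215, message = "Bad Authentication data.")))

# Hypothetical helper: returns one "code: message" string per reported error,
# or an empty character vector when the response carries no errors.
api_errors <- function(parsed) {
  if (is.null(parsed$errors)) {
    return(character(0))
  }
  vapply(parsed$errors,
         function(e) sprintf("%s: %s", e$code, e$message),
         character(1))
}

errs <- api_errors(parsed)
if (length(errs) > 0) message("API returned: ", paste(errs, collapse = "; "))
```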

As far as I can tell, I am authenticated with this:

## Twitter authentication, 
consumer_key = "xxxxx"
consumer_secret = "xxx"
access_token = "xxxxx"
access_secret = "xxx"
setup_twitter_oauth(consumer_key, consumer_secret, access_token,
access_secret)

Where does the problem come from?

EDIT: When I call get_oauth_sig() I get the result below:

> twitteR:::get_oauth_sig()
<Token>
NULL
<oauth_app> twitter
  key:    XXXXXXX
  secret: <hidden>
<credentials> oauth_token, oauth_token_secret
---

Is this normal?

EDIT: I solved the problem by replacing POST with GET:

library(rjson)
library(httr)      # provides GET() and config()
library(twitteR)
consumer_key = "xxxxx"
consumer_secret = "xxx"
access_token = "xxxxx"
access_secret = "xxx"
setup_twitter_oauth(consumer_key, consumer_secret, access_token,
                    access_secret)
# Example list page: https://twitter.com/ivalerio/lists/justice?lang=fr
twlist <- "d-put-s-2017-2022"
twowner <- "ivalerio"
api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
           twlist, "&owner_screen_name=", twowner, "&count=5000")
# GET instead of POST; the request is signed with the twitteR OAuth token.
response <- GET(api.url, config(token=twitteR:::get_oauth_sig()))
# count=5000 is the number of names per result page,
# which for this case simplifies things to one page.
# This returns a JSON response which we can read using fromJSON:
response.list <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
# Now we have a list where each element of $users is the Twitter data of one
# list member. To extract their names and screen names:
users.names <- sapply(response.list$users, function(i) i$name)
users.screennames <- sapply(response.list$users, function(i) i$screen_name)
# Which are:
head(users.names)
放荡不羁爱自由
#3 · 2019-01-20 15:10

If you want to work with R and Twitter, you should take a look at the twitteR package. It doesn't have a function to retrieve the information you want, but we can use its internal functions to handle OAuth and then send the correct API call ourselves. The advantage of using API calls is that you don't rely on parsing the HTML page; you're doing what developers are supposed to do.

The code below assumes you have already authenticated using setup_twitter_oauth(). Tutorials on this are easy to find, since it is one of the package's basics. Once authenticated, let's load the packages we need:

library(rjson)
library(httr)
# library(twitteR) should already be loaded, since it was needed to authenticate

Now to do the API call, we'll use POST. The URL has a slug parameter, which is the Twitter list name, and an owner_screen_name parameter, which is the screen name of the account that owns the list. We'll use the internal twitteR:::get_oauth_sig() to authenticate the call.

twlist <- "premier-league-players"
twowner <- "TwitterUK"
api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
           twlist, "&owner_screen_name=", twowner, "&count=5000")
response <- POST(api.url, config(token=twitteR:::get_oauth_sig()))
#Count = 5000 is the number of names per result page,
#        which for this case simplifies things to one page.
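The URL above is assembled with plain paste0(), which works for these values but would break for a slug or owner containing reserved characters. A small sketch of a safer builder follows; build_members_url() is a hypothetical helper name, and it relies only on base R's utils::URLencode().

```r
# Hypothetical helper: builds the lists/members URL used above, percent-encoding
# the two user-supplied values so reserved characters cannot corrupt the query.
build_members_url <- function(slug, owner, count = 5000) {
  paste0("https://api.twitter.com/1.1/lists/members.json",
         "?slug=", utils::URLencode(slug, reserved = TRUE),
         "&owner_screen_name=", utils::URLencode(owner, reserved = TRUE),
         "&count=", sprintf("%d", as.integer(count)))
}

api.url <- build_members_url("premier-league-players", "TwitterUK")
```

For the list in this answer the result is identical to the paste0() version, so the rest of the code can use it unchanged.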

This returns a JSON response which we can read using fromJSON:

response.list <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

Now we have a list whose users element holds the Twitter data of each list member, one entry per member. To extract their names and screen names:

users.names <- sapply(response.list$users, function(i) i$name)
users.screennames <- sapply(response.list$users, function(i) i$screen_name)

Which are:

> head(users.names)
[1] "Peter Crouch"         "barry bannan"         "Jose Leonardo Ulloa "
    "Paul McShane"         "nacho monreal"        "James Ward-Prowse"
> head(users.screennames)
[1] "petercrouch"   "bazzabannan25" "Ciclone1923"   "pmacca15"
    "_nachomonreal" "Prowsey16"
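If a single table is more convenient than two separate vectors, the same extraction can be written with vapply() and collected into a data frame. This is a sketch on a hand-written mock of response.list (two members only), so it runs without an API call; the real object has many more fields per user.

```r
# Mock of the parsed API response, reduced to the two fields used here.
response.list <- list(users = list(
  list(name = "Peter Crouch", screen_name = "petercrouch"),
  list(name = "barry bannan", screen_name = "bazzabannan25")
))

# vapply() enforces that every user yields exactly one string, so a missing
# field raises an error instead of silently producing a list.
members <- data.frame(
  name        = vapply(response.list$users, function(u) u$name, character(1)),
  screen_name = vapply(response.list$users, function(u) u$screen_name, character(1)),
  stringsAsFactors = FALSE
)

print(members)
```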

The best part of this code is that it opens up pretty much the entire Twitter API from R as an already authenticated request. You can inspect the response list and its sublists to see all the information available for each query.
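One of the fields worth inspecting is the paging cursor. Responses from cursored endpoints such as lists/members include next_cursor_str, which is "0" on the last page, and the first request conventionally uses a cursor of -1. For lists that do not fit in one page, a loop along these lines could collect every member. This is only a sketch: fetch_page() is a hypothetical stand-in backed by mock data, where real code would make the authenticated request with the cursor added to the URL.

```r
# Mock pages keyed by cursor value, standing in for real API responses.
fake_pages <- list(
  "-1" = list(users = list(list(screen_name = "petercrouch")),
              next_cursor_str = "42"),
  "42" = list(users = list(list(screen_name = "bazzabannan25")),
              next_cursor_str = "0")
)

# Hypothetical fetcher: real code would request "&cursor=<cursor>" and parse the JSON.
fetch_page <- function(cursor) fake_pages[[cursor]]

# Follow next_cursor_str until the API signals the last page with "0".
collect_members <- function(fetch_page) {
  users <- list()
  cursor <- "-1"
  while (cursor != "0") {
    page <- fetch_page(cursor)
    users <- c(users, page$users)
    cursor <- page$next_cursor_str
  }
  users
}

all_users <- collect_members(fetch_page)
all_screennames <- vapply(all_users, function(u) u$screen_name, character(1))
```

Passing the fetcher in as an argument keeps the loop testable without network access.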
