I'm trying to find the number of replies to each of a given user's tweets. This is not available directly from Twitter's API. I've decided to only go after replies from the user's followers, both to cut down the data generated and as a good approximation (I believe most of the replies to a tweet will come directly from that user's followers).
I believe I've come a long way already; I just need help with the final section. I'm struggling to make the function I've created run over all the followers.
I'd rather this solution be in R than Python, although I know a Python option exists. I've used Donald Trump's Twitter handle in the example; I'm not trying to do this for him specifically, and I know that his huge following will make this a challenge. I want a generic version usable for whichever user is supplied.
##packages used below
library(rtweet)
library(dplyr)
##set name of tweeter to look at (this can be changed)
targettwittername <- "realDonaldTrump"
##get this tweeter's timeline
tmls <- get_timeline(targettwittername, n=3200, retryonratelimit=TRUE)
##get their user id
targettwitteruserid <- as.numeric(select(lookup_users(targettwittername), user_id))
##get ids of their tweets
tweetids <- select(tmls, status_id)
tweetids <- transform(tweetids, status_id_num=as.numeric(status_id))
##get list of followers (who are most likely to reply)
targetfollowers <- data.frame(get_followers(targettwittername))
##clean up follower list to exclude those that have never tweeted and restricted access
user_lookup <- lookup_users(targetfollowers$user_id)
users_with_tweets_and_unprotected <- filter(user_lookup, statuses_count != 0)
users_with_tweets_and_unprotected <- select(filter(users_with_tweets_and_unprotected, protected != "TRUE"), user_id)
targetfollowers <- filter(targetfollowers, user_id %in% users_with_tweets_and_unprotected$user_id)
##custom function to search all followers timelines one by one
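getfollowersreplies <- function(x){
  follower <- as.numeric(x[1])
  followertl <- data.frame(get_timeline(follower, n=3200, retryonratelimit=TRUE))
  ##keep only tweets that reply to the target user
  followertl <- filter(followertl, in_reply_to_status_user_id == targettwitteruserid)
  followertl <- transform(followertl, reply_to_status_id_num=as.numeric(in_reply_to_status_status_id))
  ##match the replies against the target user's own tweets
  join <- inner_join(followertl, tweetids, by=c("reply_to_status_id_num"="status_id_num"))
  ##the snippet was cut off after group_by(); the summarise() step and the
  ##closing lines are an assumed completion
  replycounts <- data.frame(
    join %>%
      group_by(user_id, reply_to_status_id_num) %>%
      summarise(replycount = n())
  )
  replycounts
}
##run the function over every follower and stack the results
tweet_replies <- do.call("rbind", lapply(targetfollowers$user_id, getfollowersreplies))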
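One thing worth checking in the code above is the use of as.numeric() on tweet IDs. Status IDs are 64-bit integers, and R doubles cannot represent every integer above 2^53, so neighbouring IDs can collapse into the same number. The following is a minimal sketch, not a drop-in replacement, that matches replies to tweets as character strings and tallies them with base R. All IDs and the `reply_to_status_id` column name are made up for illustration.

```r
# Toy stand-ins for the real data; every value here is invented.
tweetids <- data.frame(
  status_id = c("900000000000000001", "900000000000000002"),
  stringsAsFactors = FALSE
)
replies <- data.frame(
  user_id = c("11", "11", "22", "33"),
  reply_to_status_id = c("900000000000000001", "900000000000000002",
                         "900000000000000002", "123"),
  stringsAsFactors = FALSE
)

# Two different IDs become equal once converted to double.
ids_collide <- as.numeric("900000000000000001") ==
  as.numeric("900000000000000002")

# Keep only replies aimed at the target's tweets, comparing strings.
matched <- replies[replies$reply_to_status_id %in% tweetids$status_id, ]

# Count replies per tweet.
reply_counts <- as.data.frame(
  table(status_id = matched$reply_to_status_id),
  stringsAsFactors = FALSE
)
names(reply_counts)[2] <- "n_replies"
```

Because the IDs stay as strings throughout, the join cannot over-match on rounded values, at the cost of not being able to sort them numerically.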
The biggest obstacle would be the time it takes to collect up to 3,200 of the most recent tweets from each of the more than 42 million followers of @realDonaldTrump.
Twitter limits the number of follower user IDs collected to 75,000 every 15 minutes.
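To put that limit in perspective, here is a rough back-of-the-envelope calculation using only the figures quoted above (42 million followers, 75,000 IDs per 15-minute window). The variable names are just for illustration, and the result is a lower bound since the follower count is "more than" 42 million.

```r
followers      <- 42e6   # follower count quoted above
ids_per_window <- 75000  # follower IDs returned per rate-limit window
window_minutes <- 15     # length of one rate-limit window

# Number of 15-minute windows needed to page through every follower ID.
windows_needed <- ceiling(followers / ids_per_window)

# Convert to hours and days of wall-clock time.
total_hours <- windows_needed * window_minutes / 60
total_days  <- total_hours / 24
```

That works out to 560 windows, or about 140 hours (just under six days) spent only on collecting the follower IDs, before a single timeline is requested.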
Assuming you have a reliable internet connection and time, then you can use the following code to get all 42 million follower IDs.
Then you'd probably want to construct a for loop that collects each follower's timeline and handles API rate limits. In the example code below, I've made the loop sleep until the rate limit resets after every 56 calls.
As you can see, this would take a really long time. You'd be better off trying to collect all the replies from the past 6-9 days. The code below gets up to 5 million replies to Trump's tweets from the past 9 days. Warning: if there are actually that many replies (I honestly have no idea) available from the past 9 days, this search would take just under three days to finish.
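The batching idea described above (pause after every 56 calls) can be sketched roughly as follows. This is an illustrative sketch, not the original code from this answer: `collect_in_batches`, `fetch`, and `pause` are hypothetical names, and the fetch function is passed in as an argument so the loop can be tried without touching the Twitter API. With rtweet you would presumably pass something like `function(id) get_timeline(id, n = 3200)` as `fetch`. The figure of 56 is taken from the text; it is consistent with 3,200 tweets needing 16 requests of 200 each against a 900-request window, but treat that derivation as my assumption.

```r
# Hypothetical helper: call `fetch` once per id and pause after every
# `batch_size` calls so the rate limit has time to reset.
collect_in_batches <- function(ids, fetch, batch_size = 56,
                               pause = function() Sys.sleep(15 * 60)) {
  results <- list()
  for (i in seq_along(ids)) {
    # Skip an id whose fetch fails instead of aborting the whole run.
    res <- tryCatch(fetch(ids[[i]]), error = function(e) NULL)
    if (!is.null(res)) results[[length(results) + 1]] <- res
    # Pause after each full batch, but not after the very last id.
    if (i %% batch_size == 0 && i < length(ids)) pause()
  }
  # Stack the per-id data frames into one (NULL if nothing was collected).
  do.call("rbind", results)
}

# Dry run with a stub fetcher and a pause that only counts its calls.
pauses <- 0
fake_fetch <- function(id) {
  data.frame(user_id = id, n_chars = nchar(id), stringsAsFactors = FALSE)
}
demo <- collect_in_batches(as.character(1:120), fake_fetch,
                           pause = function() pauses <<- pauses + 1)
```

Injecting `fetch` and `pause` keeps the rate-limit logic separate from the API call, so the batch size and sleep time can be adjusted or tested on a handful of IDs before committing to a run that lasts days.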