Exclude retweets when scraping with twitteR in R

Posted 2019-01-28 19:26

I'm currently scraping tweets based on certain keywords using R v. 1.0.44 and the package twitteR (newest version). Specifically, I use the following command:

 my_twitter_data <- searchTwitter("#aleppo", n = 40000, lang = "en",
                                  since = '2016-12-12', until = "2016-12-13",
                                  retryOnRateLimit = 120)

In a request for 40k tweets about #aleppo (which takes quite some time because of rate limiting), only 5k of the results are original tweets, i.e. strip_retweets(my_twitter_data, strip_manual=TRUE, strip_mt=TRUE) returns a list of length 5k.

My problem is that I spend much of my rate limit, and therefore time, on retweets that are irrelevant to my further analysis. Is there a way around this in R so that I spend my rate limit only on original tweets?

Tags: r twitter web-scraping

2 answers

何必那么认真

#2 · 2019-01-28 20:09

You can add -filter:retweets to your query:

 my_twitter_data <- searchTwitter("#aleppo -filter:retweets", n = 40000,
                                  lang = "en", since = '2016-12-12',
                                  until = "2016-12-13", retryOnRateLimit = 120)
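
As a minimal sketch of how this could be wrapped for reuse, the helper below appends the filter to any search term and then runs strip_retweets() as a post-hoc check, since manual retweets ("RT ...", "MT ...") may not be caught by the query operator. The names build_query and fetch_original_tweets are hypothetical, not part of twitteR, and the wrapper assumes twitteR is installed and setup_twitter_oauth() has already been called.

```r
# Hypothetical helper (not part of twitteR): append the retweet filter
# to a search term so the API does not return native retweets.
build_query <- function(term, exclude_retweets = TRUE) {
  if (exclude_retweets) paste(term, "-filter:retweets") else term
}

# Hypothetical wrapper around the call from the question. Assumes the
# twitteR package is installed and authentication is already set up.
fetch_original_tweets <- function(term, n, ...) {
  tweets <- twitteR::searchTwitter(build_query(term), n = n, ...)
  # Post-hoc check: drop anything strip_retweets() still considers a retweet.
  originals <- twitteR::strip_retweets(tweets, strip_manual = TRUE,
                                       strip_mt = TRUE)
  message(length(tweets) - length(originals),
          " tweets removed by strip_retweets()")
  originals
}

# Building the query string needs no network access.
print(build_query("#aleppo"))
```

With credentials in place, a call such as fetch_original_tweets("#aleppo", n = 40000, lang = "en", since = '2016-12-12', until = "2016-12-13", retryOnRateLimit = 120) would pass the remaining arguments through to searchTwitter unchanged.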


叼着烟拽天下

#3 · 2019-01-28 20:28

 my_twitter_data <- searchTwitter("#aleppo exclude:retweets", n = 40000,
                                  lang = "en", since = '2016-12-12',
                                  until = "2016-12-13", retryOnRateLimit = 120)

