When using the Python tweepy library to pull tweets from Twitter's streaming API, is it possible to exclude retweets?
For instance, I want only the tweets posted by a particular user, e.g. twitterStream.filter(follow=["20264932"]), but this also returns retweets and I would like to exclude them. How can I do this?
Thank you in advance.
Just checking a tweet's text to see if it starts with 'RT' is not really a robust solution. You need to make a decision about what you will consider a retweet, since it isn't exactly clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets.
If you're going by the 'official' definition, then you want to filter tweets out if they have a True value for their retweeted attribute.
And if you want to be more inclusive, including 'unofficial' retweets, you should check the text for the substring 'RT @' and not merely whether it starts with 'RT', because the former is cleaner and eliminates more edge cases where a tweet starts with 'RT' but isn't a retweet (lots of data out there, I'm sure this is a possibility).
Combining the two checks (retweeted is not True, and 'RT @' does not appear in the text) takes the subset of tweets in your collection that are regular tweets and intersects it with the subset that do not have 'RT @' in the tweet text, leaving you with tweets that are supposedly regular tweets.
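The two checks described above can be sketched roughly as follows. This is only an illustration working on plain dicts shaped like tweet JSON; the helper names is_official_retweet and is_regular_tweet and the sample tweets are made up for the example. The extra test for a retweeted_status key is an assumption added here (official retweets in the v1.1 payload carry that key), not something the text above relies on.

```python
def is_official_retweet(tweet):
    """Return True if the tweet payload is flagged as a retweet."""
    # 'retweeted' is the attribute discussed above. The 'retweeted_status'
    # check is an added assumption: official retweets embed the original
    # tweet under that key.
    return bool(tweet.get("retweeted")) or "retweeted_status" in tweet


def is_regular_tweet(tweet, include_unofficial=True):
    """Return True if the tweet looks like a regular (non-retweet) tweet."""
    if is_official_retweet(tweet):
        return False
    # Substring check rather than startswith('RT'), so that
    # "Agreed! RT @user: ..." is caught and "RT is an acronym" is not.
    if include_unofficial and "RT @" in tweet.get("text", ""):
        return False
    return True


# Hypothetical sample data, shaped like decoded tweet JSON.
tweets = [
    {"text": "Shipping a new release today", "retweeted": False},
    {"text": "RT @someone: great news", "retweeted": False},
    {"text": "Agreed! RT @someone: great news", "retweeted": False},
    {"text": "RT is short for retweet", "retweeted": False},
    {"text": "original text", "retweeted": True},
]

regular = [t["text"] for t in tweets if is_regular_tweet(t)]
print(regular)
```

With tweepy Status objects instead of dicts, the same conditions would be written against attributes (for example status.text), so treat the dict access here as a simplification.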
Yes, there are several ways of doing this. One of them is to check whether the text of the tweet starts with RT. For this we can simply use the .startswith() method on strings, and you need to change the code of the on_data() method in your streaming class accordingly.
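A minimal sketch of such an on_data() method, under a few assumptions: the class name NoRetweetListener and the kept list are made up for illustration, and the class is a plain stand-in so the snippet runs without a network connection. In a real script it would subclass your tweepy streaming class (tweepy.StreamListener in tweepy 3.x), where on_data() receives the raw JSON string for each message.

```python
import json


class NoRetweetListener:
    """Stand-in for a tweepy streaming class; only on_data() is shown."""

    def __init__(self):
        self.kept = []  # texts of tweets that passed the filter

    def on_data(self, raw_data):
        tweet = json.loads(raw_data)
        # Not every stream message is a tweet, so default to an empty string.
        text = tweet.get("text", "")
        if text.startswith("RT"):
            # Looks like a retweet: skip it, but keep the stream running.
            return True
        self.kept.append(text)
        return True


# Feed the listener two hand-written messages instead of a live stream.
listener = NoRetweetListener()
listener.on_data(json.dumps({"text": "RT @user: hello"}))
listener.on_data(json.dumps({"text": "hello world"}))
print(listener.kept)
```

Returning True from on_data() is meant to keep the stream open; as far as I know, tweepy 3.x only disconnects when the method returns False.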