Exclude retweets from twitter streaming api using

Posted 2019-02-07 08:54

When using the Python Tweepy library to pull tweets from Twitter's streaming API, is it possible to exclude retweets?

For instance, I want only the tweets posted by a particular user, e.g. twitterStream.filter(follow=["20264932"]), but this also returns retweets, which I would like to exclude. How can I do this?

Thank you in advance.

2 Answers
爷的心禁止访问
#2 · 2019-02-07 09:19

Just checking a tweet's text to see if it starts with 'RT' is not really a robust solution. You need to make a decision about what you will consider a retweet, since it isn't exactly clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets.

Sometimes people type RT at the beginning of a Tweet to indicate that they are re-posting someone else's content. This isn't an official Twitter command or feature, but signifies that they are quoting another user's Tweet.

If you're going by the 'official' definition, then you want to filter tweets out if they have a True value for their retweeted attribute, like this:

if not tweet['retweeted']:
    # do something with standard tweets
    print(tweet['text'])

If you want to be more inclusive and also catch 'unofficial' retweets, check whether the text contains the substring 'RT @' rather than merely whether it starts with 'RT'. The substring check is cleaner and avoids the edge case where a tweet starts with 'RT' but isn't a retweet (with this much data out there, I'm sure that happens). Here's some code for that:

if not tweet['retweeted'] and 'RT @' not in tweet['text']:
    # do something with standard tweets
    print(tweet['text'])

This conditional keeps only the tweets that are both not flagged as retweeted and free of 'RT @' in their text (the intersection of those two subsets), leaving you with tweets that are presumably regular tweets.
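As a rough sketch, the two checks above can be wrapped in one predicate that works on decoded tweet dictionaries. The function name is_retweet is hypothetical. The extra test for a 'retweeted_status' key is an assumption that goes beyond this answer: in the v1.1 tweet payload, native retweets embed the original tweet under that key, so testing for it may catch retweets that the 'retweeted' flag misses. Note also that a plain substring test can give false positives, since any text that happens to contain 'RT @' will match.

```python
def is_retweet(tweet):
    """Return True if a decoded tweet dict looks like a retweet."""
    # Assumption: native retweets carry the original tweet
    # under the 'retweeted_status' key (v1.1 payload).
    if 'retweeted_status' in tweet:
        return True
    # The 'official' flag used in the answer above.
    if tweet.get('retweeted'):
        return True
    # 'Unofficial' retweets typed by hand as "RT @user ...".
    return 'RT @' in tweet.get('text', '')


# Made-up sample data, for illustration only.
tweets = [
    {'text': 'Hello world', 'retweeted': False},
    {'text': 'RT @someone: Hello world', 'retweeted': False},
    {'text': 'Hello world', 'retweeted': False,
     'retweeted_status': {'text': 'Hello world'}},
]

originals = [t for t in tweets if not is_retweet(t)]
print(len(originals))
```

Using .get() rather than indexing keeps the predicate from raising KeyError on payloads that lack one of the fields.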

成全新的幸福
#3 · 2019-02-07 09:29

Yes, there are several ways to do this. One is to check whether the tweet's text starts with 'RT', using the .startswith() string method. To do that, change the on_data() method in your streaming class, for example:

import json
import tweepy

# Note: tweepy.StreamListener exists in Tweepy versions before 4.0
class TwitterStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        # Twitter returns data in JSON format, so decode it first
        decoded = json.loads(data)
        # Skip messages without text and tweets that start with 'RT'
        if 'text' in decoded and not decoded['text'].startswith('RT'):
            # Do processing here
            print(decoded['text'])
        return True
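The filtering step in on_data() can also be pulled out into a plain function, which makes it easy to try without a live stream. This is only a sketch: extract_original_text is a hypothetical helper name, and the sample messages are invented. It also skips messages that have no 'text' field, since the stream can deliver non-tweet messages such as delete notices.

```python
import json


def extract_original_text(data):
    """Return the tweet text from a raw stream message, or None to skip it."""
    decoded = json.loads(data)
    text = decoded.get('text')
    if text is None:
        # Not a tweet (for example a delete notice), so skip it.
        return None
    if text.startswith('RT'):
        # Treated as a retweet under the simple startswith rule.
        return None
    return text


# Invented raw messages, standing in for what on_data() would receive.
messages = [
    json.dumps({'text': 'Just a normal tweet'}),
    json.dumps({'text': 'RT @someone: Just a normal tweet'}),
    json.dumps({'delete': {'status': {'id': 1}}}),
]

for raw in messages:
    text = extract_original_text(raw)
    if text is not None:
        print(text)
```

Inside the listener, on_data() would then call this helper and process the result only when it is not None.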