Tweepy Streaming - Stop collecting tweets at x amo

Posted 2019-01-24 16:59

Question:

I'm looking to have the Tweepy Streaming API stop pulling in tweets once I have stored x tweets in MongoDB.

I have tried if and while statements with counters inside the class definition, but I cannot get the stream to stop after a given number of tweets. This is a real head-banger for me. I found this thread: https://groups.google.com/forum/#!topic/tweepy/5IGlu2Qiug4 but my efforts to replicate it have failed. It always tells me that __init__ needs an additional argument. I believe our Tweepy auth is set up differently, so it is not an apples-to-apples comparison.

Any thoughts?

import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener

# CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN and OAUTH_TOKEN_SECRET hold the Twitter app credentials
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

class StdOutListener(StreamListener):

    def on_status(self, status):
        text = status.text
        created = status.created_at
        record = {'Text': text, 'Created At': created}
        print record  # See the Tweepy documentation for how to access other fields
        collection.insert(record)  # collection is the MongoDB collection, defined elsewhere

    def on_error(self, status):
        print 'Error on status', status

    def on_limit(self, status):
        print 'Limit threshold exceeded', status

    def on_timeout(self):
        # Tweepy calls on_timeout() with no arguments
        print 'Stream disconnected; continuing...'

stream = Stream(auth, StdOutListener())
stream.filter(track=['tv'])

Answer 1:

You need to add a counter to your class in __init__ and increment it inside on_status. on_status returns True while the counter is below the limit (20 in this example), which keeps the stream running, and returns False once the limit is reached, which tells Tweepy to disconnect. This could be done as shown below:

def __init__(self, api=None):
    # Pass api through so the base StreamListener is set up as usual
    super(StdOutListener, self).__init__(api)
    self.num_tweets = 0  # number of tweets stored so far

def on_status(self, status):
    record = {'Text': status.text, 'Created At': status.created_at}
    print record  # See the Tweepy documentation for how to access other fields
    collection.insert(record)
    self.num_tweets += 1
    # Returning False disconnects the stream; stop once 20 tweets are stored
    return self.num_tweets < 20
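
If it helps to see the stopping logic on its own, here is a minimal sketch that runs without Tweepy or MongoDB. Everything in it is a stand-in I made up for illustration: CountingListener, run_fake_stream, the max_tweets parameter, and the dict-shaped fake statuses are not part of Tweepy. It assumes the behaviour of the StreamListener-era API used in this thread, where the stream stops as soon as on_status returns False. Giving max_tweets a default value also means the listener can still be created with no extra arguments, which may be relevant to the "__init__ needs an additional argument" error, though that is only a guess at the cause.

```python
class CountingListener(object):
    """Hypothetical stand-in for a Tweepy listener; it does not import tweepy."""

    def __init__(self, store, max_tweets=20):
        self.store = store            # any object with append(); stands in for the MongoDB collection
        self.max_tweets = max_tweets  # illustrative parameter, not a Tweepy option
        self.num_tweets = 0           # records stored so far

    def on_status(self, status):
        # Store the record first, then count it.
        self.store.append({'Text': status['text'],
                           'Created At': status['created_at']})
        self.num_tweets += 1
        # True means "keep going"; False means "stop the stream".
        return self.num_tweets < self.max_tweets


def run_fake_stream(listener, statuses):
    """Mimic the stream loop: stop as soon as on_status returns False."""
    for status in statuses:
        if listener.on_status(status) is False:
            break


stored = []
listener = CountingListener(stored, max_tweets=5)
# 100 fake statuses are available, but only the first 5 should be consumed.
fake_statuses = ({'text': 'tweet %d' % i, 'created_at': i} for i in range(100))
run_fake_stream(listener, fake_statuses)
print(len(stored))
```

In the real listener you would keep subclassing StreamListener and calling collection.insert as in the answer above; the sketch only isolates the counter and the True/False return value.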