Tweepy rate limit / pagination issue.

Posted 2019-07-19 02:54

I've put together a small Twitter tool to pull relevant tweets for later latent semantic analysis. Ironically, that bit (the more complicated bit) works fine - it's pulling the tweets that's the problem. I'm using the code below to set it up.

This technically works, but not as expected. I thought the .items(200) parameter would pull 200 tweets per request, but the results come back in 15-tweet chunks (so the 200 items 'cost' me 13 requests). I understand this is the original/default RPP variable (now 'count' in the Twitter docs), but I've tried setting it in the Cursor call (rpp=100, the maximum per the Twitter documentation), and it makes no difference.

Tweepy/Cursor docs
The nearest similar question isn't quite the same issue

Grateful for any thoughts! I'm sure it's a minor settings tweak, but I've tried various values for page and rpp, to no avail.

import tweepy
from tweepy import Cursor

from auth import basic
from tools import read_user, read_tweet

# Authenticate and build the API client
auth = tweepy.OAuthHandler(apikey, apisecret)
auth.set_access_token(access_token, access_token_secret_var)
api = tweepy.API(auth)

# Page through search results, collecting up to 200 tweets
current_results = []
for tweet in Cursor(api.search,
                    q=search_string,
                    result_type="recent",
                    include_entities=True,
                    lang="en").items(200):
    current_user, created = read_user(tweet.author)
    current_tweet, created = read_tweet(tweet, current_user)
    current_results.append(tweet)
print(current_results)
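To make the cost concrete, here's a quick back-of-the-envelope sketch (plain Python, no Tweepy required): the number of HTTP requests a paginated cursor needs is the item target divided by the per-request page size, rounded up. The page sizes below are just the values discussed in this post, not anything queried from the API.

```python
import math

def requests_needed(total_items, per_page):
    """Number of paginated API calls needed to collect total_items
    when each call returns at most per_page results."""
    return math.ceil(total_items / per_page)

# At the default page size observed here (15 tweets per call),
# 200 items costs ~14 round trips (the post counts 13; either way,
# far more than necessary). At count=100 it drops to 2.
print(requests_needed(200, 15))   # -> 14
print(requests_needed(200, 100))  # -> 2
```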

1 answer

不美不萌又怎样
#2 · 2019-07-19 03:22

I worked it out in the end, with a little assistance from colleagues. As far as I can tell, the rpp and items() settings are applied after the actual API call is made. The issue seems to be the 'count' option from the Twitter documentation (formerly RPP, as mentioned above, and still referred to as rpp in Tweepy 2.3.0).

What I ended up doing was modifying the Tweepy code itself - in api.py, I added 'count' to the allowed parameters of the search binding (around L643 in my install, YMMV).

""" search """
search = bind_api(
    path = '/search/tweets.json',
    payload_type = 'search_results',
    allowed_param = ['q', 'count', 'lang', 'locale', 'since_id', 'geocode', 'max_id', 'since', 'until', 'result_type', **'count**', 'include_entities', 'from', 'to', 'source']
)

This allowed me to tweak the code above to:

for tweet in Cursor(api.search,
                       q=search_string,
                       count=100,
                       result_type="recent",
                       include_entities=True,
                       lang="en").items(200):

This results in two calls rather than thirteen. I've double-checked by running

print(api.rate_limit_status()["resources"])

after each run, and it's only decrementing my remaining searches by 2 each time.
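For reference, here's how the relevant slice of that rate_limit_status() payload can be read. The dict below is a hand-written sample mirroring the documented shape of Twitter's GET application/rate_limit_status response for the search bucket (180 calls per 15-minute window on user auth), not real API output.

```python
# Hand-written sample mimicking the documented response shape of
# GET application/rate_limit_status (search bucket only).
sample_status = {
    "resources": {
        "search": {
            "/search/tweets": {"limit": 180, "remaining": 178, "reset": 1563505122}
        }
    }
}

# Drill down to the search endpoint's bucket and compute calls used
search_bucket = sample_status["resources"]["search"]["/search/tweets"]
used = search_bucket["limit"] - search_bucket["remaining"]
print("search calls used this window:", used)  # -> 2
```

Printing only the "search" sub-dict rather than all of "resources" keeps the output readable when checking this after each run.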
