Tweepy rate limit / pagination issue.

Posted 2019-07-19 02:54

I've put together a small Twitter tool to pull relevant tweets for later latent semantic analysis. Ironically, that bit (the more complicated bit) works fine - it's pulling the tweets that's the problem. I'm using the code below to set it up.

This technically works, but not as expected. I thought the .items(200) parameter would pull 200 tweets per request, but the results come back in 15-tweet chunks (so the 200 items 'cost' me 13 requests). I understand this is the original/default RPP variable (now 'count' in the Twitter docs), but I've tried setting it in the Cursor call (rpp=100, the maximum per the Twitter documentation), and it makes no difference.

Tweepy/Cursor docs
The nearest similar question isn't quite the same issue

Grateful for any thoughts! I'm sure it's a minor settings tweak, but I've tried various values for page and rpp, to no avail.

import tweepy
from tweepy import Cursor

from auth import basic
from tools import read_user, read_tweet

# Authenticate and build the API client
auth = tweepy.OAuthHandler(apikey, apisecret)
auth.set_access_token(access_token, access_token_secret_var)
api = tweepy.API(auth)

# Page through search results, collecting up to 200 tweets
current_results = []
for tweet in Cursor(api.search,
                    q=search_string,
                    result_type="recent",
                    include_entities=True,
                    lang="en").items(200):
    current_user, created = read_user(tweet.author)
    current_tweet, created = read_tweet(tweet, current_user)
    current_results.append(tweet)
print(current_results)
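To make the cost concrete, here's a quick back-of-the-envelope sketch (plain Python, no Tweepy required): the number of HTTP requests a paginated cursor needs is the item target divided by the per-request page size, rounded up. The page sizes below are just the values discussed in this post, not anything queried from the API.

```python
import math

def requests_needed(total_items, per_page):
    """Number of paginated API calls needed to collect total_items
    when each call returns at most per_page results."""
    return math.ceil(total_items / per_page)

# At the default page size observed here (15 tweets per call),
# 200 items costs ~14 round trips (the post counts 13; either way,
# far more than necessary). At count=100 it drops to 2.
print(requests_needed(200, 15))   # -> 14
print(requests_needed(200, 100))  # -> 2
```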

1 answer

不美不萌又怎样
#2 · 2019-07-19 03:22

I worked it out in the end, with a little assistance from colleagues. As far as I can tell, the rpp and items() settings are applied after the actual API call is made. The issue seems to be the 'count' option from the Twitter documentation (formerly RPP, as mentioned above, and still referred to as rpp in Tweepy 2.3.0).

What I ended up doing was modifying the Tweepy code itself - in api.py, I added 'count' to the allowed parameters of the search binding (around L643 in my install, YMMV).

""" search """
search = bind_api(
    path = '/search/tweets.json',
    payload_type = 'search_results',
    allowed_param = ['q', 'count', 'lang', 'locale', 'since_id', 'geocode', 'max_id', 'since', 'until', 'result_type', **'count**', 'include_entities', 'from', 'to', 'source']
)

This allowed me to tweak the code above to:

for tweet in Cursor(api.search,
                       q=search_string,
                       count=100,
                       result_type="recent",
                       include_entities=True,
                       lang="en").items(200):

This results in two calls rather than thirteen. I've double-checked by running

print(api.rate_limit_status()["resources"])

after each run, and it's only decrementing my remaining searches by 2 each time.
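For reference, here's how the relevant slice of that rate_limit_status() payload can be read. The dict below is a hand-written sample mirroring the documented shape of Twitter's GET application/rate_limit_status response for the search bucket (180 calls per 15-minute window on user auth), not real API output.

```python
# Hand-written sample mimicking the documented response shape of
# GET application/rate_limit_status (search bucket only).
sample_status = {
    "resources": {
        "search": {
            "/search/tweets": {"limit": 180, "remaining": 178, "reset": 1563505122}
        }
    }
}

# Drill down to the search endpoint's bucket and compute calls used
search_bucket = sample_status["resources"]["search"]["/search/tweets"]
used = search_bucket["limit"] - search_bucket["remaining"]
print("search calls used this window:", used)  # -> 2
```

Printing only the "search" sub-dict rather than all of "resources" keeps the output readable when checking this after each run.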
