I'm currently trying to retrieve the followers of a big account with a lot of followers.
I'm using Tweepy and this piece of code (with a cursor):
follower_cursors = tweepy.Cursor(api.followers, id=id_var, count=5000)
for friend in follower_cursors.items():
    ...  # process each follower here
OK, if I don't specify count, it seems that by default only 20 results are returned per page, but since the Twitter API documentation says it can provide up to 5000 followers per request, I tried to set it to the maximum.
However, this doesn't seem to be taken into account, and each page contains a maximum of 200 entries, which is a real problem because it makes you hit the rate limit much more easily.
What am I doing wrong? Is there a way to make Tweepy request pages of 5000 IDs, to minimize requests and override this default maximum of 200?
Thanks!
You could use the cursor for pages instead of items, and then process the items per page:
for page in Cursor(api.user_timeline).pages():
    # page is a list of statuses
    process_page(page)
    # or iterate over items in `page`
I don't see a limit in the tweepy Cursor for results returned, so it should return as many as it gets.
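For the followers case in the question specifically, here's a minimal sketch along those lines, assuming a pre-4.0 Tweepy client; the credentials and screen name are placeholders. It cursors over followers/ids (exposed as api.followers_ids), which returns bare user IDs up to 5000 per page, rather than followers/list (api.followers), which is capped at 200 hydrated user objects per page:

import tweepy

# Placeholder credentials and screen name -- substitute your own.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

follower_ids = []
# followers/ids yields plain integer IDs, up to 5000 per page,
# so it needs far fewer calls than followers/list (200 users per page).
for page in tweepy.Cursor(api.followers_ids, screen_name="some_big_account", count=5000).pages():
    follower_ids.extend(page)  # each page is a list of follower IDs

print("collected", len(follower_ids), "follower IDs")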
Previous answer:
The maximum per-page result is enforced by the Twitter API, not by Tweepy. You're supposed to paginate over the 200-per-call results, which Cursor is already doing for you. If there were 5000 followers, then at the maximum of 200 results per query you're using only 25 calls, leaving you 4975 calls to do other things.
To exceed the 5000-per-hour rate limit, you'd need to be doing at least 83 calls per minute or 1.4 calls per second.
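If you do expect to brush up against the limit, one simple option (sketched below for Tweepy 3.x, where tweepy.API accepts wait_on_rate_limit and wait_on_rate_limit_notify; credentials and screen name are placeholders) is to let Tweepy sleep until the window resets instead of erroring out:

import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# Block until the rate-limit window resets instead of raising an error,
# and print a notice when that happens.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

for page in tweepy.Cursor(api.followers, screen_name="some_big_account", count=200).pages():
    for user in page:  # each page holds up to 200 User objects
        print(user.screen_name)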
Note that 'read limits' are per application, while 'write limits' are per user. So you could split a read-intensive task across two or more apps.*
Consider using the Streaming API instead, if it's more appropriate for your needs.
*: Though I'm sure Twitter has controls in place to prevent abuse.
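Following up on the Streaming API suggestion: here's a minimal sketch using Tweepy 3.x's streaming classes (tweepy.StreamListener and tweepy.Stream; the credentials and the user ID to follow are placeholders). It pushes matching tweets to you as they happen instead of requiring repeated polling:

import tweepy

class PrintingListener(tweepy.StreamListener):
    # Called for every status the stream delivers.
    def on_status(self, status):
        print(status.user.screen_name, status.text)

    # Returning False on HTTP 420 disconnects instead of retrying aggressively.
    def on_error(self, status_code):
        if status_code == 420:
            return False

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

stream = tweepy.Stream(auth=auth, listener=PrintingListener())
# `follow` takes a list of user IDs (as strings); "12345" is a placeholder.
stream.filter(follow=["12345"])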