For a research project I would like to get the last 3 months' worth of Twitter messages. Technical challenges aside, is this possible, for example by using some sort of slow polling mechanism to keep the rate limiter at bay?
The Twitter API states: "Clients may request up to 3,200 statuses via the page and count parameters for timeline REST API". Is that per hour? Per day? Or ever?
Any suggestions? Would it even be theoretically possible? Has anyone done something similar before?
Thanks! Marco
Keyhole can get you historical tweets as an XLS file or present them in a visual dashboard. The preview samples only a few of the most recent tweets, but you can request historical data by emailing them.
See: http://keyhole.co/conversation_tracking
You could use the Search API without a search term, return the maximum of 100 tweets per page, and go through the pages at two requests a minute (120 requests an hour, 30 times fewer than the rate limit). If my math is correct, that gives you at most 12,000 tweets an hour, or 288,000 a day. The problem is that Twitter has added approximately 1.75 billion tweets over the past 3 months, so at that pace it would take you about 6,076 days, or more than 16 years, to complete this.
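To make the estimate easy to recheck, here is a minimal sketch of the arithmetic in Python. The inputs are the figures quoted in this answer (100 tweets per page, two requests a minute, roughly 1.75 billion tweets in the backlog); the constant and function names are made up for illustration.

```python
# Inputs taken from the answer above; all names here are illustrative.
TWEETS_PER_PAGE = 100           # maximum page size mentioned for the Search API
REQUESTS_PER_MINUTE = 2         # the "slow polling" rate proposed above
BACKLOG_TWEETS = 1_750_000_000  # approximate tweets posted in 3 months


def days_to_fetch(backlog, per_page, requests_per_minute):
    """Return how many days it takes to page through `backlog` tweets."""
    tweets_per_hour = per_page * requests_per_minute * 60
    tweets_per_day = tweets_per_hour * 24
    return backlog / tweets_per_day


days = days_to_fetch(BACKLOG_TWEETS, TWEETS_PER_PAGE, REQUESTS_PER_MINUTE)
print(f"{days:,.0f} days, about {days / 365:.1f} years")
```

Changing REQUESTS_PER_MINUTE shows how far a higher rate limit would have to go before three months of tweets could be fetched in a reasonable time.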
You could ask this question over on the Twitter Development talk on Google Groups, or contact Twitter to get white-listed so you could make up to 20,000 requests an hour.
Personally, I don't think it's possible.
You can read historical Twitter data using Gnip's Historical PowerTrack tool. It gives you access to all Twitter data since the first tweet, and it is a fairly simple tool to use.
Twitter notoriously does not make tweets older than three weeks "available". In some cases you can only get one week. You're better off storing tweets for the next three months. Many rightly doubt whether older tweets are even persisted by Twitter.
Are you looking for just any tweets? If so, check out the Streaming API's statuses/sample method. The Streaming API uses persistent HTTP connections, which can be a pain to program, but it's quite graceful once you get it working. I'd recommend setting up a little script to dump tweets from statuses/sample into a DB. You should have a TON of data after just a few days.
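A rough sketch of such a dump script, in Python with SQLite, might look like the following. It assumes the stream arrives as one JSON object per line, with blank keep-alive lines and non-tweet notices mixed in, and that each tweet carries `id_str`, `created_at` and `text` fields. The table layout and the function names `open_db` and `store_tweets` are invented for this example, and opening the authenticated connection to the stream is deliberately left out.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a SQLite database with a simple tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id TEXT PRIMARY KEY, created_at TEXT, text TEXT, raw TEXT)"
    )
    return conn


def store_tweets(lines, conn):
    """Store tweets from an iterable of JSON lines; return rows inserted."""
    before = conn.total_changes
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(msg, dict) or "id_str" not in msg or "text" not in msg:
            continue  # not a tweet (assumed: e.g. a delete notice)
        # INSERT OR IGNORE drops tweets whose id is already stored.
        conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
            (msg["id_str"], msg.get("created_at"), msg["text"], line),
        )
    conn.commit()
    return conn.total_changes - before
```

`store_tweets` accepts any iterable of lines, so the same function can be fed from a file of saved sample data or from a streaming HTTP response, for example the line iterator of whichever HTTP client you use to hold the connection open.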
DataSift claims to have a Twitter historical data API coming soon; you can sign up to be notified when it's available.
This may not have existed when you first asked the question, but the "PeopleBrowsr" API is perfect for this, and you can go back 1400 days with a single API call: https://developer.peoplebrowsr.com/pb
Hope that helps!