Getting historical data from Twitter [closed]

2019-01-11 20:41发布

问题:

For a research project I would like to get the last 3 months worth of Twitter messages. Technical challenges aside, is this possible? by using some sort of slow polling mechanism to keep the rate limiter at bay?

The Twitter API states "Clients may request up to 3,200 statuses via the page and count parameters for timeline REST API" Are these per hour? Per day? or...ever?

Any suggestions? Would it even be theoretically possible? Did some one do something similar before?

Thanks! Marco

回答1:

Twitter notoriously does not make "available" tweets older than three weeks. In some cases you can only get one week. You're better off storing tweets for the next three months. Many rightly doubt if they're even persisted by Twitter.

Are you looking for just any tweets? If so, check out the Streaming API's status/sample method. The streaming API uses persistent HTTP sockets that can be a pain to program, but it's quite graceful when you get it working. I'd recommend setting up a little script to dump tweets from status/sample into a DB. You should have a TON of data after just a few days.



回答2:

You could use the Search API, don't give it a search, return the maximum of 100 per page, then got through each page twice a minute(120 times an hour - 30 times less than the rate limit). However, if my math is correct, that could possibly give you 720,000 tweets an hour..... the problem is that Twitter has added approximately 1.75 billion tweets over the past 3 months. So if my math is correct, it would take you 2361 days, or 6 years to complete this.

You could ask this question over on the Twitter Development talk on Google Groups, or contact Twitter to get white-listed so you could make up to 20,000 requests an hour.

Personally, I don't think it's possible.



回答3:

DataSift claims to have a twitter historical data api coming soon, you can signup to be notified when its available here.



回答4:

This may not have existed when you first asked the question but the "PeopleBrowsr" API is perfect for this and you can go back 1400 days with a single API call: https://developer.peoplebrowsr.com/pb

Hope that helps!



回答5:

Keyhole can get you historical tweets in xls or present them in a visual dashboard. The preview samples only a few most recent tweets, however, you can request historical data if you email them.

See: http://keyhole.co/conversation_tracking



回答6:

You can read the twitter historic data using Gnip's Historic PowerTrack tool. It will give you access to all twitter data since first tweet and fairly it is very simple tool t use.



回答7:

You can get free estimates for the data scope and cost using a service built by my company called Sifter. If you decide to purchase access to the data it will be available via our text analytics platform DiscoverText, where you can search, filter, de-duplicate, cluster, human code, and machine-classify the data.