How to extract historical tweets from Twitter

Posted 2019-09-22 04:30

Question:

We need historical tweets about some movies. So far we have tried Twitter's Streaming API and Search API. The Streaming API has no parameter for choosing the time range we need, and the Search API only returns data from the past week or two. Is there a way to extract historical tweets from, for example, 2014-05-01 to 2014-07-01? I found the following possible approaches:

1: Twitter Advanced Search (https://twitter.com/search-advanced?lang=en). It finds the results I need, but how can I download them? Is there any way to write code that saves the search results?

2: Using a Twitter analytics website such as Topsy. However, saving the results there is also difficult.

3: Some packages, such as Twitter4J, seem to help with this: http://twitter4j.org/en/code-examples.html Is there a Python or R package that could do the same?

4: We need this data for research, so spending a long time on data extraction is not a good option. Is there any way to buy this data from a professional provider?

Answer 1:

You can use the following library to get old tweets from Twitter: https://github.com/Jefferson-Henrique/GetOldTweets-python

After cloning the repository, make GetOldTweets-python your current directory:

 cd GetOldTweets-python

Then, in Python:

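import got  # Python 2 package; the repository's Python 3 variant is got3

# Search for 'search_term' between the two dates, returning at most 10000 tweets
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('search_term').setSince("2014-05-01").setUntil("2014-07-01").setMaxTweets(10000)
tweets = got.manager.TweetManager.getTweets(tweetCriteria)

# Print the text of the first tweet returned
print(tweets[0].text)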

Or, from a terminal:

python Exporter.py --querysearch 'search_term' --since 2014-05-01 --until 2014-07-01 --maxtweets 10000

In both cases, replace 'search_term' with your actual search query.
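The question also asks how to save search results. A minimal sketch for writing the tweets returned by getTweets to a CSV file, assuming Python 3 (for example with the repository's got3 package) and assuming each tweet object exposes id, date, username and text attributes; the function name save_tweets_csv is hypothetical, and any attribute that is missing is written as an empty cell.

```python
import csv

# Assumed attribute names on each tweet object; adjust to match your library version.
FIELDS = ["id", "date", "username", "text"]


def save_tweets_csv(tweets, path):
    """Write one CSV row per tweet and return the number of rows written."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)  # header row
        for t in tweets:
            # getattr with a default keeps the export going if an attribute is absent
            writer.writerow([getattr(t, name, "") for name in FIELDS])
            count += 1
    return count
```

Usage, with the tweets list from the Python example: save_tweets_csv(tweets, "tweets.csv"). The csv module handles quoting, so commas and line breaks inside tweet text do not break the file.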



Answer 2:

You can use Gnip's Historical PowerTrack product to do this; however, it is a commercial product aimed at enterprises rather than researchers.

Scraping the Twitter website is against the Terms of Service and Developer Policy.

The public search API only covers the last 7-9 days of data, so even twitteR or tweepy (the R and Python options, respectively) would not let you retrieve data from the period you are trying to access.
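To make the limitation concrete, here is a small sketch that checks whether a requested start date falls inside the standard search window. The 7-day constant is an assumption taken from the lower end of the 7-9 day range quoted above, and the function name is hypothetical.

```python
from datetime import date, timedelta

# Assumption: the standard search API reaches back about 7 days. The answer
# quotes 7-9 days, so 7 is used here as the conservative lower bound.
STANDARD_SEARCH_WINDOW = timedelta(days=7)


def reachable_by_standard_search(since: date, today: date) -> bool:
    """Return True if `since` is no older than the standard search window."""
    return since >= today - STANDARD_SEARCH_WINDOW
```

For the range in the question, reachable_by_standard_search(date(2014, 5, 1), date(2019, 9, 22)) returns False, which is why twitteR or tweepy alone cannot fetch it.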



Answer 3:

A few months ago, Twitter introduced its premium APIs, through which you can extract historical Twitter data from 2006 to the present. They make it straightforward for a developer to buy Twitter data.

Details are available here: https://developer.twitter.com/en/premium-apis.html

To access the Twitter premium APIs, you need a Twitter developer account.
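A minimal sketch of building a premium full-archive search request with only the standard library, assuming the endpoint layout and JSON body parameters (query, fromDate, toDate, maxResults, next) described in Twitter's premium search documentation at the time, with bearer-token authentication. The environment label "dev" and the helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime

# Hypothetical dev-environment label; you choose this name when setting up a
# premium search environment in the developer dashboard.
ENV_LABEL = "dev"
ENDPOINT = f"https://api.twitter.com/1.1/tweets/search/fullarchive/{ENV_LABEL}.json"


def to_premium_timestamp(day: str) -> str:
    """Convert 'YYYY-MM-DD' to the 'yyyymmddhhmm' form (midnight UTC)."""
    return datetime.strptime(day, "%Y-%m-%d").strftime("%Y%m%d%H%M")


def build_fullarchive_request(query, since, until, bearer_token,
                              max_results=100, next_token=None):
    """Build (but do not send) a POST request for one page of results."""
    body = {
        "query": query,
        "fromDate": to_premium_timestamp(since),
        "toDate": to_premium_timestamp(until),
        "maxResults": max_results,
    }
    if next_token:
        # Token taken from the previous response's "next" field to get the next page
        body["next"] = next_token
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Usage: req = build_fullarchive_request("search_term", "2014-05-01", "2014-07-01", "YOUR_BEARER_TOKEN"), then send it with urllib.request.urlopen(req) and parse the JSON response. As documented at the time, tweets came back in a "results" array, and a "next" token, when present, is passed back as next_token to fetch the following page.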

If you are not technically inclined and want the easiest way to get historical Twitter data, you can use third-party services such as TrackMyHashtag, Sifter, Gnip, or Infegy.