Getting tweets by date with tweepy

Posted 2020-06-17 07:17

I pulled the maximum number of tweets allowed from USATODAY, which was 3000.

Now I want to create a script that automatically pulls USATODAY's tweets at 11:59 PM every day.

I was going to use the streaming API, but then I would have to keep it running all day.

Can I get some insight on how to create a script that calls the REST API every night at 11:59 PM to pull the day's tweets? If not, does anyone know how to pull tweets based on date?

I was thinking about placing an if/else statement in my for loop, but that seems inefficient because it would have to search through 3000 tweets every night.

This is what I have now:

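import tweepy
from pymongo import MongoClient

# consumer_key, consumer_secret, access_token_key and access_token_secret
# are assumed to be defined elsewhere.
client = MongoClient('localhost', 27017)
db = client['twitter_db']
collection = db['usa_collection']
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)

api = tweepy.API(auth)

for tweet in tweepy.Cursor(api.user_timeline, screen_name='USATODAY').items():
    # insert_one replaces the deprecated Collection.insert
    collection.insert_one(tweet._json)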
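
For the scheduling part of the question, a cron entry such as `59 23 * * *` is the usual way to launch a script every night without keeping it running. If the script has to schedule itself instead, one option is to compute the delay until the next 11:59 PM and sleep for that long. The sketch below assumes local time; `seconds_until` is a hypothetical helper name, not part of tweepy.

```python
import datetime


def seconds_until(hour, minute, now=None):
    """Return the seconds from `now` until the next occurrence of hour:minute."""
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for tomorrow.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()
```

Calling `time.sleep(seconds_until(23, 59))` before the pull would then delay it until 11:59 PM.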

2 Answers
够拽才男人
#2 · 2020-06-17 07:39

You can retrieve the tweets page by page. On each page, iterate over the tweets and read each tweet's creation time from tweet.created_at, then compute the difference between that time and the current time. If the difference is less than one day, the tweet is one you want; otherwise, exit the loop, because every later tweet is older still.

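import datetime
import time

import tweepy

def get_tweets(api, username):
    # Written against tweepy 3.x, where user_timeline accepts a page argument.
    page = 1
    while True:
        tweets = api.user_timeline(screen_name=username, page=page)
        if not tweets:
            # No more pages to read.
            return

        for tweet in tweets:
            # Assumes created_at is a naive UTC datetime (tweepy 3.x);
            # newer versions return timezone-aware datetimes.
            if (datetime.datetime.utcnow() - tweet.created_at).days < 1:
                # Do processing here:
                print(tweet.text)
            else:
                # Tweets arrive newest first, so everything after this is older.
                return

        page += 1
        time.sleep(1)  # brief pause between page requests

get_tweets(api, "anmoluppal366")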

Note: you do not access all 3000 of that user's tweets; you iterate only over the tweets created within the 24 hours before the script was launched.
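
The early-exit idea above can also be written as a small helper that is independent of the API call. This is a minimal sketch, assuming the tweets arrive newest first and that each object exposes a naive UTC `created_at`; `recent_tweets` is a hypothetical name.

```python
import datetime
from itertools import takewhile


def recent_tweets(tweets, now, max_age=datetime.timedelta(days=1)):
    """Return an iterator over a newest-first iterable that stops at the first old tweet."""
    cutoff = now - max_age
    # takewhile stops at the first tweet at or before the cutoff, so the
    # rest of the timeline is never examined.
    return takewhile(lambda tweet: tweet.created_at > cutoff, tweets)
```

Because the helper is lazy, passing it a lazily paged iterator should mean that older pages are never requested.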

手持菜刀,她持情操
#3 · 2020-06-17 07:43

Another method is to use the search API with a date range:

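import tweepy

# api is assumed to be an authenticated tweepy.API instance, as in the question.
def search(target, date, maxnum=10):
    cursor = tweepy.Cursor(
        api.search,  # renamed to api.search_tweets in tweepy 4.x
        q=target,
        since=date[0],
        until=date[1],
        show_user=True)

    return cursor.items(maxnum)

if __name__ == '__main__':
    list_tweets = search(
        target='서지수',
        date=('2016-05-01', '2016-05-25'),
        maxnum=100)
    # search() returns an iterator, so loop over it to see the tweets.
    for tweet in list_tweets:
        print(tweet.text)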
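
To pull a single day's tweets with this approach, the date pair has to cover exactly that day. Here is a minimal sketch, assuming the `until` bound is exclusive, as in Twitter's standard search API (it returns tweets created before the given date); `day_window` is a hypothetical helper. Keep in mind that standard search only reaches back about a week, so this suits a nightly job rather than arbitrary past dates.

```python
import datetime


def day_window(day):
    """Return (since, until) date strings covering one calendar day."""
    next_day = day + datetime.timedelta(days=1)
    # until is treated as exclusive, so it must be the following day.
    return day.isoformat(), next_day.isoformat()
```

A nightly run could then call something like `search(target='from:USATODAY', date=day_window(datetime.date.today()))`.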