How do I design a MongoDB schema for a Twitter art

I'm new to MongoDB and, as an exercise, I'm building an application that extracts links from tweets. The idea is to get the most tweeted articles for a subject. I'm having a hard time designing the schema for this application.

The application harvests tweets and saves them
The tweets are parsed for links
The links are saved with additional information (title, excerpt, etc.)
A tweet can contain more than one link
A link can have many tweets

How do I:

Store these collections? Should I use embedded documents?
Get the top ten links sorted by number of tweets they have?
Get the most tweeted link for a specific date?
Get the tweets for a link?
Get the ten latest tweets?

I would love to get some input on this.

Tags: mongodb schema

1 answer

做个烂人

Post #2 · 2019-05-11 00:22

Two general tips:

1.) Don't be afraid to duplicate data. It is often a good idea to store the same data, formatted differently, in different collections.

2.) If you want to sort and sum things up, it helps to keep count fields everywhere. MongoDB's atomic update operators (such as $inc), together with upserts, make it easy to increment counters and to add fields to existing documents.

The following is almost certainly flawed because I typed it off the top of my head. But better bad examples than no examples, I thought ;)

collection tweets:

{
  tweetid: 123,
  timeTweeted: 123123234,  //exact time in milliseconds
  dayInMillis: 123412343,  //the day of the tweet at 00:00:00 (midnight)
  text: 'a tweet with a http://lin.k and an http://u.rl',
  links: [
     'http://lin.k',
     'http://u.rl' 
  ],
  linkCount: 2
}
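The tweet document above can be built from the raw tweet before it is inserted. The following is a minimal JavaScript sketch (runnable in Node.js, no MongoDB needed), assuming a naive http(s) regex for link extraction and UTC midnight as the day bucket; `makeTweetDoc` and `DAY_MS` are hypothetical names, not part of any MongoDB API.

```javascript
// Hypothetical helper: builds a document shaped like the `tweets` example above.
const DAY_MS = 24 * 60 * 60 * 1000; // milliseconds in one day

function makeTweetDoc(tweetid, text, timeTweeted) {
  // Assumption: naive URL pattern; trailing punctuation is not stripped
  const links = text.match(/https?:\/\/\S+/g) || [];
  return {
    tweetid,
    timeTweeted, // exact time in milliseconds
    dayInMillis: timeTweeted - (timeTweeted % DAY_MS), // 00:00:00 UTC of that day
    text,
    links,
    linkCount: links.length,
  };
}

const doc = makeTweetDoc(
  123,
  "a tweet with a http://lin.k and an http://u.rl",
  Date.UTC(2019, 4, 11, 0, 22) // 2019-05-11 00:22 UTC (months are 0-based)
);
console.log(doc);
```

Precomputing `dayInMillis` means the day key used for the per-day counters can be read straight from the tweet document instead of being recomputed later.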

collection links: 

{
   url: 'http://lin.k',
   totalCount: 17,
   daycounts: {
      1232345543354: 5, //key: the day of the tweet at 00:00:00 (midnight)
      1234123423442: 2,
      1234354534535: 10
   }
}

Add a new tweet:

db.x.tweets.insertOne({...}) //simply insert a new document with all fields

//for each link found in the tweet:
var toFind = { url: '...' };
var updateObj = { '$inc': { 'totalCount': 1, 'daycounts.12342342': 1 } }; //12342342 is the day of the tweet
db.x.links.updateOne(toFind, updateObj, { upsert: true }); //creates the link document if it does not exist yet
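The effect of that upsert on the `links` collection can be checked with a plain-JavaScript, in-memory sketch. A `Map` keyed by URL stands in for the collection and `recordTweet` is a hypothetical name. The point it illustrates: `$inc` on a field that does not exist yet sets it to the increment, so an upserted link behaves as if its counters started at zero.

```javascript
// In-memory stand-in for the `links` collection (url -> document).
function recordTweet(links, tweet) {
  for (const url of tweet.links) {
    // upsert: create { url, ... } when no document matches { url: url }
    if (!links.has(url)) links.set(url, { url, totalCount: 0, daycounts: {} });
    const doc = links.get(url);
    // same effect as { $inc: { totalCount: 1, 'daycounts.<day>': 1 } }
    doc.totalCount += 1;
    doc.daycounts[tweet.dayInMillis] = (doc.daycounts[tweet.dayInMillis] || 0) + 1;
  }
}

const day = Date.UTC(2019, 4, 11); // day bucket shared by both sample tweets
const links = new Map();
recordTweet(links, { links: ["http://lin.k", "http://u.rl"], dayInMillis: day });
recordTweet(links, { links: ["http://lin.k"], dayInMillis: day });
console.log(links.get("http://lin.k"));
```

In MongoDB each `$inc` is atomic on a single document, so several harvester processes can bump the same link's counters concurrently without losing counts, which is one reason to keep the counters inside the documents.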

Get the top ten links sorted by number of tweets they have?

db.x.links.find().sort({'totalCount': -1}).limit(10);
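In plain JavaScript terms, sorting on `totalCount` descending and limiting to ten is a descending sort followed by taking the first ten elements. A small in-memory sketch, with `topLinks` as a hypothetical name and made-up sample documents:

```javascript
// Descending sort on totalCount, then keep the first n (like .limit(n)).
function topLinks(links, n = 10) {
  return [...links] // copy so the caller's array is not reordered
    .sort((a, b) => b.totalCount - a.totalCount)
    .slice(0, n);
}

const sample = [
  { url: "http://a.example", totalCount: 3 },
  { url: "http://b.example", totalCount: 17 },
  { url: "http://c.example", totalCount: 8 },
];
console.log(topLinks(sample, 2).map((d) => d.url));
```

On a real collection, an index such as `db.x.links.createIndex({'totalCount': -1})` lets MongoDB return the top ten by walking the index instead of sorting every link document.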

Get the most tweeted link for a specific date?

db.x.links.find({'daycounts.123413453': {'$gt': 0}}).sort({'daycounts.123413453': -1}).limit(1); //123413453 is the day you're after
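The per-day query works like the top-ten one, except the sort key is one day's entry in `daycounts`, and links with no entry for that day are dropped first (a missing field does not satisfy `$gt: 0`). An in-memory sketch, with `topLinkForDay` as a hypothetical name and made-up sample data:

```javascript
// Like find({'daycounts.<day>': {$gt: 0}}).sort({'daycounts.<day>': -1}).limit(1)
function topLinkForDay(links, day) {
  const tweetedThatDay = links.filter((d) => (d.daycounts[day] || 0) > 0);
  tweetedThatDay.sort((a, b) => b.daycounts[day] - a.daycounts[day]);
  return tweetedThatDay[0] || null; // null plays the role of an empty result
}

const day = Date.UTC(2019, 4, 11);
const sample = [
  { url: "http://lin.k", daycounts: { [day]: 5 } },
  { url: "http://u.rl", daycounts: { [day]: 10 } },
  { url: "http://other.example", daycounts: {} },
];
console.log(topLinkForDay(sample, day).url);
```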

Get the tweets for a link?

db.x.tweets.find({'links': 'http://lin.k'});
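This works because, when a field holds an array, MongoDB matches the query value against each element, so `{'links': 'http://lin.k'}` finds every tweet whose `links` array contains that URL. The in-memory equivalent is a membership filter (`tweetsForLink` is a hypothetical name):

```javascript
// Like find({'links': url}): match tweets whose links array contains url.
function tweetsForLink(tweets, url) {
  return tweets.filter((t) => t.links.includes(url));
}

const tweets = [
  { tweetid: 1, links: ["http://lin.k", "http://u.rl"] },
  { tweetid: 2, links: ["http://u.rl"] },
  { tweetid: 3, links: ["http://lin.k"] },
];
console.log(tweetsForLink(tweets, "http://lin.k").map((t) => t.tweetid));
```

On the real collection, `db.x.tweets.createIndex({'links': 1})` creates a multikey index (one index entry per array element), which keeps this lookup fast as the number of tweets grows.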

Get the ten latest tweets?

db.x.tweets.find().sort({'timeTweeted': -1}).limit(10);
