I'm new to MongoDB, and as an exercise I'm building an application that extracts links from tweets. The idea is to get the most tweeted articles for a subject. I'm having a hard time designing the schema for this application.
- The application harvests tweets and saves them
- The tweets are parsed for links
- The links are saved with additional information (title, excerpt, etc.)
- A tweet can contain more than one link
- A link can have many tweets
How do I:
- Save these collections? As embedded documents?
- Get the top ten links sorted by number of tweets they have?
- Get the most tweeted link for a specific date?
- Get the tweets for a link?
- Get the ten latest tweets?
I would love to get some input on this.
Two general tips: 1.) Don't be afraid to duplicate. It is often a good idea to store the same data in different formats in different collections.
2.) If you want to sort and sum things up, it helps to keep count fields everywhere. MongoDB's atomic update operators, combined with upserts, make it easy to increment counters and to add fields to existing documents.
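To make the two tips concrete, here is one possible set of document shapes, written as Python dicts. Everything in it is an assumption for illustration: the collection names (`tweets`, `links`, `link_days`), the field names, and the sample values are not from the question.

```python
from datetime import datetime, timezone

# Hypothetical document in a "tweets" collection: the raw tweet plus the
# URLs parsed out of it (a duplicate of information held in "links").
tweet = {
    "_id": 1001,
    "text": "Nice read http://example.com/a",
    "created_at": datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc),
    "links": ["http://example.com/a"],
}

# Hypothetical document in a "links" collection: one per URL, with a
# running counter so that "top links" is a plain sort, not an aggregation.
link = {
    "_id": "http://example.com/a",
    "title": "Example article",
    "excerpt": "First lines of the article",
    "tweet_count": 1,
}

# Hypothetical document in a "link_days" collection: one counter per
# (day, URL) pair, which answers "most tweeted link on a given date".
link_day = {
    "_id": "2011-05-01|http://example.com/a",
    "day": "2011-05-01",
    "url": "http://example.com/a",
    "count": 1,
}
```

The same fact (this tweet mentions this URL) ends up in three places, each shaped for one kind of query.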
The following is most certainly flawed because it's typed off the top of my head. But better bad examples than no examples, I thought ;)
Add a new tweet:
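A rough sketch of the write path with pymongo, assuming hypothetical `tweets`, `links` and `link_days` collections and made-up field names (`tweet_count`, `count`, `day`, `url`). The tweet is inserted once, and each link it contains gets two atomic `$inc` upserts: one for the overall counter and one for the per-day counter.

```python
from datetime import datetime, timezone

def build_tweet_writes(tweet_id, text, created_at, urls):
    """Return (tweet_doc, link_ops, day_ops) for one harvested tweet."""
    urls = list(dict.fromkeys(urls))  # drop repeated URLs, keep order
    day = created_at.date().isoformat()  # e.g. "2011-05-01"
    tweet_doc = {"_id": tweet_id, "text": text,
                 "created_at": created_at, "links": urls}
    # Overall counter per link; the upsert creates the link if missing.
    link_ops = [({"_id": url}, {"$inc": {"tweet_count": 1}}) for url in urls]
    # Per-day counter per link, keyed by "<day>|<url>".
    day_ops = [
        ({"_id": f"{day}|{url}"},
         {"$inc": {"count": 1},
          "$setOnInsert": {"day": day, "url": url}})
        for url in urls
    ]
    return tweet_doc, link_ops, day_ops

def save_tweet(db, tweet_id, text, created_at, urls):
    """Apply the writes; `db` is assumed to be a pymongo Database."""
    tweet_doc, link_ops, day_ops = build_tweet_writes(
        tweet_id, text, created_at, urls)
    db.tweets.insert_one(tweet_doc)
    for flt, update in link_ops:
        db.links.update_one(flt, update, upsert=True)
    for flt, update in day_ops:
        db.link_days.update_one(flt, update, upsert=True)
```

The title and excerpt can be added to the link document later with a `$set` once the page has been fetched. Note that processing the same tweet twice would count it twice, so the harvester would need to guard against that.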
Get the top ten links sorted by number of tweets they have?
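If every link document carries a counter (here a hypothetical `tweet_count` field in an assumed `links` collection), the top ten is just a descending sort with a limit. A minimal sketch that builds the arguments for pymongo's `find`:

```python
def top_links_query(limit=10):
    """Keyword arguments for find(): all links, most tweeted first."""
    return {
        "filter": {},
        "sort": [("tweet_count", -1)],  # -1 means descending
        "limit": limit,
    }

def top_links(db, limit=10):
    """Run the query; `db` is assumed to be a pymongo Database."""
    return list(db.links.find(**top_links_query(limit)))
```

An index on `tweet_count` would likely be needed to keep this fast as the collection grows.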
Get the most tweeted link for a specific date?
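One way to answer this without aggregating at read time is a separate per-day counter collection. The sketch below assumes a hypothetical `link_days` collection with one document per (day, URL) pair, holding `day` as an ISO date string, `url`, and a `count` incremented on every tweet.

```python
from datetime import date

def most_tweeted_on_query(day):
    """Keyword arguments for find(): the busiest link on `day`."""
    return {
        "filter": {"day": day.isoformat()},  # e.g. "2011-05-01"
        "sort": [("count", -1)],             # highest count first
        "limit": 1,
    }

def most_tweeted_on(db, day):
    """Return the top (day, URL) counter document, or None."""
    docs = list(db.link_days.find(**most_tweeted_on_query(day)))
    return docs[0] if docs else None
```

The returned document's `url` can then be looked up in the links collection to get the title and excerpt.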
Get the tweets for a link?
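If each tweet stores the URLs it contains in an array (assumed here to be a `links` field in a `tweets` collection), an equality match on that field matches any tweet whose array contains the URL. A minimal sketch, newest first via an assumed `created_at` field:

```python
def tweets_for_link_query(url):
    """Keyword arguments for find(): tweets mentioning `url`."""
    return {
        # Equality on an array field matches documents where any
        # element of the array equals the value.
        "filter": {"links": url},
        "sort": [("created_at", -1)],
    }

def tweets_for_link(db, url):
    """Run the query; `db` is assumed to be a pymongo Database."""
    return list(db.tweets.find(**tweets_for_link_query(url)))
```

An index on `links` (a multikey index, since the field is an array) would let this avoid a full collection scan.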
Get the ten latest tweets?
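This one only needs a timestamp on each tweet. The sketch assumes a `tweets` collection with a `created_at` field (both names are made up here) and sorts on it in descending order:

```python
def latest_tweets_query(limit=10):
    """Keyword arguments for find(): newest tweets first."""
    return {
        "filter": {},
        "sort": [("created_at", -1)],  # -1 means descending
        "limit": limit,
    }

def latest_tweets(db, limit=10):
    """Run the query; `db` is assumed to be a pymongo Database."""
    return list(db.tweets.find(**latest_tweets_query(limit)))
```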