Recently I'm exploring NoSQL Databases. I need an advice about how to store data in the most optimal and efficient way for a given problem. I'm targeting MongoDB, now. However it should be the same with CouchDB.
Let's say we have these 3 Models:
Story:
id
title
User:
id
name
Vote:
id
story_id
user_id
I want to be able to ask the database these questions:
- Who has voted for this Story?
- What this User has Voted for?
I'm doing simple joins while working with a relational DB. The question is, how should I store the data for those objects in order to be most efficient.
For example, if I store the Vote objects as a subcollection of Stories it wont be easy to get the info - "What a user has voted for".
I would suggest storing votes as a list of story _id
s in each user. That way you can find out what stories a user has voted for just by looking at the list. To get the users who have voted for a story you can do something like:
db.users.find({stories: story_id})
where story_id
is the _id
of the story in question. If you create an index on the stories
field both of those queries will be fast.
- don't worry if your queries are efficient until it starts to matter
- according to below quote, you're doing it wrong
The way I have been going about the
mind switch is to forget about the
database alltogether. In the
relational db world you always have to
worry about data normalization and
your table structure. Ditch it all.
Just layout your web page. Lay them
all out. Now look at them. Your
already 2/3 there. If you forget the
notion that database size matters and
data shouldn't be duplicated than your
3/4 there and you didnt even have to
write any code! Let your views dictate
your Models. You don't have to take
your objects and make them 2
dimensional anymore as in the
relational world. You can store
objects with shape now.
how-to-think-in-data-stores-instead-of-databases
Ok, you haven given a normalized data model as you would do in an SQL setup.
In my understanding you don't do this in MongoDB. You could store references, but you do not for performance reasons in the general case.
I'm not an expert in the NoSQL area in no way, but why don't you simply follow your needs and store the user (ids) that have voted for a story in the stories collection and the story (ids) a user has voted for in the users collection?
In CouchDB this is very simple. One view emits:
function(doc) {
if(doc.type == "vote") {
emit(doc.story_id, doc.user_id);
}
}
Another view emits:
function(doc) {
if(doc.type == "vote") {
emit(doc.user_id, doc.story_id);
}
}
Both are queries extremely fast since there is no join. If you do need user data or story data, CouchDB supports multi-document fetch. Also quite fast and is one way to do a "join".
I've been looking into MongoDB and CouchDB a lot lately, but my insight is limited. Still, when thinking about storing the votes inside the story document, you might have to worry about hitting the 4MB document size limit. Even if you don't, you might be constantly increasing the size of the document enough to cause it to get moved and thus slowing down your writes (see how documents are sized in MongoDB).
As for CouchDB, these kinds of things are quite simple, elegant, and quite fast once the view indexes are calculated. Personally, however, I have hesitated to do a similar project in CouchDB because of benchmarks showing it progressively slowing down to a considerable degree as the database grows (and the view indexes grow). I'd love to see some more recent benchmarks showing CouchDB performance as database size increases. I WANT to try MongoDB or CouchDB, but SQL still seems so efficient and logical, so I'll stay with it until the project fits the temptation just right.