Recently I'm exploring NoSQL Databases. I need an advice about how to store data in the most optimal and efficient way for a given problem. I'm targeting MongoDB, now. However it should be the same with CouchDB.
Let's say we have these 3 Models:
Story:
id
title
User:
id
name
Vote:
id
story_id
user_id
I want to be able to ask the database these questions:
- Who has voted for this Story?
- What this User has Voted for?
I'm doing simple joins while working with a relational DB. The question is, how should I store the data for those objects in order to be most efficient.
For example, if I store the Vote objects as a subcollection of Stories it wont be easy to get the info - "What a user has voted for".
In CouchDB this is very simple. One view emits:
Another view emits:
Both are queries extremely fast since there is no join. If you do need user data or story data, CouchDB supports multi-document fetch. Also quite fast and is one way to do a "join".
I've been looking into MongoDB and CouchDB a lot lately, but my insight is limited. Still, when thinking about storing the votes inside the story document, you might have to worry about hitting the 4MB document size limit. Even if you don't, you might be constantly increasing the size of the document enough to cause it to get moved and thus slowing down your writes (see how documents are sized in MongoDB).
As for CouchDB, these kinds of things are quite simple, elegant, and quite fast once the view indexes are calculated. Personally, however, I have hesitated to do a similar project in CouchDB because of benchmarks showing it progressively slowing down to a considerable degree as the database grows (and the view indexes grow). I'd love to see some more recent benchmarks showing CouchDB performance as database size increases. I WANT to try MongoDB or CouchDB, but SQL still seems so efficient and logical, so I'll stay with it until the project fits the temptation just right.
how-to-think-in-data-stores-instead-of-databases
I would suggest storing votes as a list of story
_id
s in each user. That way you can find out what stories a user has voted for just by looking at the list. To get the users who have voted for a story you can do something like:db.users.find({stories: story_id})
where
story_id
is the_id
of the story in question. If you create an index on thestories
field both of those queries will be fast.Ok, you haven given a normalized data model as you would do in an SQL setup.
In my understanding you don't do this in MongoDB. You could store references, but you do not for performance reasons in the general case.
I'm not an expert in the NoSQL area in no way, but why don't you simply follow your needs and store the user (ids) that have voted for a story in the stories collection and the story (ids) a user has voted for in the users collection?