What's the most 'mongo' way of representing many-to-many relationships that have attributes?
So for example:
Intro
MYSQL tables
people
=> firstName, lastName, ...
Movies
=> name, length ..
peopleMovies
=> movieId, personId, language, role
Solution 1
Embed people into movies...?
In MongoDB I understand it's good to denormalize and embed
but I don't want to embed
people into movies, it just doesn't logically make any sense. Because people don't necessarily only have to belongs to movies.
Solution 2
People
and Movies
will be two separate collections.
People
=> embed [{movieId: 12, personId: 1, language: "English", role: "Main"} ...]
Movies
=> embed [{movieId: 12, personId: 1, language: "English", role: "Main"} ...]
The issue with this solution is that when we want to update a person's role
for a specific movie
we'll need to run two update queries to ensure data is in sync in both collections.
Solution 3
We can also do something much more relational like and end up with three collections
People
=> firstName, lastName, ...
Movies
=> name, length ..
Castings
=> movieId, personId, language, role
The issue with this is that because of the lack of a join statement in MongoDB, it would take 3 queries
to go from people -> movies and vice versa.
Here is my question, what are some other ways to model something like this in MongoDB
and in a more NoSQL
way. In terms of the solutions provided, which one would be the best in terms of performance and convention in mongo.
I think you should denormalize your collections. The important point when designing MongoDB collections and documents is to think about your views. What data do you need to display your view? The idea is that you should try to have that data be part of your document.
For example, in your case, you probably have a view for
Movies
where you want to display information about a movie. But that page about a movie probably needs just basic information about each person (first name, last name, photo URL). Not all other things. And vice-versa, the page about a person will probably list all movies, but also again only a subset of information about each movie is needed, like title, year, and poster photo URL.So one option would be to have two collections, but then embed (denormalize) just those few fields you need between collections. So for example,
Movies
collection would have a fieldpeople
which would be an array of subdocuments. And thePeople
collection would havemovies
field which would be an array of subdocuments, with those extra fields you want to specify role and so on.So documents might be something like the following. For movies:
For people:
Of course, the question is how to keep those fields in sync. What if you update the poster photo URL of a movie, you want it to be updated in all Person documents as well. To solve this problem, we developed PeerDB, a package to define relations between collections which then makes sure they are kept in sync.
So in your case, I would have such collections defined in PeerDB, in CoffeeScript:
In short, this definition says that
people.person
field should be a reference toPeople
collection and kept in sync forfirstName
,lastName
,photoURL
. Additionally, a reverse reference field should be made inPeople
documents under the fieldmovies
withtitle
,year
,posterURL
.Pretty simple. But there are some downsides. The arrays could get very big (maybe not in the case of movies and people, but for some other data) which could make documents too big for MongoDB per-document limits (at the moment 16 MB). Additionally, if you observe, you will see that for
People
documents there is no information about the role in the list of movies. This is because the role is not part of the referenced document, but it is something which is next to the reference. What if you would want to display role for movie a person was in on the person page/view?So, maybe it would be better to have three collections, one for basic information about movies, another for people, and then a collection for relation between people and movies. So data could be maybe something like, for movies:
For people:
For casting:
And PeerDB definitions:
PeerDB would then make sure that things are kept in sync. It would also remove casting document if a movie or a person is deleted from the database.
This then allows you to make a Meteor publish which is efficient and does not require any dynamic building of related queries. You simply publish
Casting
collection and this is it. You can even query on some condition. For example, you want to display all directors sorted byfirstName
andlastName
and their movies? Possible with only one query.In many ways meteor's API encourages flat relational documents, however MongoDB is a non-relational data store. This conflict is, unfortunately, left as an exercise for the developer to solve.
The notion of schema structure and joins is an enormous topic to cover within a single answer, so I will attempt to be as succinct as possible.
Reasons why you should choose a relational model
Assume you have comment and post data. Consider what would happen if you embedded comments within your posts.
DDP operates on documents. All of the comments will be sent every time a new comment in the same post is added.
allow
anddeny
rules operate on documents. It may be unreasonable to expect that the same rules apply simultaneously to both posts and comments.Publications tend to make more sense in terms of collections. In the above scenario, we could not easily publish a list of comments independent of their posts.
Relational databases exist for good reasons. One of them is to avoid the multiple modification problem inherent in your second solution.
Reasons why you should choose an embedded model
Recommendations
Use your third solution. In my experience, the reasons for choosing a relational model far outweigh the restrictions imposed by the data store. Of course overcoming the lack of joins isn't easy, but the pain is likely to be isolated to only a handful of publish functions. Here are some resources I'd highly recommend:
How to publish a many-to-many relationship on EventedMind. Chris covers your exact use case in detail, however he manually does the reactive join with observe callbacks, which I don't recommend.
Reactive joins in meteor from the Discover Meteor Encyclopedia. This covers the basics of how and why one should do a reactive join.
The denormalization chapter from Discover Meteor. This covers many of the points I made above and also talks about when and how to denormalize some of your data.
You can use Publish with relations to join your data. Alternative packages include: smart publish, publish composite, and simple publish.
If you need more information beyond this, please comment below and I will update my answer.