Every time I contemplate using NoSQL for a solution, I get hung up on the lack of rich querying functionality. It may very well be my lack of understanding of NoSQL. It might also be that I'm very comfortable with SQL. From my understanding, NoSQL lends itself well to simple-schema scenarios (so it's probably not going to work well for a relational database where you have 50+ tables). Even for trivial scenarios, I always seem to want rich query functionality. Let's take a recipe database as a trivial example.
While the schema is, no doubt, trivial, you would definitely want rich querying ability. You would probably want to search by the following (and more):
- Title
- Tag
- Category
- id
- Likes
- User who created recipe
- create date
- rating
- dietary restrictions
You would also want to combine these criteria in any combination. While I know most NoSQL solutions have secondary indexes, doesn't this type of querying requirement severely limit how many solutions NoSQL is relevant for? I usually need this rich querying ability. Another good example would be a bug tracking application.
I don't think you want to kick off a map reduce job every time someone wants to search the database (I think this would be analogous to doing table scans most of the time in a traditional relational model). So I would assume there would be a lot of queries where you would have to loop through each entity and look for the criteria you wanted to search for (which would probably be slow). I understand you can run nightly map reduce jobs to either analyze the data or maybe normalize it into a typical relational database structure for reports.
Now, I can see it being useful for scenarios where you would most likely always have to read all the data anyway. Think of a web server log, or maybe an IoT type of app where you're collecting massive amounts of data (like sensor readings) and doing nightly analysis.
So is my understanding of NoSQL off, or is there a limit to the number of scenarios that it works well with?
I think the issue you are encountering is that you are approaching noSQL with the same design mindset that you would use with SQL. You mentioned "rich querying" several times. To me, that points towards design flaws (using only reference ids/trying to define relationships). A significant concept in noSQL is that data can be repeated (and often should be). Your recipe example is actually a great use case for noSQL. Here's how I would approach it using 3 of the models you mention (for simplicity's sake):
Recipe = {
    _id: "a001",
    name: "Burger",
    // Only the ingredient fields needed for searching/filtering are copied in.
    ingredients: [
        {
            _id: "b001",
            name: "Beef"
        },
        {
            _id: "b002",
            name: "Cheese"
        }
    ],
    // Only the author fields needed for display/filtering are copied in.
    createdBy: {
        _id: "c001",
        firstName: "John",
        lastName: "Doe"
    }
};
Person = {
    _id: "c001",
    firstName: "John",
    lastName: "Doe",
    email: "jd@email.com",
    preferences: {
        emailNotifications: true
    }
};
Ingredient = {
    _id: "b001",
    name: "Beef",
    brand: "Agri-co",
    shelfLife: "3 days",
    calories: 300
};
The reason I designed it this way is expressly for the purpose of its existence (assuming it's something like allrecipes.com). When searching/filtering recipes, you can filter by the author, but their email preferences are irrelevant. Similarly, the shelf life and brand of the ingredient are irrelevant. The schema is designed for the specific use case, not just because your data needs to be saved. Now here are a few of your mentioned queries (mongo):
db.recipes.find({name: "Burger"});
// dietary restrictions: match recipes where no embedded ingredient is named Cheese or Milk
db.recipes.find({"ingredients.name": {$nin: ["Cheese", "Milk"]}});
Your rich querying concerns have now been reduced to single queries in a single collection.
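To illustrate how several of the criteria from the question can be combined against this one document shape, here is a minimal in-memory sketch in plain JavaScript. It does not use MongoDB at all: the `findRecipes` helper, its option names, and the sample data are hypothetical, and a real deployment would push the same filter into the database, ideally backed by indexes on the queried fields.

```javascript
// Hypothetical sample documents shaped like the Recipe model above.
const recipes = [
  {
    _id: "a001",
    name: "Burger",
    ingredients: [{ _id: "b001", name: "Beef" }, { _id: "b002", name: "Cheese" }],
    createdBy: { _id: "c001", firstName: "John", lastName: "Doe" }
  },
  {
    _id: "a002",
    name: "Steak",
    ingredients: [{ _id: "b001", name: "Beef" }],
    createdBy: { _id: "c001", firstName: "John", lastName: "Doe" }
  },
  {
    _id: "a003",
    name: "Salad",
    ingredients: [{ _id: "b003", name: "Lettuce" }],
    createdBy: { _id: "c002", firstName: "Jane", lastName: "Roe" }
  }
];

// Hypothetical helper: every criterion is optional, and the ones supplied are ANDed together.
function findRecipes(docs, { name, authorId, excludeIngredients = [] } = {}) {
  return docs.filter((r) =>
    (name === undefined || r.name === name) &&
    (authorId === undefined || r.createdBy._id === authorId) &&
    // Dietary restriction: reject a recipe if any embedded ingredient is excluded.
    !r.ingredients.some((i) => excludeIngredients.includes(i.name))
  );
}

const result = findRecipes(recipes, { authorId: "c001", excludeIngredients: ["Cheese", "Milk"] });
console.log(result.map((r) => r.name));
```

In mongo, the same combination would presumably be a single filter document on the recipes collection, with one key per criterion (for example `"createdBy._id"` alongside `"ingredients.name"`).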
The downside of this design is slower write speed. You need more logic on the backend, with the potential for more programmer error, and each write is slower than in SQL because it has to access the various models to grab the relevant information. That being said, how often is a recipe viewed vs. how often is it written/edited? (This was my comment on reading trumping writing.) The other major downside is the necessity of foresight. The relationship between an ingredient and a recipe doesn't change form, but the information your application requires might. Editing a noSQL model tends to be more difficult than editing a SQL table.
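The extra write-side logic might look roughly like the following. This is an in-memory sketch in plain JavaScript, not MongoDB code; `buildRecipe` and `renamePerson` are hypothetical helpers, and the only point is that each write has to read the source models, and that a change to a copied field has to fan out to every copy.

```javascript
// Hypothetical source-of-truth records, shaped like the Person and Ingredient models above.
const people = new Map([
  ["c001", { _id: "c001", firstName: "John", lastName: "Doe", email: "jd@email.com" }]
]);
const ingredients = new Map([
  ["b001", { _id: "b001", name: "Beef", brand: "Agri-co", shelfLife: "3 days", calories: 300 }],
  ["b002", { _id: "b002", name: "Cheese" }]
]);
const recipes = [];

// Writing a recipe means reading the other models and copying only the fields
// the recipe pages need (no email, brand, or shelf life).
function buildRecipe(id, name, authorId, ingredientIds) {
  const author = people.get(authorId);
  if (!author) throw new Error(`unknown author ${authorId}`);
  return {
    _id: id,
    name,
    ingredients: ingredientIds.map((iid) => {
      const ing = ingredients.get(iid);
      if (!ing) throw new Error(`unknown ingredient ${iid}`);
      return { _id: ing._id, name: ing.name };
    }),
    createdBy: { _id: author._id, firstName: author.firstName, lastName: author.lastName }
  };
}

// Changing a copied field has to touch every document that holds a copy.
function renamePerson(personId, firstName, lastName) {
  Object.assign(people.get(personId), { firstName, lastName });
  let updated = 0;
  for (const r of recipes) {
    if (r.createdBy._id === personId) {
      Object.assign(r.createdBy, { firstName, lastName });
      updated += 1;
    }
  }
  return updated; // number of recipe documents rewritten
}

recipes.push(buildRecipe("a001", "Burger", "c001", ["b001", "b002"]));
console.log(renamePerson("c001", "Johnny", "Doe"), recipes[0].createdBy.firstName);
```

Reads stay a single lookup, while a rename costs one update per recipe the person wrote, which is the read-versus-write trade-off described above.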
Here's one other contrived example using the same models to emphasize my point about purposeful design. Assume your new site is on famous chefs instead of a recipe database:
Person = {
    _id: "c001",
    firstName: "Paula",
    lastName: "Deen",
    recipeCount: 15,
    // Summary data copied in for the chef list page.
    commonIngredients: [
        {
            _id: "b001",
            name: "Butter",
            count: 15
        },
        {
            _id: "b002",
            name: "Salted Butter",
            count: 15
        }
    ],
    favoriteRecipes: [
        {
            _id: "a001",
            name: "Fried Butter",
            calories: "3000"
        }
    ]
};
Recipe = {
    _id: "a001",
    name: "Fried Butter",
    ingredients: [
        {
            _id: "b001",
            name: "Butter"
        }
    ],
    directions: "Fry butter. Eat.",
    calories: "3000",
    rating: 99,
    createdBy: {
        _id: "c001",
        firstName: "Paula",
        lastName: "Deen"
    }
};
Ingredient = {
    _id: "b001",
    name: "Butter",
    brand: "Butterfields",
    shelfLife: "1 month"
};
Both of these designs use the same information, but they are modeled for the specific reason you bothered gathering the information. Now, you have the requisite information for a chef list page and typical sorting/filtering. You can navigate from there to a recipe page and have that info available.
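One way to keep the summary fields on such a chef document in sync is to derive them from that chef's recipes whenever a recipe is written. The following is a minimal sketch in plain JavaScript with made-up data; `summarizeChef` is a hypothetical helper, not part of any library, and it only covers `recipeCount` and `commonIngredients`.

```javascript
// Hypothetical recipe documents shaped like the Recipe model above.
const recipes = [
  {
    _id: "a001",
    name: "Fried Butter",
    rating: 99,
    ingredients: [{ _id: "b001", name: "Butter" }],
    createdBy: { _id: "c001", firstName: "Paula", lastName: "Deen" }
  },
  {
    _id: "a002",
    name: "Butter Cake",
    rating: 90,
    ingredients: [{ _id: "b001", name: "Butter" }, { _id: "b003", name: "Flour" }],
    createdBy: { _id: "c001", firstName: "Paula", lastName: "Deen" }
  }
];

// Derive the denormalized summary fields for one chef from that chef's recipes.
function summarizeChef(chefId, docs) {
  const own = docs.filter((r) => r.createdBy._id === chefId);
  const counts = new Map();
  for (const r of own) {
    for (const ing of r.ingredients) {
      const entry = counts.get(ing._id) || { _id: ing._id, name: ing.name, count: 0 };
      entry.count += 1;
      counts.set(ing._id, entry);
    }
  }
  return {
    recipeCount: own.length,
    // Most frequently used ingredients first.
    commonIngredients: [...counts.values()].sort((a, b) => b.count - a.count)
  };
}

console.log(summarizeChef("c001", recipes));
```

The chef list page can then read these precomputed fields directly instead of aggregating over recipes on every request.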
Design for the use case, not to model relationships.