Given the following tables:
Recipes
| id | name
| 1 | 'chocolate cream pie'
| 2 | 'banana cream pie'
| 3 | 'chocolate banana surprise'
Ingredients
| id | name
| 1 | 'banana'
| 2 | 'cream'
| 3 | 'chocolate'
RecipeIngredients
| recipe_id | ingredient_id
| 1 | 2
| 1 | 3
| 2 | 1
| 2 | 2
| 3 | 1
| 3 | 3
How do I construct a SQL query to find recipes where ingredients.name = 'chocolate' and ingredients.name = 'cream'?
This is called relational division. A variety of techniques are discussed here.
One alternative not yet given is the double NOT EXISTS
SELECT r.id, r.name
FROM Recipes r
WHERE NOT EXISTS (SELECT * FROM Ingredients i
WHERE name IN ('chocolate', 'cream')
AND NOT EXISTS
(SELECT * FROM RecipeIngredients ri
WHERE ri.recipe_id = r.id
AND ri.ingredient_id = i.id))
Use:
SELECT r.name
FROM RECIPES r
JOIN RECIPEINGREDIENTS ri ON ri.recipe_id = r.id
JOIN INGREDIENTS i ON i.id = ri.ingredient_id
AND i.name IN ('chocolate', 'cream')
GROUP BY r.name
HAVING COUNT(DISTINCT i.name) = 2
The key point here is that the count must equal the number of ingredient names. If it's not a distinct count, there's a risk of false positives due to duplicates.
If you're searching for multiple associations then the simplest way to write the query is to use multiple EXISTS
conditions instead of a single straight JOIN
.
SELECT r.id, r.name
FROM Recipes r
WHERE EXISTS
(
SELECT 1
FROM RecipeIngredients ri
INNER JOIN Ingredients i
ON i.id = ri.ingredient_id
WHERE ri.recipe_id = r.id
AND i.name = 'chocolate'
)
AND EXISTS
(
SELECT 1
FROM RecipeIngredients ri
INNER JOIN Ingredients i
ON i.id = ri.ingredient_id
WHERE ri.recipe_id = r.id
AND i.name = 'cream'
)
If you know for sure that the associations are unique (i.e. a single recipe can only have a single instance of each ingredient), then you can cheat a bit using a grouping subquery with a COUNT
function and possibly speed it up (performance will depend on the DBMS):
SELECT r.id, r.Name
FROM Recipes r
INNER JOIN RecipeIngredients ri
ON ri.recipe_id = r.id
INNER JOIN Ingredients i
ON i.id = ri.ingredient_id
WHERE i.name IN ('chocolate', 'cream')
GROUP BY r.id, r.Name
HAVING COUNT(*) = 2
Or, if a recipe might have multiple instances of the same ingredient (no UNIQUE
constraint on the RecipeIngredients
association table), you can replace the last line with:
HAVING COUNT(DISTINCT i.name) = 2
select r.*
from Recipes r
inner join (
select ri.recipe_id
from RecipeIngredients ri
inner join Ingredients i on ri.ingredient_id = i.id
where i.name in ('chocolate', 'cream')
group by ri.recipe_id
having count(distinct ri.ingredient_id) = 2
) rm on r.id = rm.recipe_id
SELECT DISTINCT r.id, r.name
FROM Recipes r
INNER JOIN RecipeIngredients ri ON
ri.recipe_id = r.id
INNER JOIN Ingredients i ON
i.id = ri.ingredient_id
WHERE
i.name IN ( 'cream', 'chocolate' )
Edited following comment, thanks! This is the right way then:
SELECT DISTINCT r.id, r.name
FROM Recipes r
INNER JOIN RecipeIngredients ri ON
ri.recipe_id = r.id
INNER JOIN Ingredients i ON
i.id = ri.ingredient_id AND
i.name = 'cream'
INNER JOIN Ingredients i2 ON
i2.id = ri.ingredient_id AND
i2.name = 'chocolate'
a different way:
Version 2 (as stored procedure) revised
select r.name
from recipes r
where r.id = (select t1.recipe_id
from RecipeIngredients t1 inner join
RecipeIngredients t2 on t1.recipe_id = t2.recipe_id
and t1.ingredient_id = @recipeId1
and t2.ingredient_id = @recipeId2)
Edit 2:
[before people start screaming] :)
This can be placed at the top of version 2, which will allow to query by name instead of passing in the id.
select @recipeId1 = recipe_id from Ingredients where name = @Ingredient1
select @recipeId2 = recipe_id from Ingredients where name = @Ingredient2
I've tested version 2, and it works. Most users where linking on the Ingredient table, in this case was totally not needed!
Edit 3: (test results);
When this stored procedure is run these are the results.
The results are of the format (First Recipe_id ; Second Recipe_id, Result)
1,1, Failed
1,2, 'banana cream pie'
1,3, 'chocolate banana surprise'
2,1, 'banana cream pie'
2,2, Failed
2,3, 'chocolate cream pie'
3,1, 'chocolate banana surprise'
3,2, 'chocolate cream pie'
3,3, Failed
Clearly this query does not handle case when both constraints are the same, but works for all other cases.
Edit 4:(handling same constraint case):
replacing this line:
r.id = (select t1...
to
r.id in (select t1...
works with the failed cases to give:
1,1, 'banana cream pie' and 'chocolate banana surprise'
2,2, 'chocolate cream pie' and 'banana cream pie'
3,3, 'chocolate cream pie' and 'chocolate banana surprise'