SQL: Get Products from a category but also must be

2020-02-07 03:03发布

问题:

I am currently stuck in a situation. The scenario is this. I have products who may be associated with multiple categories. The data structure is shown below:

Products Table:
product_id  name 
1           Lemon
2           Kiwis
3           Cheese

Product to Categories Table

product_id   category_id
1            1
1            2
1            3
2            1
2            3
3            2
3            4

Category Table (not required in query however adding it here to help visualize what is happening)

category_id  name
1            Fruit
2            Yellow
3            Round
4            Dairy

What I'm struggling with here is that originally I want to get all products that are in the fruit category (category id 1) but I also want to check if a fruit is yellow. Keep in mind that yellow will not be the only filter, sometimes I will want to return yellow and orange fruit, however since cheese is yellow I can't return it since it isn't a fruit. However to make things a bit easier I always know that I am going to look in the fruit category as a base.

The database structure can not change as its an opencart database structure.

Here are my attempts:

SELECT GROUP_CONCAT(DISTINCT p2c2.category_id SEPARATOR ',') as categories 
FROM oc_product_to_category p2c 
LEFT JOIN oc_product p ON (p.product_id = p2c.product_id) 
LEFT JOIN oc_product_to_category p2c2 ON (p.product_id = p2c2.product_id) 
WHERE p2c.category_id IN ('1','2')

This kind of works except for that fact that it will return Cheese.

Notes: I use Group Concat because at the end of all of this my goal is to return not so much the products that match these categories but based on the filters I want to return another list of categories from the products that match this criteria. So:

Scenario:

Get Products that match category criteria Return categories of those products.

Any assistance will be greatly appreciated.

回答1:

This type of problem is called relational division.

There are two common solutions:

  1. First solution strings together the matching categories and compares to a fixed string:

    SELECT p2c.product_id
    FROM oc_product_to_category p2c
    GROUP BY p2c.product_id
    HAVING GROUP_CONCAT(p2c.category_id SEPARATOR ',' ORDER BY p2c.category_id) = '1,2'
    
  2. Second solution does a JOIN for each required value:

    SELECT p.product_id 
    FROM oc_product p 
    INNER JOIN oc_product_to_category p2c1 
      ON (p.product_id = p2c1.product_id AND p2c1.category_id = 1) 
    INNER JOIN oc_product_to_category p2c2 
      ON (p.product_id = p2c2.product_id AND p2c2.category_id = 2) 
    

I cover these solutions in my presentation SQL Query Patterns, Optimized. I found in my tests that the join solution is much better for performance.


@Tom's suggestion is right, here's what that would look like in a complete query:

    SELECT p.product_id, GROUP_CONCAT(p2c3.category_id SEPARATOR ',') AS categories
    FROM oc_product p 
    INNER JOIN oc_product_to_category p2c1 
      ON (p.product_id = p2c1.product_id AND p2c1.category_id = 1) 
    INNER JOIN oc_product_to_category p2c2 
      ON (p.product_id = p2c2.product_id AND p2c2.category_id = 2) 
    INNER JOIN oc_product_to_category p2c3
      ON (p.product_id = p2c3.product_id)
    GROUP BY p.product_id;

The DISTINCT that @Tom suggests shouldn't be necessary, because your p2c table should have a UNIQUE constraint over (product_id, category_id).