I thought a query like this would be pretty easy because of the nature of relational databases but it seems to be giving me a fit. I also searched around but found nothing that really helped. Here's the situation:
Let's say I have a simple relationship for products and product tags. This is a one-to-many relationship, so we could have the following:
productid | tag
========================
1 | Car
1 | Black
1 | Ford
2 | Car
2 | Red
2 | Ford
3 | Car
3 | Black
3 | Lexus
4 | Motorcycle
4 | Black
5 | Skateboard
5 | Black
6 | Skateboard
6 | Green
What's the most efficient way to query for all (Ford OR Black OR Skateboard) AND NOT (Motorcycles OR Green)
? Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle)
.
There are about 150k records in the products table and 600k records in the tags tables, so the query is going to need to be as efficient as possible. Here's one query that I've been messing around with (example #1), but it seems to be taking about 4 seconds or so. Any help would be much appreciated.
SELECT p.productid
FROM products p
JOIN producttags tag1 USING (productid)
WHERE p.active = 1
AND tag1.tag IN ( 'Ford', 'Black', 'Skatebaord' )
AND p.productid NOT IN (SELECT productid
FROM producttags
WHERE tag IN ( 'Motorcycle', 'Green' ));
Update
The quickest query I've found so far is something like this. It's taking 100-200ms but it seems pretty inflexible and ugly. Basically I'm grabbing all products that match Ford
, Black
, or Skateboard
. Them I'm concatenating all of the tags for those matched products into a colon-separated string and removing all products that match on :Green:
AND :Motorcycle:
. Any thoughts?
SELECT p.productid,
Concat(':', Group_concat(alltags.tag SEPARATOR ':'), ':') AS taglist
FROM products p
JOIN producttags tag1 USING (productid)
JOIN producttags alltags USING (productid)
WHERE p.active = 1
AND tag1.tag IN ( 'Ford', 'Black', 'Skateboard' )
GROUP BY tag1.productid
HAVING ( taglist NOT LIKE '%:Motorcycle:%'
AND taglist NOT LIKE '%:Green:%' );
What about this one:
I would usually attack this by trying to eliminate records in the from...
I'd write the exclusion join with no subqueries:
Make sure you have an index on products over the two columns (active, productid) in that order.
You should also have an index on producttags over the two columns (productid, tag) in that order.
Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle).
Sometimes these complex conditions are hard for the MySQL optimizer. One common workaround is to use UNION to combine simpler queries:
PS: Your tagging table is not an Entity-Attribute-Value table.
I would get all the unique ID matches and the unique IDs to filter out, then LEFT JOIN those lists (as per tigeryan) and filter out any IDs that match. The query should also be easier to read and modify by keeping all the queries separate. It should be fairly quick also, although it may not look like it.
Sometimes a JOIN is faster than an IN, depending on how mysql optimizes the query:
The second query should force the join order since the internal selects have to be done first.