Is there a way to select rows where one of the column contains only , but any number of, predefined values?
I've been using this, but it returns any rows where my column contains at least one of the values (which is exactly what it's suppose to do, I know).
But I'm looking for a way to only select rows that have ONLY my keywords in the keyword column.
SELECT *
FROM
`products`.`product`
WHERE
keywords LIKE '%chocolate%'
AND keyword LIKE '%vanilla%';
Example Keywords: chocolate, sugar, milk, oats
Using the above keywords, I would want the first two results returned, but not the last two:
Product1: chocolate, sugar
Product2: chocolate
Product3: chocolate, sugar, milk, oats, bran
Product4: chocolate, sugar, salt
My column contains a comma separated list of all keywords applicable to that product row.
Since you are storing the list as a string containing a comma separated list, rather than as a set, MySQL isn't going to be able to help much with that. When it was inserted into the database, MySQL saw it as a single string. When it's retrieved from the database, MySQL sees it as a single string. When we refer to it in a query, MySQL sees it as a single string.
If the "list" was stored as a standard relational set, with each keyword for a product stored as a separate row in the table, then returning the result set you specified is almost trivial.
For example, if we had this table:
CREATE TABLE product_keyword
product_id BIGINT UNSIGNED COMMENT 'FK ref products.id'
keyword VARCHAR(20)
With each keyword associated to a particular product as a separate row:
product_id keyword
---------- ---------
1 chocolate
1 sugar
2 chocolate
3 bran
3 chocolate
3 milk
3 oats
3 sugar
4 chocolate
4 salt
4 sugar
Then to find all rows in product
that have a keyword other than 'chocolate'
or 'vanilla'
SELECT p.id
FROM product p
JOIN product_keyword k
WHERE k.product_id = p.id
ON k.keyword NOT IN ('chocolate','vanilla')
GROUP BY p.id
--or--
SELECT p.id
FROM product p
LEFT
JOIN ( SELECT j.id
FROM product_keyword j
WHERE j.keyword NOT IN ('chocolate','vanilla')
GROUP BY j.id
) k
ON k.id = p.id
WHERE k.id IS NULL
To get products that have at least one of the keywords 'chocolate' and 'vanilla', but that have no other keywords associated, it's the same query above, but with an additional join:
SELECT p.id
FROM product p
JOIN ( SELECT g.id
FROM product_keyword g
WHERE g.keyword IN ('chocolate','vanilla')
GROUP BY g.id
) h
ON h.id = p.id
LEFT
JOIN ( SELECT j.id
FROM product_keyword j
WHERE j.keyword NOT IN ('chocolate','vanilla')
GROUP BY j.id
) k
ON k.id = p.id
WHERE k.id IS NULL
We can unpack those queries, they aren't difficult. Query h
returns a list of product_id that have at least one of the keywords, query k
returns a list of product_id that have some keyword other than those specified. The "trick" there (if you want to call it that) is the anti-join pattern... doing an outer join to match rows, and include rows that didn't have a match, and a predicate in the WHERE clause that eliminates rows that had a match, leaving the set of rows from product that didn't have a match.
But with the set stored as a "comma separated list" in a single character column, we lose all the advantages of relational algebra; there isn't any easy way to process the list of keywords as a "set".
With the entire list stored as a single string, we've got some horrendous SQL to get the specified result.
One approach to doing the kind of check you specify would be to create a set of all possible "matches", and check those. This is workable for a couple of keywords. For example, to get a list of products that have ONLY the keywords 'vanilla'
and/or 'chocolate'
, (that is, that have at least one of those keywords and does not have any other keyword):
SELECT p.id
FROM product
WHERE keyword_list = 'chocolate'
OR keyword_list = 'vanilla'
OR keyword_list = 'chocolate,vanilla'
OR keyword_list = 'vanilla,chocolate'
But extending that to three, four or five keywords quickly becomes unwieldy (unless the keywords are guaranteed to appear in a particular order. And it's very difficult to check for three out of four keywords.
Another (ugly) approach is to transform the keyword_list
into a set, so that we can use queries like the first ones in my answer. But the SQL to do the transformation is limited by an arbitrary maximum number of keywords that can be extracted from the keyword_list.
It's fairly easy to extract the nth element from a comma separated list, using some simple SQL string functions, for example, to extract the first five elements from a comma separated list:
SET @l := 'chocolate,sugar,bran,oats'
SELECT NULLIF(SUBSTRING_INDEX(CONCAT(@l,','),',',1),'') AS kw1
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',2),',',-1),'') AS kw2
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',3),',',-1),'') AS kw3
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',4),',',-1),'') AS kw4
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',5),',',-1),'') AS kw5
But those are still on the same row. If we want to do checks on those, we'd have a bit of comparing to do, we'd need to check each one of those to see if it was in the specified list.
If we can get those keywords, on the one row, transformed into a set of rows with one keyword on each row, then we could use queries like the first ones in my answer. As an example:
SELECT t.product_id
, NULLIF(CASE n.i
WHEN 1 THEN SUBSTRING_INDEX(CONCAT(t.l,','),',',1)
WHEN 2 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',2),',',-1)
WHEN 3 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',3),',',-1)
WHEN 4 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',4),',',-1)
WHEN 5 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',5),',',-1)
END,'') AS kw
FROM ( SELECT 4 AS product_id,'fee,fi,fo,fum' AS l
UNION ALL
SELECT 5, 'coffee,sugar,milk'
) t
CROSS
JOIN ( SELECT 1 AS i
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) n
HAVING kw IS NOT NULL
ORDER BY t.product_id, n.i
That gets us individual rows, but it's limited to a row for each of the first 5 keywords. It's easy to see how that would be extended (having n return 6,7,8,...) and extending the WHEN conditions in the CASE to handle 6,7,8...
But there is going to be some arbitrary limit. (I've used an inline view, aliased as t
, to return two "example" rows, as a demonstration. That inline view could be replaced with a reference to the table containing the product_id and keyword_list columns.)
So, that query gets us a rowset like would be returned from the product_keyword
table I gave as an example above.
In the example queries, references to the product_keyword
table could be replaced with this query. But that's a whole lot of ugly SQL, and its horrendously inefficient, creating and populating temporary MyISAM tables anytime a query is run.