I am refining a Search Auto-suggestion system, and after a few iterations and Normalization, have the following set of tables:
DB Fiddle: https://www.db-fiddle.com/f/b1FvGDkBMQXkREaMh3pHKi/0
Table 1: keywords - It stores a list of alphanumeric (relevant) keywords. Relevant details are:
CREATE TABLE keywords
(
keyword_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
keyword VARCHAR(32) NOT NULL
/* Other fields eg: ranking, weights, etc */
) engine = innodb;
| keyword_id | keyword |
| ---------- | ------- |
| 1 | cotton |
| 2 | jeans |
| 3 | suit |
Table 2: phrases - It represents a specific phrase. Earlier, I used to store the complete phrase text (with a UNIQUE constraint on the phrase
), in this table. However, due to certain application related issues, data inconsistency happened. Also, this was duplication of data, hence I normalized it further:
CREATE TABLE phrases
(
phrase_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
/* Other fields eg: ranking, weights, category details etc */
) engine = innodb;
| phrase_id |
| --------- |
| 1 | /* Earlier, a column phrase_text = "cotton jeans" existed as well */
| 2 | /* Earlier, a column phrase_text = "cotton suit" existed as well */
| 3 | /* This also represents "cotton jeans". Cant enforce uniqueness*/
Table 3: keywords_to_phrases - It is a mapping table; a phrase is single-space separated concatenation of the keywords, sorted by their position
. Table details:
CREATE TABLE keywords_to_phrases
(
phrase_id INT UNSIGNED NOT NULL,
keyword_id INT UNSIGNED NOT NULL,
position TINYINT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (phrase_id, position),
FOREIGN KEY (phrase_id) REFERENCES phrases(phrase_id),
FOREIGN KEY (keyword_id) REFERENCES keywords(keyword_id)
) engine = innodb;
| phrase_id | keyword_id | position |
| --------- | ---------- | -------- |
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 2 | 1 | 0 |
| 2 | 3 | 1 |
| 3 | 1 | 0 |
| 3 | 2 | 1 |
So, in order to get a phrase text, the query would be:
SELECT p.phrase_id,
GROUP_CONCAT(k.keyword ORDER BY kp.position ASC
SEPARATOR ' ') AS phrase_text
FROM phrases p
JOIN keywords_to_phrases kp ON kp.phrase_id = p.phrase_id
JOIN keywords k ON k.keyword_id = kp.keyword_id
GROUP BY p.phrase_id;
| phrase_id | phrase_text |
| --------- | ------------ |
| 1 | cotton jeans |
| 2 | cotton suit |
| 3 | cotton jeans | /* Duplicate phrase text - cant enforce uniqueness */
- The Primary Key on
(phrase_id, position)
, ensures that aposition
is not repeated for thephrase
. - This design also allows to avoid duplication of keyword(s) in a phrase. It can be achieved by
UNIQUE
constraint on(phrase_id, keyword_id)
.
However, I am unable to enforce the uniqueness of a phrase text. How can I ensure that no other phrase_id
exists which will have the exact same set of (keyword_id, position)
rows.
In the example above: phrase_id = 1 and 3
are duplicates. I could store them as JSON instead, and apply UNIQUE constraint on the JSON. But then, I am unable to index the JSON properly, and cannot utilize the Foreign Key constraint on keyword_id
.
So, the question is: In MySQL, is there a way (smart trick) to achieve this unique constraint across the combination of rows (other than triggers), in the table design itself ?
I am also open to better table structure suggestions (if any; maybe I have over-normalized this !).