MySQL Enforce Unique constraint across a Combinati

2020-04-16 03:35发布

问题:

I am refining a Search Auto-suggestion system, and after a few iterations and Normalization, have the following set of tables:

DB Fiddle: https://www.db-fiddle.com/f/b1FvGDkBMQXkREaMh3pHKi/0

Table 1: keywords - It stores a list of alphanumeric (relevant) keywords. Relevant details are:

CREATE TABLE keywords
  (
     keyword_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     keyword    VARCHAR(32) NOT NULL
    /* Other fields eg: ranking, weights, etc */
  ) engine = innodb;

| keyword_id | keyword |
| ---------- | ------- |
| 1          | cotton  |
| 2          | jeans   |
| 3          | suit    |

Table 2: phrases - It represents a specific phrase. Earlier, I used to store the complete phrase text (with a UNIQUE constraint on the phrase), in this table. However, due to certain application related issues, data inconsistency happened. Also, this was duplication of data, hence I normalized it further:

CREATE TABLE phrases
  (
     phrase_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
    /* Other fields eg: ranking, weights, category details etc */
  ) engine = innodb;

| phrase_id |
| --------- |
| 1         | /* Earlier, a column phrase_text = "cotton jeans" existed as well */
| 2         | /* Earlier, a column phrase_text = "cotton suit" existed as well */
| 3         | /* This also represents "cotton jeans". Cant enforce uniqueness*/

Table 3: keywords_to_phrases - It is a mapping table; a phrase is single-space separated concatenation of the keywords, sorted by their position. Table details:

CREATE TABLE keywords_to_phrases
  (
    phrase_id INT UNSIGNED NOT NULL,
    keyword_id INT UNSIGNED NOT NULL,
    position TINYINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (phrase_id, position),
    FOREIGN KEY (phrase_id) REFERENCES phrases(phrase_id),
    FOREIGN KEY (keyword_id) REFERENCES keywords(keyword_id)
  ) engine = innodb;

| phrase_id | keyword_id | position |
| --------- | ---------- | -------- |
| 1         | 1          | 0        |
| 1         | 2          | 1        |
| 2         | 1          | 0        |
| 2         | 3          | 1        |
| 3         | 1          | 0        |
| 3         | 2          | 1        |

So, in order to get a phrase text, the query would be:

SELECT p.phrase_id, 
       GROUP_CONCAT(k.keyword ORDER BY kp.position ASC
                    SEPARATOR ' ') AS phrase_text
FROM phrases p 
JOIN keywords_to_phrases kp ON kp.phrase_id = p.phrase_id 
JOIN keywords k ON k.keyword_id = kp.keyword_id 
GROUP BY p.phrase_id;

| phrase_id | phrase_text  |
| --------- | ------------ |
| 1         | cotton jeans |
| 2         | cotton suit  |
| 3         | cotton jeans |  /* Duplicate phrase text - cant enforce uniqueness */
  • The Primary Key on (phrase_id, position), ensures that a position is not repeated for the phrase.
  • This design also allows to avoid duplication of keyword(s) in a phrase. It can be achieved by UNIQUE constraint on (phrase_id, keyword_id).

However, I am unable to enforce the uniqueness of a phrase text. How can I ensure that no other phrase_id exists which will have the exact same set of (keyword_id, position) rows.

In the example above: phrase_id = 1 and 3 are duplicates. I could store them as JSON instead, and apply UNIQUE constraint on the JSON. But then, I am unable to index the JSON properly, and cannot utilize the Foreign Key constraint on keyword_id.

So, the question is: In MySQL, is there a way (smart trick) to achieve this unique constraint across the combination of rows (other than triggers), in the table design itself ?

I am also open to better table structure suggestions (if any; maybe I have over-normalized this !).