I want to make a tags column of type json, e.g.:
id | tags
=========================================
1 | '["tag1", "tag2", "tag3"]'
2 | '["tag1", "tag3", "tag5", "tag7"]'
3 | '["tag2", "tag5"]'
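To try the answers against these sample rows, they can be loaded like this. The table name posts is a hypothetical choice, since the question does not name the table.

```sql
-- Hypothetical table name; the question only shows the id and tags columns.
CREATE TABLE posts (
  id   INT PRIMARY KEY,
  tags JSON NOT NULL
);

-- MySQL validates each string as JSON when it is inserted into a JSON column.
INSERT INTO posts (id, tags) VALUES
  (1, '["tag1", "tag2", "tag3"]'),
  (2, '["tag1", "tag3", "tag5", "tag7"]'),
  (3, '["tag2", "tag5"]');
```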
I want to index each tag in the arrays, without knowing the length of the arrays (variable length). So then if I query for rows that contain tag2, it should return rows 1 and 3.
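The filter itself is easy to write with JSON_CONTAINS() (MySQL 5.7+); what is missing is an index to support it. A minimal sketch, assuming the rows live in a table named posts (a hypothetical name):

```sql
-- The second argument is a JSON document, hence the inner double quotes.
SELECT id
FROM posts
WHERE JSON_CONTAINS(tags, '"tag2"');
-- With the three sample rows: ids 1 and 3.

-- EXPLAIN should report a full table scan: no plain index can serve this predicate.
EXPLAIN SELECT id FROM posts WHERE JSON_CONTAINS(tags, '"tag2"');
```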
https://dev.mysql.com/doc/refman/5.7/en/json.html says:
JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column.
By "extracts a scalar value", does this mean I must extract & index each item in the arrays individually (meaning I must know the maximum length of the array to index them all)? How do I index a variable length array?
It's not practical to index an array in JSON in MySQL.
You can use generated columns to extract each element of the array into a separate scalar column, and index each generated column. But how many of these columns will you need? How will you know which column contains the value you're searching for?
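A sketch of the per-position approach, assuming the question's data is in a table named posts (hypothetical) and that no array holds more than four tags (an assumed limit). The column and index names are hypothetical; ->> is MySQL's shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)), available since 5.7.13.

```sql
-- One generated column, and one index, per array position.
ALTER TABLE posts
  ADD COLUMN tag_0 VARCHAR(50) AS (tags->>'$[0]') VIRTUAL,
  ADD COLUMN tag_1 VARCHAR(50) AS (tags->>'$[1]') VIRTUAL,
  ADD COLUMN tag_2 VARCHAR(50) AS (tags->>'$[2]') VIRTUAL,
  ADD COLUMN tag_3 VARCHAR(50) AS (tags->>'$[3]') VIRTUAL,
  ADD INDEX idx_tag_0 (tag_0),
  ADD INDEX idx_tag_1 (tag_1),
  ADD INDEX idx_tag_2 (tag_2),
  ADD INDEX idx_tag_3 (tag_3);

-- Every search has to probe every position; a fifth tag would not be indexed at all.
SELECT id FROM posts
WHERE tag_0 = 'tag2' OR tag_1 = 'tag2' OR tag_2 = 'tag2' OR tag_3 = 'tag2';
```

Each additional position means another column, another index, and another OR term in every query.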
You can use a generated column as @bostaf's answer shows, extracting multiple array values and making a comma-separated string. You can't use a plain index to search this string for a word that may occur in the middle. Perhaps you could use a fulltext index, but this only works if the array elements are each a single word.
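A sketch of the single-string approach, assuming a table named posts (hypothetical) and a hypothetical column tags_text. It also assumes your MySQL version accepts a FULLTEXT index on a STORED generated column, which is worth verifying before relying on it.

```sql
-- Flatten the whole array into its JSON text, e.g. ["tag1", "tag2", "tag3"].
ALTER TABLE posts
  ADD COLUMN tags_text VARCHAR(1000) AS (CAST(tags AS CHAR(1000))) STORED;

-- A leading wildcard defeats a B-tree index, so this scans every row.
-- Keeping the quotes in the pattern avoids also matching "tag20".
SELECT id FROM posts WHERE tags_text LIKE '%"tag2"%';

-- A FULLTEXT index can find a word in the middle of the string,
-- but only when each tag is a single token (no spaces or punctuation inside).
ALTER TABLE posts ADD FULLTEXT INDEX ft_tags_text (tags_text);
SELECT id FROM posts WHERE MATCH(tags_text) AGAINST ('+tag2' IN BOOLEAN MODE);
```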
In April 2018, I gave a presentation about this kind of weakness using JSON in MySQL: How to Use JSON in MySQL Wrong.
The better solution for multi-valued attributes is to store them in a dependent table, in the manner prescribed by database normalization. Then the values appear over multiple rows, in a single column, which you can index in a more straightforward way.
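A minimal sketch of such a dependent table; post_tags and its column names are hypothetical, and post_id is meant to hold the id values from the question's table.

```sql
-- One row per (post, tag) pair instead of a JSON array.
CREATE TABLE post_tags (
  post_id INT NOT NULL,          -- in a real schema, add a FOREIGN KEY to the posts table
  tag     VARCHAR(50) NOT NULL,
  PRIMARY KEY (post_id, tag),    -- also prevents the same tag twice on one post
  KEY idx_tag (tag)              -- serves "which posts have tag X?"
);

INSERT INTO post_tags (post_id, tag) VALUES
  (1, 'tag1'), (1, 'tag2'), (1, 'tag3'),
  (2, 'tag1'), (2, 'tag3'), (2, 'tag5'), (2, 'tag7'),
  (3, 'tag2'), (3, 'tag5');

-- An ordinary indexed lookup; returns post_id 1 and 3.
SELECT post_id FROM post_tags WHERE tag = 'tag2';
```

The number of tags per post no longer matters: a post with ten tags simply has ten rows.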
Re your comment:
I came up with a solution for enforcing uniqueness on a JSON array, but it depends on the elements of the array staying in sorted order.
The JSON_UNQUOTE() function returns the JSON as a string.
Now we can make a generated column based on that, and then a UNIQUE KEY index on the generated column. This works in MySQL 5.7 and later.
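A sketch of the generated column with a UNIQUE KEY, assuming the question's table is named posts (hypothetical). tags_string and uq_tags_string are hypothetical names, and VARCHAR(255) assumes the array's text fits in 255 characters.

```sql
-- For a JSON array, JSON_UNQUOTE() returns the array's JSON text as a string.
ALTER TABLE posts
  ADD COLUMN tags_string VARCHAR(255) AS (JSON_UNQUOTE(tags)) VIRTUAL,
  ADD UNIQUE KEY uq_tags_string (tags_string);

-- Shows the string that the unique index compares for each row.
SELECT id, tags_string FROM posts;
```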
Now trying to insert the same array of values in the JSON column fails:
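Assuming a UNIQUE KEY named uq_tags_string (hypothetical) on a generated column JSON_UNQUOTE(tags) in a table named posts (hypothetical), the inserts below illustrate both sides. MySQL normalizes whitespace when it stores JSON, so the first insert matches row 1 even though it is written without spaces. The second insert shows the caveat discussed next: the same tags in a different order produce a different string.

```sql
-- Same elements, same order as row 1: rejected with a duplicate-key error (ER_DUP_ENTRY, 1062).
INSERT INTO posts (id, tags) VALUES (4, '["tag1","tag2","tag3"]');

-- Same elements, different order: accepted, because the generated strings differ.
INSERT INTO posts (id, tags) VALUES (5, '["tag3", "tag2", "tag1"]');
```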
Unfortunately, there's no good way to ensure that the JSON array is sorted. See Sorted json array field. So it's up to you to design your application code so that it always pre-sorts the values in the JSON array before inserting or updating.
You can extract as many items as you want. They will be stored as scalars (e.g. string), rather than as compound values (which JSON is).
Let's create an index with one item only (the first value from the JSON array):
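A sketch of that index, assuming a table named posts (hypothetical). tags_scalar is the column name this answer uses; idx_tags_scalar is a hypothetical index name.

```sql
-- Generated column holding the first array element, plus a secondary index on it.
ALTER TABLE posts
  ADD COLUMN tags_scalar VARCHAR(255)
    GENERATED ALWAYS AS (JSON_EXTRACT(tags, '$[0]')) VIRTUAL,
  ADD INDEX idx_tags_scalar (tags_scalar);

-- JSON_EXTRACT() returns a JSON string, so the stored value keeps its quotes: "tag1".
SELECT id FROM posts WHERE tags_scalar = '"tag1"';
-- With the three sample rows: ids 1 and 2.
```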
Now you have an index on the VARCHAR column tags_scalar. The value contains quotes, which can also be skipped by wrapping the extraction in JSON_UNQUOTE() (or, equivalently, using the ->> operator).
As you can already imagine, the generated column can include more items from the JSON:
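For example, a column joining the first four elements, unquoted and comma-separated, assuming a table named posts (hypothetical). The name tags_joined and the limit of four are hypothetical choices.

```sql
ALTER TABLE posts
  ADD COLUMN tags_joined VARCHAR(255) GENERATED ALWAYS AS (
    -- CONCAT_WS() skips NULL arguments, so arrays shorter than four still work.
    CONCAT_WS(',',
      JSON_UNQUOTE(JSON_EXTRACT(tags, '$[0]')),
      JSON_UNQUOTE(JSON_EXTRACT(tags, '$[1]')),
      JSON_UNQUOTE(JSON_EXTRACT(tags, '$[2]')),
      JSON_UNQUOTE(JSON_EXTRACT(tags, '$[3]')))
  ) VIRTUAL,
  ADD INDEX idx_tags_joined (tags_joined);
-- Row 2 becomes 'tag1,tag3,tag5,tag7'; row 3 becomes 'tag2,tag5'.
```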
You can also use any other valid expression to generate a string from the JSON structure, in order to obtain something that can be easily indexed and searched, like "tag1tag3tag5tag7".
As explained above, you don't need to know the maximum length: the NULL values returned when extracting past the end of a shorter array can be skipped by using any valid expression. But of course it's always better to know.
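The NULL behavior this relies on can be checked with a few standalone queries: extracting past the end of an array yields NULL, plain CONCAT() turns the whole result into NULL, and CONCAT_WS() or IFNULL() avoid that.

```sql
-- Extracting past the end of an array yields NULL rather than an error.
SELECT JSON_EXTRACT('["tag2", "tag5"]', '$[5]');                    -- NULL

-- CONCAT() returns NULL if any argument is NULL; CONCAT_WS() skips NULLs.
SELECT CONCAT('tag2', NULL), CONCAT_WS(',', 'tag2', NULL, 'tag5');  -- NULL, 'tag2,tag5'

-- IFNULL() keeps plain CONCAT() usable, giving the 'tag1tag3...' style of string.
SELECT CONCAT(
  IFNULL(JSON_UNQUOTE(JSON_EXTRACT('["tag2"]', '$[0]')), ''),
  IFNULL(JSON_UNQUOTE(JSON_EXTRACT('["tag2"]', '$[1]')), ''));      -- 'tag2'
```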
Now comes the architecture decision: is the JSON data type the most appropriate way to achieve the goal? To solve this particular problem? Is JSON the right tool here? Is it going to speed up searching?
If you insist, cast the JSON to a string:
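A sketch of that cast, assuming a table named posts (hypothetical) and a hypothetical column tags_cast; VARCHAR(255) assumes the array's text fits in 255 characters.

```sql
-- The whole array as text, indexed with an ordinary B-tree index.
ALTER TABLE posts
  ADD COLUMN tags_cast VARCHAR(255)
    GENERATED ALWAYS AS (CAST(tags AS CHAR(255))) VIRTUAL,
  ADD INDEX idx_tags_cast (tags_cast);

-- A prefix match can use the index...
SELECT id FROM posts WHERE tags_cast LIKE '["tag2"%';
-- ...and with the sample rows returns only id 3: row 1 also has tag2, but not first.
```

A B-tree index on such a column helps with equality and prefix matches, not with finding a tag in the middle of the array.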
One way or another, you end up with a VARCHAR or TEXT column, to which you apply the most suitable index structure.
It's now possible with MySQL 8.0.17+, which added multi-valued indexes on JSON arrays.
Something like this (not tested):
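A sketch of a multi-valued index on the whole top-level array, assuming a table named posts (hypothetical) and tags of at most 32 characters (an assumed bound). Note the double parentheses: the inner pair makes the CAST expression a functional key part.

```sql
-- MySQL 8.0.17+: one index entry per array element, however long the array is.
ALTER TABLE posts
  ADD INDEX idx_tags_mv ((CAST(tags->'$' AS CHAR(32) ARRAY)));
```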
Use it this way:
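Queries using MEMBER OF(), JSON_CONTAINS() or JSON_OVERLAPS() on the same expression as the index (tags->'$') are the ones the optimizer can serve from a multi-valued index; EXPLAIN shows whether it actually does. The expected ids assume only the question's three sample rows, in a table named posts (hypothetical).

```sql
-- Rows whose array contains 'tag2': ids 1 and 3.
SELECT id FROM posts WHERE 'tag2' MEMBER OF(tags->'$');

-- Same result with JSON_CONTAINS(); the candidate is a JSON string.
SELECT id FROM posts WHERE JSON_CONTAINS(tags->'$', '"tag2"');

-- Rows sharing at least one tag with the list: ids 1 and 2.
SELECT id FROM posts WHERE JSON_OVERLAPS(tags->'$', '["tag3", "tag7"]');
```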
More details and samples here: https://dev.mysql.com/doc/refman/8.0/en/json.html