mysql udf json_extract in where clause - how to improve performance

Posted 2019-08-01 23:37

How can I efficiently search json data in a mysql database?

I installed the json_extract UDF from labs.mysql.com and played around with a test table with 2,750,000 entries.

CREATE TABLE `testdb`.`JSON_TEST_TABLE` (
   `AUTO_ID` INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `OP_ID` INT NULL,
   `JSON` LONGTEXT NULL,
   PRIMARY KEY (`AUTO_ID`));

An example JSON field would look like so:

{"ts": "2014-10-30 15:08:56 (9400.223725848107) ", "operation": "1846922"}

I found that putting json_extract into the select list has virtually no performance impact, i.e. the following two selects have almost the same performance:

SELECT * FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10;

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10; 

However, as soon as I put a json_extract expression into the where clause, the execution time increases by a factor of 10 or more (I went from 2.5 to 30 seconds):

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where json_extract(JSON, "operation")=2000000 LIMIT 10;

At this point I am thinking that I need to extract all the info I want to search on into separate columns at insert time (roughly as sketched below), and that if I really have to search inside the json data I first need to narrow down the number of rows by other criteria. But I would like to make sure I am not missing anything obvious: can I somehow index the json fields, or is my select statement just written inefficiently?
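To make the "separate columns at insert time" idea concrete, here is a minimal sketch; the column and index names are just placeholders I made up, and the backfill assumes the UDF's string result can be converted to an integer:

ALTER TABLE JSON_TEST_TABLE
  ADD COLUMN OPERATION INT NULL,
  ADD INDEX idx_operation (OPERATION);

-- one-off backfill for the existing rows (a single full scan)
UPDATE JSON_TEST_TABLE
SET OPERATION = json_extract(JSON, "operation");

-- new rows would set OPERATION in the INSERT; searches then hit the index
SELECT OP_ID, json_extract(JSON, "ts") ts
FROM JSON_TEST_TABLE
WHERE OPERATION = 2000000
LIMIT 10;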

3 Answers
手持菜刀,她持情操
#2 · 2019-08-01 23:38

You can try this: http://www.percona.com/blog/2015/02/17/indexing-json-documents-for-efficient-mysql-queries-over-json-data/

The approach described there uses Flexviews materialized views for MySQL to extract the data from the JSON with JSON_EXTRACT into another table, which can then be indexed.
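This is not the Flexviews API itself, just a rough sketch of the shape of the indexed side table such a materialized view ends up maintaining (table and index names are illustrative):

CREATE TABLE JSON_TEST_TABLE_MV (
  AUTO_ID INT UNSIGNED NOT NULL,
  TS VARCHAR(64) NULL,
  OPERATION INT NULL,
  PRIMARY KEY (AUTO_ID),
  KEY idx_operation (OPERATION)
);

-- initial population; the materialized view tooling keeps it refreshed afterwards
INSERT INTO JSON_TEST_TABLE_MV
SELECT AUTO_ID,
       json_extract(JSON, "ts"),
       json_extract(JSON, "operation")
FROM JSON_TEST_TABLE;

Queries that filter on OPERATION can then use idx_operation instead of calling the UDF for every row.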

姐就是有狂的资本
#3 · 2019-08-01 23:44

In fact, during the execution of

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where OP_ID=2000000 LIMIT 10;

json_extract() is only evaluated for the rows that actually end up in the result set, i.e. at most 10 times per expression.

During this one

SELECT OP_ID, json_extract(JSON, "ts") ts, json_extract(JSON, "operation") operation FROM JSON_TEST_TABLE where json_extract(JSON, "operation")=2000000 LIMIT 10;

json_extract() is executed for every row that is scanned, and only afterwards is the result limited to 10 records, hence the slowdown. Indexing won't help either, since the processing time is spent in the external UDF code rather than in MySQL itself. Imho, the best bet in this case would be an optimized UDF.
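If the json_extract() filter has to stay in the WHERE clause, a hedged variant of the "narrow down by other criteria first" idea from the question is to bound the scan with an indexed column, so the UDF only runs on the rows inside that range; the AUTO_ID bounds below are invented for illustration:

SELECT OP_ID,
       json_extract(JSON, "ts") ts,
       json_extract(JSON, "operation") operation
FROM JSON_TEST_TABLE
WHERE AUTO_ID BETWEEN 1900000 AND 2100000         -- indexed (primary key) range scan first
  AND json_extract(JSON, "operation") = 2000000   -- UDF evaluated only for rows in that range
LIMIT 10;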

我想做一个坏孩纸
#4 · 2019-08-01 23:44

I think if you do an EXPLAIN on your query, you will see that MySQL does a full table scan, simply because your query filters on a term that is not indexed.
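To see it, you can compare the plans; with only the primary key on AUTO_ID, both should report type: ALL (a full table scan), but only the second query additionally pays the per-row UDF cost:

EXPLAIN SELECT * FROM JSON_TEST_TABLE WHERE OP_ID = 2000000 LIMIT 10;

EXPLAIN SELECT OP_ID FROM JSON_TEST_TABLE WHERE json_extract(JSON, "operation") = 2000000 LIMIT 10;

An index on OP_ID would turn the first query into an index lookup, but no ordinary index helps the second one as long as the filter is a UDF call.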
