MySQL SUM json values grouped by json keys

2019-02-10 18:21发布

问题:

Is it possible to calculate sum of json values grouped by json keys?

Mysql version is 5.7.17 on Google cloud sql.

Example_1: A short example of my point:

col1 | col2
-----|-----------------------
aaa  | {"key1": 1, "key2": 3}
-----|-----------------------
bbb  | {"key1": 0, "key2": 2}
-----|-----------------------
aaa  | {"key1": 50, "key2": 0}

SQL query should produce:

col1 | col2
-----|-----------------------
aaa  | {"key1": 51, "key2": 3}
-----|-----------------------
bbb  | {"key1": 0, "key2": 2}

OR

will it be possible with any of the below schema?

Example_2:

col1 | col2
-----|-----------------------
aaa  | {{"key_name" : "key1", "key_value" : 1}, {"key_name" : "key2", "key_value" : 3}}
-----|-----------------------
bbb  | {{"key_name" : "key1", "key_value" : 0}, {"key_name" : "key2", "key_value" : 2}}
-----|-----------------------
aaa  | {{"key_name" : "key1", "key_value" : 50}, {"key_name" : "key2", "key_value" : 0}}

Example_3:

col1 | col2
-----|-----------------------
aaa  | [{"key_name" : "key1", "key_value" : 1}, {"key_name" : "key2", "key_value" : 3}]
-----|-----------------------
bbb  | [{"key_name" : "key1", "key_value" : 0}, {"key_name" : "key2", "key_value" : 2}]
-----|-----------------------
aaa  | [{"key_name" : "key1", "key_value" : 50}, {"key_name" : "key2", "key_value" : 0}]

Example_4:

col1 | col2
-----|-----------------------
aaa  | {"key1": {"key_name" : "key1", "key_value" : 1}, "key2": {"key_name" : "key2", "key_value" : 3}}
-----|-----------------------
bbb  | {"key1": {"key_name" : "key1", "key_value" : 0}, "key2": {"key_name" : "key2", "key_value" : 2}}
-----|-----------------------
aaa  | {"key1": {"key_name" : "key1", "key_value" : 50}, "key2": {"key_name" : "key2", "key_value" : 0}}

回答1:

TL;DR: yes, it can be done without knowing the key names in advance, and none of the alternate data formats has any advantage over the original.

This can be done without knowing the key names in advance but it is painful... basically you have to look at every value in the table to determine the set of distinct keys in the table before you can sum them. Because of this requirement, and the fact that the alternate data formats can all have multiple keys per entry, there is no advantage to using any of them.

Since you have to look for all the distinct keys, it is as easy to do the sums while you are looking for them. This function and procedure together will do that. The function, json_merge_sum, takes two JSON values and merges them, summing the values where a key appears in both values e.g.

SELECT json_sum_merge('{"key1": 1, "key2": 3}', '{"key3": 1, "key2": 2}')

Output:

{"key1": 1, "key2": 5, "key3": 1}

The function code:

DELIMITER //
DROP FUNCTION IF EXISTS json_merge_sum //
CREATE FUNCTION json_sum_merge(IN j1 JSON, IN total JSON) RETURNS JSON
BEGIN
  DECLARE knum INT DEFAULT 0;
  DECLARE jkeys JSON DEFAULT JSON_KEYS(j1);
  DECLARE kpath VARCHAR(20);
  DECLARE v INT;
  DECLARE l INT DEFAULT JSON_LENGTH(jkeys);
  kloop: LOOP
    IF knum >= l THEN
      LEAVE kloop;
    END IF;
    SET kpath = CONCAT('$.', JSON_EXTRACT(jkeys, CONCAT('$[', knum, ']')));
    SET v = JSON_EXTRACT(j1, kpath);
    IF JSON_CONTAINS_PATH(total, 'one', kpath) THEN
      SET total = JSON_REPLACE(total, kpath, JSON_EXTRACT(total, kpath) + v);
    ELSE
      SET total = JSON_SET(total, kpath, v);
    END IF;
    SET knum = knum + 1;
  END LOOP kloop;
  RETURN total;
END

The procedure, count_keys, performs the equivalent of the GROUP BY clause. It finds all the distinct values of col1 in the table and then calls json_sum_merge for each row which has that value of col1. Note the row select query performs a SELECT ... INTO a dummy variable so no output is generated, and uses a MIN() to ensure there is only one result (so that it can be assigned to a variable).

The procedure:

DELIMITER //
DROP PROCEDURE IF EXISTS count_keys //
CREATE PROCEDURE count_keys()
BEGIN
  DECLARE finished INT DEFAULT 0;
  DECLARE col1val VARCHAR(20);
  DECLARE col1_cursor CURSOR FOR SELECT DISTINCT col1 FROM table2;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished=1;
  OPEN col1_cursor;
  col1_loop: LOOP
    FETCH col1_cursor INTO col1val;
    IF finished=1 THEN
      LEAVE col1_loop;
    END IF;
    SET @total = '{}';
    SET @query = CONCAT("SELECT MIN(@total:=json_sum_merge(col2, @total)) INTO @json FROM table2 WHERE col1='", col1val, "'");
    PREPARE stmt FROM @query;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    SELECT col1val AS col1, @total AS col2;
  END LOOP col1_loop;
END

For a slightly larger example:

col1    col2    
aaa     {"key1": 1, "key2": 3}
bbb     {"key1": 4, "key2": 2}
aaa     {"key1": 50, "key3": 0}
ccc     {"key2": 5, "key3": 1, "key4": 3}
bbb     {"key1": 5, "key2": 1, "key5": 3}

CALL count_keys() produces:

col1    col2    
aaa     {"key1": 51, "key2": 3, "key3": 0}
bbb     {"key1": 9, "key2": 3, "key5": 3}
ccc     {"key2": 5, "key3": 1, "key4": 3}

Note I've called the table table2 in the procedure, you will need to edit that (in both queries) to suit.



回答2:

I believe that something like this could work.

SELECT SUM(col2->>"$.key1"), SUM(col2->>"$.key2") FROM your_table GROUP BY col1


回答3:

SQL for the "short example":

SELECT col1,
       JSON_OBJECT('key1', SUM(value1), 'key2', SUM(value2)) AS col2
FROM
    (SELECT col1,
            JSON_EXTRACT(col2, '$.key1') AS value1,
            JSON_EXTRACT(col2, '$.key2') AS value2
     FROM tbl) subq
GROUP BY col1;


回答4:

Solution for Example_3:

DROP TABLE IF EXISTS jsondata;
CREATE TABLE jsondata (json JSON, col varchar(11));

INSERT INTO jsondata VALUES
('[{"key_name" : "key1", "key_value" : 1}, {"key_name" : "key2", "key_value" : 3}]', 'aaa'),
('[{"key_name" : "key1", "key_value" : 0}, {"key_name" : "key3", "key_value" : 2}]', 'bbb'),
('[{"key_name" : "key1", "key_value" : 50}, {"key_name" : "key2", "key_value" : 0}]', 'aaa');

DROP FUNCTION IF EXISTS json_sum_by_col;
CREATE FUNCTION json_sum_by_col(col varchar(100)) RETURNS JSON
BEGIN
    DECLARE i INT DEFAULT 0;
    DECLARE done INT DEFAULT FALSE;
    DECLARE select_values JSON;
    DECLARE temp_result JSON;
    DECLARE json_result JSON DEFAULT '[]';
    DECLARE temp_key varchar(11);
    DECLARE temp_value int;

    DECLARE curs CURSOR FOR SELECT json FROM jsondata WHERE jsondata.col = col;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN curs;
        read_loop: LOOP
            SET i = 0;
            FETCH curs INTO select_values;

            IF done THEN
              LEAVE read_loop;
            END IF;

            WHILE i < JSON_LENGTH(select_values) DO
                -- extract key and value for i element
                SET temp_key = JSON_EXTRACT(JSON_EXTRACT(select_values, CONCAT('$[',i,']')), '$.key_name');
                SET temp_value = JSON_EXTRACT(JSON_EXTRACT(select_values, CONCAT('$[',i,']')), '$.key_value');

                -- search json_result for key
                SET @search = JSON_SEARCH(json_result, 'one', JSON_UNQUOTE(temp_key));
                IF @search IS NOT NULL THEN
                    -- if exists add to existing value
                    SET @value_path = JSON_UNQUOTE(REPLACE(@search, 'name', 'value'));
                    SET temp_value = temp_value + JSON_EXTRACT(json_result, @value_path);
                    SET json_result = JSON_REPLACE(json_result, @value_path, temp_value);
                ELSE
                    -- else attach it to json_result
                    SET temp_result = JSON_OBJECT("key_name", JSON_UNQUOTE(temp_key), "key_value", temp_value);
                    SET json_result = JSON_INSERT(json_result, CONCAT('$[',JSON_LENGTH(json_result),']'), temp_result);
                END IF;

                SELECT i + 1 INTO i;
            END WHILE;
        END LOOP;
    CLOSE curs;

    RETURN json_result;
END;

SELECT col, json_sum_by_col(col) FROM jsondata GROUP BY col;

You can run it here