Group by and group concat , optimization mysql que

2019-04-27 18:38发布

问题:

my example is on MYSQL VERSION is 5.6.34-log

Problem summary the below query takes 40 seconds, ORDER_ITEM table

  • has 758423 records

    And PAYMENT table

  • has 177272 records

And submission_entry table

  • has 2165698 records

    as A Whole Table count.

DETAILS HERE: BELOW:

  • I Have This Query, Refer to [1]

  • I Have added SQL_NO_CACHE for testing repeated tests when re
    query.

  • I Have Optimized indexes Refer to [2], but no significant
    improvement.

  • Find Table Structures here [3]

  • Find explain plan used [4]

[1]

     SELECT SQL_NO_CACHE
          `payment`.`id`                                                                                    AS id,
          `order_item`.`order_id`                                                                           AS order_id,


          GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
            THEN ' '
                                 ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer,


          event.name                                                                                        AS event,
          COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
            `order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
            payment.make_order_free = 1))
            THEN `order_item`.id
                         ELSE NULL END)                                                                     AS qty,
          payment.currency                                                                                  AS `currency`,
          (SELECT SUM(order_item.sub_total)
           FROM order_item
           WHERE payment_id =
                 payment.id)                                                                                AS sub_total,
          CASE WHEN payment.make_order_free = 1
            THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
          ELSE ROUND(payment.total, 2) END                                                                  AS 'total',
          `payment_type`.`name`                                                                             AS payment_type,
          payment_status.name                                                                               AS status,
          `payment_status`.`id`                                                                             AS status_id,
          DATE_FORMAT(CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS 'created',
          `user`.`name`                                                                                     AS 'agent',
          event.id                                                                                          AS event_id,
          payment.checked,
          DATE_FORMAT(CONVERT_TZ(payment.checked_date, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS checked_date,
          DATE_FORMAT(CONVERT_TZ(`payment`.`complete_date`, '+0:00', '-8:00'),
                      '%Y-%m-%d %H:%i')                                                                     AS `complete date`,
          `payment`.`delivery_status`                                                                       AS `delivered`
        FROM `order_item`
          INNER JOIN `payment`
            ON payment.id = `order_item`.`payment_id` AND (payment.status > 0.0 OR payment.status = -3.0)
          LEFT JOIN (SELECT
                       sum(`payment_refund`.total) AS `refunds_total`,
                       payment_refunds.payment_id  AS `payment_id`
                     FROM payment
                       INNER JOIN `payment_refunds` ON payment_refunds.payment_id = payment.id
                       INNER JOIN `payment` AS `payment_refund`
                         ON `payment_refund`.id = `payment_refunds`.payment_id_refund
                     GROUP BY `payment_refunds`.payment_id) AS `refunds` ON `refunds`.payment_id = payment.id
#           INNER JOIN event_date_product ON event_date_product.id = order_item.event_date_product_id
#           INNER JOIN event_date ON event_date.id = event_date_product.event_date_id
          INNER JOIN event ON event.id = order_item.event_id
          INNER JOIN payment_status ON payment_status.id = payment.status
          INNER JOIN payment_type ON payment_type.id = payment.payment_type_id
          LEFT JOIN user ON user.id = payment.completed_by
          LEFT JOIN submission_entry ON submission_entry.form_submission_id = `payment`.`form_submission_id`
          LEFT JOIN question ON question.id = submission_entry.question_id AND question.var IN ('name', 'email')
        WHERE 1 = '1' AND (order_item.status > 0.0 OR order_item.status = -2.0)
        GROUP BY `order_item`.`order_id`
        HAVING 1 = '1'
        ORDER BY `order_item`.`order_id` DESC
        LIMIT 10

[2]

 CREATE INDEX order_id
      ON order_item (order_id);

    CREATE INDEX payment_id
      ON order_item (payment_id);

    CREATE INDEX status
      ON order_item (status);

Second Table

CREATE INDEX payment_type_id
  ON payment (payment_type_id);

CREATE INDEX status
  ON payment (status);

[3]

CREATE TABLE order_item
(
  id                         INT AUTO_INCREMENT
    PRIMARY KEY,
  order_id                   INT                                 NOT NULL,
  form_submission_id         INT                                 NULL,
  status                     DOUBLE DEFAULT '0'                  NULL,
  payment_id                 INT DEFAULT '0'                     NULL
);

SECOND TABLE

CREATE TABLE payment
(
  id                 INT AUTO_INCREMENT,
  payment_type_id    INT                                 NOT NULL,
  status             DOUBLE                              NOT NULL,
  form_submission_id INT                                 NOT NULL,
  PRIMARY KEY (id, payment_type_id)
);

[4] Run the snippet to see the table of EXPLAIN in HTML format

<!DOCTYPE html>
<html>
<head>
  <title></title>
</head>
<body>
<table border="1" style="border-collapse:collapse">
<tr><th>id</th><th>select_type</th><th>table</th><th>type</th><th>possible_keys</th><th>key</th><th>key_len</th><th>ref</th><th>rows</th><th>Extra</th></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_status</td><td>range</td><td>PRIMARY</td><td>PRIMARY</td><td>8</td><td>NULL</td><td>4</td><td>Using where; Using temporary; Using filesort</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment</td><td>ref</td><td>PRIMARY,payment_type_id,status</td><td>status</td><td>8</td><td>exp_live_18092017.payment_status.id</td><td>17357</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>payment_type</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.payment_type_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>user</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment.completed_by</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>submission_entry</td><td>ref</td><td>form_submission_id,idx_submission_entry_1</td><td>form_submission_id</td><td>4</td><td>exp_live_18092017.payment.form_submission_id</td><td>2</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>question</td><td>eq_ref</td><td>PRIMARY,var</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.submission_entry.question_id</td><td>1</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>order_item</td><td>ref</td><td>status,payment_id</td><td>payment_id</td><td>5</td><td>exp_live_18092017.payment.id</td><td>3</td><td>Using where</td></tr>
<tr><td>1</td><td>PRIMARY</td><td>event</td><td>eq_ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.order_item.event_id</td><td>1</td><td></td></tr>
<tr><td>1</td><td>PRIMARY</td><td>&lt;derived3&gt;</td><td>ref</td><td>key0</td><td>key0</td><td>5</td><td>exp_live_18092017.payment.id</td><td>10</td><td>Using where</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refunds</td><td>index</td><td>payment_id,payment_id_refund</td><td>payment_id</td><td>4</td><td>NULL</td><td>1110</td><td></td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id</td><td>1</td><td>Using index</td></tr>
<tr><td>3</td><td>DERIVED</td><td>payment_refund</td><td>ref</td><td>PRIMARY</td><td>PRIMARY</td><td>4</td><td>exp_live_18092017.payment_refunds.payment_id_refund</td><td>1</td><td></td></tr>
<tr><td>2</td><td>DEPENDENT SUBQUERY</td><td>order_item</td><td>ref</td><td>payment_id</td><td>payment_id</td><td>5</td><td>func</td><td>3</td><td></td></tr></table>
</body>
</html>

Expected Restul

It has to be instead of 40 seconds less than 5

IMPORTANT Updates

1) Reply to comment 1: there is no foreign key at all on those two tables.

UPDATE-1: On local the original query takes 40 seconds if i removed only the following it becomes 25 seconds saves 15 seconds

GROUP_CONCAT(DISTINCT (CASE WHEN submission_entry.text = '' OR submission_entry.text IS NULL
    THEN ' '
                         ELSE submission_entry.text END) ORDER BY question.var DESC SEPARATOR 0x1D) AS buyer

if I removed only its the same time around 40 seconds no save!

COUNT(DISTINCT CASE WHEN (`order_item`.status > 0 OR (
    `order_item`.status != -1 AND `order_item`.status >= -2 AND `payment`.payment_type_id != 8 AND
    payment.make_order_free = 1))
    THEN `order_item`.id
                 ELSE NULL END)                                                                     AS qty,

if I removed only it takes around 36 seconds saves 4 seconds

(SELECT SUM(order_item.sub_total)
   FROM order_item
   WHERE payment_id =
         payment.id)                                                                                AS sub_total,
  CASE WHEN payment.make_order_free = 1
    THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
  ELSE ROUND(payment.total, 2) END                                                                  AS 'total',

回答1:

Remove HAVING 1=1; the Optimizer may not be smart enough to ignore it. Please provide EXPLAIN SELECT (not in html) to see what the Optimizer is doing.

It seems wrong to have a composite PK in this case: PRIMARY KEY (id, payment_type_id). Please justify it.

Please explain the meaning of status or the need for DOUBLE: status DOUBLE

It will take some effort to figure out why the query is so slow. Let's start by tossing the normalization parts, such as dates and event name and currency. That is whittle down the query to enough to find the desired rows, but not the details on each row. If it is still slow, let's debug that. If it is 'fast', then add back on the other stuff, one by one, to find out what is causing a performance issue.

Is just id the PRIMARY KEY of each table? Or are there more exceptions (like payment)?

It seems 'wrong' to specify a value for question.var, but then use LEFT to imply that it is optional. Please change all LEFT JOINs to INNER JOINs unless I am mistaken on this issue.

Are any of the tables (perhaps submission_entry and event_date_product) "many-to-many" mapping tables? If so, then follow the tips here to get some performance gains.

When you come back please provide SHOW CREATE TABLE for each table.



回答2:

Guided by the strategies below,

  • pre-evaluating agregations onto temporary tables
  • placing payment at the top - since this seems to be the most deterministic
  • grouping joins - enforcing to the query optimizer the tables relationship

i present a revised version of your query:

-- -----------------------------------------------------------------------------
-- Summarization of order_item
-- -----------------------------------------------------------------------------

drop temporary table if exists _ord_itm_sub_tot;

create temporary table _ord_itm_sub_tot(
    primary key (payment_id)
)
SELECT
    payment_id,
    --
    COUNT(
        DISTINCT
            CASE
                WHEN(
                        `order_item`.status > 0 OR
                        (
                                `order_item`.status       != -1 AND
                                `order_item`.status       >= -2 AND
                                `payment`.payment_type_id != 8  AND
                                payment.make_order_free = 1
                            )
                    ) THEN `order_item`.id
                      ELSE NULL
            END
    ) AS qty,
    --
    SUM(order_item.sub_total) sub_total
FROM
    order_item
        inner join payment
        on payment.id = order_item.payment_id    
where order_item.status > 0.0 OR order_item.status = -2.0
group by payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of payment_refunds
-- -----------------------------------------------------------------------------

drop temporary table if exists _pay_ref_tot;

create temporary table _pay_ref_tot(
    primary key(payment_id)
)
SELECT
    payment_refunds.payment_id  AS `payment_id`,
    sum(`payment_refund`.total) AS `refunds_total`
FROM
    `payment_refunds`
        INNER JOIN `payment` AS `payment_refund`
        ON `payment_refund`.id = `payment_refunds`.payment_id_refund
GROUP BY `payment_refunds`.payment_id;

-- -----------------------------------------------------------------------------
-- Summarization of submission_entry
-- -----------------------------------------------------------------------------

drop temporary table if exists _sub_ent;

create temporary table _sub_ent(
    primary key(form_submission_id)
)
select 
    submission_entry.form_submission_id,
    GROUP_CONCAT(
        DISTINCT (
            CASE WHEN coalesce(submission_entry.text, '') THEN ' '
                                                          ELSE submission_entry.text
            END
        )
        ORDER BY question.var
        DESC SEPARATOR 0x1D
    ) AS buyer
from 
    submission_entry
        LEFT JOIN question
        ON(
                question.id = submission_entry.question_id
            AND question.var IN ('name', 'email')
        )
group by submission_entry.form_submission_id;

-- -----------------------------------------------------------------------------
-- The result
-- -----------------------------------------------------------------------------

SELECT SQL_NO_CACHE
    `payment`.`id`          AS id,
    `order_item`.`order_id` AS order_id,
    --
    _sub_ent.buyer,
    --
    event.name AS event,
    --
    _ord_itm_sub_tot.qty,
    --
    payment.currency AS `currency`,
    --
    _ord_itm_sub_tot.sub_total,
    --
    CASE
        WHEN payment.make_order_free = 1 THEN ROUND(payment.total + COALESCE(refunds_total, 0), 2)
                                         ELSE ROUND(payment.total, 2)
    END AS 'total',
    --
    `payment_type`.`name`   AS payment_type,
    `payment_status`.`name` AS status,
    `payment_status`.`id`   AS status_id,
    --
    DATE_FORMAT(
        CONVERT_TZ(order_item.`created`, '+0:00', '-8:00'),
        '%Y-%m-%d %H:%i'
    ) AS 'created',
    --
    `user`.`name` AS 'agent',
    event.id      AS event_id,
    payment.checked,
    --
    DATE_FORMAT(CONVERT_TZ(payment.checked_date,  '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS checked_date,
    DATE_FORMAT(CONVERT_TZ(payment.complete_date, '+0:00', '-8:00'), '%Y-%m-%d %H:%i') AS `complete date`,
    --
    `payment`.`delivery_status` AS `delivered`
FROM
    `payment`
        INNER JOIN(
            `order_item`
                INNER JOIN event
                ON event.id = order_item.event_id
        )
        ON `order_item`.`payment_id` = payment.id
        --
        inner join _ord_itm_sub_tot
        on _ord_itm_sub_tot.payment_id = payment.id
        --
        LEFT JOIN _pay_ref_tot
        on _pay_ref_tot.payment_id = `payment`.id
        --
        INNER JOIN payment_status ON payment_status.id = payment.status
        INNER JOIN payment_type   ON payment_type.id   = payment.payment_type_id
        LEFT  JOIN user           ON user.id           = payment.completed_by
        --
        LEFT JOIN _sub_ent
        on _sub_ent.form_submission_id = `payment`.`form_submission_id`
WHERE
    1 = 1
AND (payment.status > 0.0 OR payment.status = -3.0)
AND (order_item.status > 0.0 OR order_item.status = -2.0)
ORDER BY `order_item`.`order_id` DESC
LIMIT 10

The query from your question present aggregated functions without explicit groupings... this is pretty awkward and in my solution I try to devise aggregations that 'make sense'.

Please, run this version and tell us your findings.

Be, please, very careful not just on the running statistics, but also on the summarization results.



回答3:

(The tables and query are too complex for me to do the transformation for you. But here are the steps.)

  1. Reformulate the query without any mention of refunds. That is, remove the derived table and the mention of it in the complex CASE.
  2. Debug and time the resulting query. Keep the GROUP BY order_item ORDER BY order_item DESC LIMIT 10 and do any other optimizations already suggested. In particular, get rid of HAVING 1=1 since it is in the way of a likely optimization.
  3. Make the query from step #2 be a 'derived table'...

Something like:

SELECT lots of stuff
    FROM ( query from step 2 ) AS step2
    LEFT JOIN ( ... ) AS refunds  ON step2... = refunds...
    ORDER BY step2.order_item DESC

The ORDER BY is repeated, but neither the GROUP BY, nor the LIMIT need be repeated.

Why? The principle here is...

Currently, it is going into the refunds correlated subquery thousands of times, only to toss it all but 10 times. The reformulation cuts that back to only 10 times.

(Caveat: I may have missed a subtlety preventing this reformulation from working as I presented it. If it does not work, see if you can make the 'principle' help you anyway.)



回答4:

Here is the minimum you should do each time you see a query with a lot of joins and pagination: you should select those 10 (LIMIT 10) ids that you group by from the first table (order_item) with as minimum joins as possible and then join the ids back to the first table and make all other joins. That way you will not move around in temporary tables all those thousands of columns and rows that you do not need to display.

  1. You look at the inner joins and WHERE conditions, GROUP BYs and ORDER BYs to see whether you need any other tables to filter out rows, group or order ids from the first table. In your case, it doesn't seem you need any joins, except for payment.

  2. Now you write the query to select those ids:

    SELECT o.order_id, o.payment_id
    FROM order_item o
    JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
    WHERE order_item.status > 0.0 OR order_item.status = -2.0
    ORDER BY order_id DESC
    LIMIT 10
    

    If there might be several payments for a single order, you should use GROUP BY order_id DESC instead of ORDER BY. To make the query work quicker you need a BTREE index on status column for order_item table, or even a composite index on (status, payment_id).

  3. Now, when you are sure that the ids are those that you expected, you make all other joins:

    SELECT order_item.order_id,
      `payment`.`id`,
      GROUP_CONCAT ... -- and so on from the original query
    FROM (
      SELECT o.order_id, o.payment_id
      FROM order_item o
      JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
      WHERE order_item.status > 0.0 OR order_item.status = -2.0
      ORDER BY order_id DESC
      LIMIT 10
    ) as ids
    JOIN order_item ON ids.order_id = order_item.order_id
    JOIN payment ON ids.payment_id = payment.id
    LEFT JOIN ( ... -- and so on
    

The idea is that you significantly lower the temporary tables you need to process. Now every row selected by the joins will be used in the result set.


UPD1: Another thing is that you should simplify the aggregation in LEFT JOIN:

SELECT
  sum(payment.total) AS `refunds_total`,
  refs.payment_id  AS `payment_id`
FROM payment_refunds refs
JOIN payment ON payment.id = refs.payment_id_refund
GROUP BY refs.payment_id

or even replace the LEFT JOIN with a correlated subquery, since the correlation will be executed only for those 10 rows (make sure, you use this whole query with three columns as the subquery, otherwise, the correlation will be computed for each row in the resulting join before the GROUP BY):

SELECT
      ids.order_id,
      ids.payment_id,
      (SELECT SUM(p.total) 
       FROM payment_refunds refs 
       JOIN payment p 
         ON refs.payment_id_refund = p.id
       WHERE refs.payment_id = ids.payment_id
       ) as refunds_total
    FROM (
      SELECT o.order_id, o.payment_id
      FROM order_item o
      JOIN payment p
        ON p.id = o.payment_id AND (p.status > 0.0 OR p.status = -3.0)
      WHERE order_item.status > 0.0 OR order_item.status = -2.0
      ORDER BY order_id DESC
      LIMIT 10
    ) as ids

You will also need to an index (payment_id, payment_id_refund) on payment_refunds and you can even try a covering index (payment_id, total) on payment as well.