Query optimization — takes too long and stops the

2019-07-23 23:26发布

问题:

My query generates some reports about speeding, last time, and average speed. This is my query:

Select 
    r1 . *, r2.name, r2.notes, r2.serial
From
    (SELECT 
        k.idgps_unit,
            MIN(k.dt) AS DT_Start,
            MIN(CASE
                WHEN k.RowNumber = 1 THEN k.Lat
            END) AS Latitude_Start,
            MIN(CASE
                WHEN k.RowNumber = 1 THEN k.Long
            END) AS Longitude_Start,
            MIN(CASE
                WHEN k.RowNumber = 1 THEN k.Speed_kmh
            END) AS Speed_Start,
            MAX(k.dt) AS dt_end,
            MIN(CASE
                WHEN k.RowNumber = MaxRowNo THEN k.Lat
            END) AS Latitude_End,
            MIN(CASE
                WHEN k.RowNumber = MaxRowNo THEN k.Long
            END) AS Longitude_End,
            MIN(CASE
                WHEN k.RowNumber = MaxRowNo THEN k.Speed_kmh
            END) AS Speed_End,
            AVG(Speed_kmh) AS Average_Speed
    FROM
        (SELECT 
        gps_unit_location . *,
            @i:=CASE
                WHEN Speed_Kmh > 80 AND @b = 0 THEN @i + 1
                ELSE @i
            END AS IntervalID,
            @r:=CASE
                WHEN Speed_Kmh > 80 AND @b = 0 THEN 1
                ELSE @r + 1
            END AS RowNumber,
            @b:=CASE
                WHEN Speed_Kmh > 80 THEN 1
                ELSE 0
            END AS IntervalCheck
    FROM
        gps_unit_location, (SELECT @i:=0) i, (SELECT @r:=0) r, (SELECT @b:=0) b
    ORDER BY dt , idgps_unit_location) k
    INNER JOIN (SELECT 
        IntervalID, MAX(RowNumber) AS MaxRowNo
    FROM
        (SELECT 
        gps_unit_location . *,
            @i:=CASE
                WHEN Speed_Kmh > 80 AND @b = 0 THEN @i + 1
                ELSE @i
            END AS IntervalID,
            @r:=CASE
                WHEN Speed_Kmh > 80 AND @b = 0 THEN 1
                ELSE @r + 1
            END AS RowNumber,
            @b:=CASE
                WHEN Speed_Kmh > 80 THEN 1
                ELSE 0
            END AS IntervalCheck
    FROM
        gps_unit_location, (SELECT @i:=0) i, (SELECT @r:=0) r, (SELECT @b:=0) b
    ORDER BY dt , idgps_unit_location) d
    WHERE
        IntervalCheck = 1
    GROUP BY IntervalID) MaxInt ON MaxInt.IntervalID = k.IntervalID
    WHERE
        k.IntervalCheck = 1
            and k.idgps_unit in (SELECT 
                idgps_unit
            FROM
                instafleet.gps_unit
            where
                id_customer = (select 
                        idcustomer
                    from
                        user
                    where
                        iduser = 14))
    GROUP BY k.IntervalID , k.idgps_unit) r1
        Inner join
    gps_unit r2 ON r1.idgps_unit = r2.idgps_unit

Currently it takes 3 minutes for 783,723 records. I am thinking that proper indexes might help; although after some trial and error, I can't figure it out. If you think you can help, and need some additional info - I will be happy you provide it to you.

Explain

Result

回答1:

Adding an index helps in many cases, but you have a subquery joining another subquery, no index on your current table can help you speed up. The only way you can use indexes here is to create temporary table.

So as Markus pointed you need to break your query into a couple of smaller ones which store their results in a temporary table. Than you can add indexes to them and hopefully speedup your query. Another good thing about breaking big query into couple of smaller ones is that you can better profile which part is the slower one and fix it.

You have also used one subquery two times which is bad for performance as the result was not cached.

Here is an example of how you could do this:

DROP TEMPORARY TABLE IF EXISTS tmp_k;
CREATE TEMPORARY TABLE tmp_k
    ENGINE=Memory
SELECT 
    gps_unit_location.*,
    @i:= IF(((Speed_Kmh > 80) AND (@b = 0)), @i + 1, @i) AS IntervalID,
    @r:= IF(((Speed_Kmh > 80) AND (@b = 0)), 1, @r + 1) AS RowNumber,
    @b:= IF((Speed_Kmh > 80), 1, 0) AS IntervalCheck
FROM
    gps_unit_location,
    (SELECT @i:=0) i, 
    (SELECT @r:=0) r, 
    (SELECT @b:=0) b
ORDER BY
    dt,
    idgps_unit_location;

ALTER TABLE tmp_k ADD INDEX (IntervalID);

DROP TEMPORARY TABLE IF EXISTS tmp_max;
CREATE TEMPORARY TABLE tmp_max
    ENGINE=Memory
SELECT 
    IntervalID, 
    MAX(RowNumber) AS MaxRowNo
FROM
    temp_k
WHERE
    IntervalCheck = 1
GROUP BY 
    IntervalID;

ALTER TABLE tmp_max ADD INDEX (IntervalID);

SELECT 
    k.idgps_unit,
    MIN(k.dt) AS DT_Start,
    MIN(IF(k.RowNumber = 1, k.Lat, NULL)) AS Latitude_Start,
    MIN(IF(k.RowNumber = 1, k.Long, NULL)) AS Longitude_Start,
    MIN(IF(k.RowNumber = 1, k.Speed_kmh, NULL) AS Speed_Start,
    MAX(k.dt) AS DT_End,
    MIN(IF(k.RowNumber = m.MaxRowNo, k.Lat, NULL)) AS Latitude_End
    MIN(IF(k.RowNumber = m.MaxRowNo, k.Long, NULL)) AS Longitude_End
    MIN(IF(k.RowNumber = m.MaxRowNo, k.Speed_kmh, NULL)) AS Speed_End,
    AVG(Speed_kmh) AS Average_Speed,
    gu.name,
    gu.notes,
    gu.serial
FROM
    tmp_k AS k
    INNER JOIN tmp_max AS m
        USING(IntervalID)
    INNER JOIN gps_unit AS gu
        USING(idgps_unit)
    INNER JOIN user AS u
    ON (gu.idcustomer = u.idcustomer)
WHERE
    (k.IntervalCheck = 1) 
     AND (u.iduser = 14)
GROUP BY 
    k.IntervalID, 
    k.idgps_unit;

DROP TEMPORARY TABLE tmp_k;
DROP TEMPORARY TABLE tmp_max;


回答2:

If the size in bytes of the nested query(s) exceeds the size of the buffer pool (check innodb_buffer_pool_size) your query will take an extremely long time due to i/o paging.

That said you can improve your performance with the following tips:

  • Select as little data as possible in the nested query
  • Increase the size of your buffer pool.


回答3:

My personal experience has shown that MySQL is rather bad at handling subqueries. The query optimizer of a database is a very intricate and delicious part of the database and commercial database vendors put much effort into it, so it is IMHO no wonder that MySQL performs rather poor when it comes to handling crazy SQL statements invented by even crazier developers ;-).

See here: http://dev.mysql.com/doc/refman/5.6/en/subquery-restrictions.html :

The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.

If the official mysql docs from Oracle state something like "more mature", then you can rest assured that it is actually something akin to crap (no pun intended, but I already had my issues with MySQL and most of the larger statements that run flawlessly with a commercial database, would rather kill mysql).

So the task is: rewrite it using JOINs....