How can I optimize a query that returns a lot of r

2019-09-25 07:05发布

问题:

This question already has an answer here:

  • Optimizing a query returning a lot of records, a way to avoid hundreds of join. Is it a smart solution? 3 answers

I am not so into database and I have the following doubt about the possible optimization of this query (defined on a MySql database):

SELECT MCPS.id AS series_id,
        MD_CD.market_details_id AS market_id,
        MD_CD.commodity_details_id AS commodity_id,
        MD.market_name AS market_name,
        MCPS.price_date AS price_date,
        MCPS.avg_price AS avg_price,
        CU.ISO_4217_cod AS currency, 
        MU.unit_name AS measure_unit, 
        CD.commodity_name_en,
        CN.commodity_name 
FROM Market_Commodity_Price_Series AS MCPS
INNER JOIN MeasureUnit AS MU ON MCPS.measure_unit_id = MU.id
INNER JOIN Currency AS CU ON MCPS.currency_id = CU.id
INNER JOIN MarketDetails_CommodityDetails AS MD_CD ON MCPS.market_commodity_details_id = MD_CD.id
INNER JOIN MarketDetails AS MD ON MD_CD.market_details_id = MD.id
INNER JOIN CommodityDetails AS CD ON MD_CD.commodity_details_id = CD.id
INNER JOIN CommodityName AS CN ON CD.id = CN.commodity_details_id
INNER JOIN Languages AS LN ON CN.language_id  = LN.id
WHERE MD.id = 4
AND CD.id = 4 
AND LN.id=1
ORDER BY price_date DESC LIMIT 1

It contains a lot of join only because the information in the tables are very normalized.

This query returns the information related the last price of a commodity (AND CD.id = 4) in a specific market (**WHERE MD.id = 4).

So I am first retrieving the list of all prices of a specific commodity in a specific market (that can be a lot of records) and then I do:

ORDER BY price_date DESC LIMIT 1

I think that maybe this way can be computationally expensive because the first part of the query (the entire query except the last ORDER BY statment will retrieve a lot of records that then have to be ordered.

Is it a problem? Eventually what could be a smart strategy to optimize this query? (I have no idea...)

回答1:

Sanity check... are MD_CD.commodity_details_id and CN.commodity_details_id different? Or is it a typo?

Get rid of

INNER JOIN Languages AS LN ON CN.language_id  = LN.id
 ...
AND LN.id=1

Instead, simply have AND CN.language_id = 1

Run EXPLAIN SELECT ... whenever you try a formulation.

It seems that most things are centered around commodity_details_id = 4. So say AND MD_CD.commodity_details_id = 4, not MD.id = 4 and CD.id = 4.

Now, let's try to reformulate it for a little bit of efficiency. Let's start with the main filter -- AND MD_CD.commodity_details_id = 4, so let's say FROM MD_CD JOIN .... Then add JOINs until we can get to MCPS.price_dateso we can work onORDER BY price_date DESC LIMIT 1`:

SELECT very-few-columns
    FROM MD_CD
    JOIN ...  (the minimal set of tables)
    JOIN MCPS ON ...
    WHERE MD_CD.commodity_details_id = 4
    ORDER BY MCPS.price_date DESC LIMIT 1

Before going further, debug that above to make sure it is providing the correct one row.

To finish out the query, there are two approaches:

Plan A: Use that is a 'derived table' and add more JOINs on:

SELECT lots-of-stuff
    FROM ( the above query ) AS x
    JOIN the rest of the tables

Plan B: Use subqueries in the SELECT for the missing columns:

SELECT some stuff,
       ( SELECT commodity_name_en FROM CN
             WHERE commodity_details_id = CD.id ) AS commodity_name_en,
       etc
    FROM ... (as above)

(And heed my advice on many-to-many schema in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table )