How to Improve Query Performance with many JOINs

2019-02-04 08:30发布

问题:

I have a query (with the purpose of making a view) which is using a few joins to get each column. Performance degrades quickly (exponentially?) for each set of joins added.

What would be a good approach to make this query faster? Please see comments within the query.

If it helps, this is using the WordPress DB schema.

Here is a screenshot of EXPLAIN

PRODUCTS TABLE

+--+----+
|id|name|
+--+----+
|1 |test|
+--+----+

METADATA TABLE

+----------+--------+-----+
|product_id|meta_key|value|
+----------+--------+-----+
|1         |price   |9.99 |
+----------+--------+-----+
|1         |sku     |ABC  |
+----------+--------+-----+

TERM_RELATIONSHIPS TABLE

+---------+----------------+
|object_id|term_taxonomy_id|
+---------+----------------+
|1        |1               |
+---------+----------------+
|1        |2               |
+---------+----------------+

TERM_TAXONOMY TABLE

+----------------+-------+--------+
|term_taxonomy_id|term_id|taxonomy|
+----------------+-------+--------+
|1               |1      |size    |
+----------------+-------+--------+
|2               |2      |stock   |
+----------------+-------+--------+

TERMS TABLE

+-------+-----+
|term_id|name |
+-------+-----+
|1      |500mg|
+-------+-----+
|2      |10   |
+-------+-----+

QUERY

SELECT 
  products.id,
  products.name,
  price.value AS price,
  sku.value AS sku,
  size.name AS size
FROM products

/* These joins are performing quickly */

INNER JOIN `metadata` AS price ON products.id = price.product_id AND price.meta_key = 'price'
INNER JOIN `metadata` AS sku ON products.id = sku.product_id AND sku.meta_key = 'sku'

/* Here's the part that is really slowing it down - I run this chunk about 5 times with different strings to match */

INNER JOIN `term_relationships` AS tr ON products.id = tr.object_id
  INNER JOIN `term_taxonomy` AS tt
  ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'
    INNER JOIN `terms` AS size
    ON tt.term_id = size.term_id

回答1:

Your performance issue is most likely caused by the join with the 'term_taxonomy' table.
All other joins seems to use the primary key (where you probobly have working indexes on).

So my suggestion is to add a compound index on term_taxonomy_id and term_id (or if you must: taxonomy). Like this:

CREATE UNIQUE INDEX idx_term_taxonomy_id_taxonomy
ON term_taxonomy( term_taxonomy_id, taxonomy);

Hope this will help you.



回答2:

Make Sure all the columns on which there is "ON" conditional statements is there, should be indexed. This will significantly improve the speed.



回答3:

    Declare @query as NVARCHAR(MAX)
    set @query = ('SELECT 
    products.id,
    products.name,
    price.value AS price,
    sku.value AS sku,
    size.name AS size
    FROM products
    INNER JOIN metadata AS price ON products.id = price.product_id AND price.meta_key = price
    INNER JOIN metadata AS sku ON products.id = sku.product_id AND sku.meta_key = sku
    INNER JOIN term_relationships AS tr ON products.id = tr.object_id
    INNER JOIN term_taxonomy AS tt
    ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = size
    INNER JOIN terms AS size
    ON tt.term_id = size.term_id
    into #t')

    exec(@query);
    select * from #t

I Hope the above way will reduce the time utilization, or creating a temporary table with all the fields you select and updating the temporary table by joining to the temporary table to all the other tables might also be effective, well i am not sure about it but Even I am waiting for your result as your question seems intresting



回答4:

Try this:

SELECT p.id, p.name, MAX(CASE m.meta_key WHEN 'price' THEN m.value ELSE '' END) AS price, 
       MAX(CASE m.meta_key WHEN 'sku' THEN m.value ELSE '' END) AS sku, s.name AS size
FROM products p 
INNER JOIN `metadata` AS m ON p.id = m.product_id  
INNER JOIN `term_relationships` AS tr ON p.id = tr.object_id 
INNER JOIN `term_taxonomy` AS tt ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'
INNER JOIN `terms` AS s ON tt.term_id = s.term_id
GROUP BY p.id;

If you still find that your query is slow then add the EXPLAIN plan of my query so I can find which columns needs INDEX.



回答5:

The below script is formatted as per SQL Server rules - You can change this as per MySQL rules and give it a try -

SELECT 
  P.id,
  P.name,
  PIVOT_METADATA.price,
  PIVOT_METADATA.sku,
  size.name AS size
FROM products P (NOLOCK)

INNER JOIN term_relationships AS tr (NOLOCK)
    ON P.id = tr.object_id

INNER JOIN term_taxonomy AS tt (NOLOCK)
    ON tr.term_taxonomy_id = tt.term_taxonomy_id AND tt.taxonomy = 'size'

INNER JOIN terms AS size (NOLOCK)
    ON tt.term_id = size.term_id

INNER JOIN METADATA (NOLOCK)
    PIVOT
    (
        MAX(value)
        FOR [meta_key] IN (price,sku)
    )AS PIVOT_METADATA
    ON P.id = PIVOT_METADATA.product_id

What I feel could be the bottleneck in your query - You are joining Metadata 2 times. Since there are 1-to-many relationships in your tables, the Metadata 2-join doesn't hurt but after that as you join more tables - the number of rows due to 1-to-many relationship increase - and hence the prformance drops.

What I've tried to achieve - I'm making sure that as many 1-to-1 relationships are fulfilled as possible. To do this, I've done a Pivot on Metadata adn made price & sku as columns. Now my product id shall have only one row in Metadata pivot. alos, I've made sure that I join this picot at the very end.

Give it a try. Please share the expected performance, number of records you have & also what performance you get with my asnwer.



回答6:

METADATA_TABLE and TERM_RELATIONSHIP_TABLE do not have any proimary key. When there are huge records in these tables your query performancy will be hit.

Checkpoints to increase your performance.

  1. All tables should have primary key. This is because rows in table will be physically sorted.
  2. For small and queries involving few tables keeping primary key in table would be enough. If you still wish to improve performance, create non-clustered-index for columns such as *object_Id field of term_relationships table* . Non-clustered index should be created for those columns in table which are taking part in join operation.

However, point to be noted is that, non-clustered index should be very less on those tables where multiple insert and updates are happening. This is not a simple question and can’t be answered only based on run time. There are other factors that affect the answer especially if environment where a stored procedure is running is heavily transactional.

You can find more here



回答7:

I would suggest those:

  • Consider reducing those joins from business level;
  • If not possible to do from "top"(business level), and the data is not for real time, I would suggest to prepare a memory table(I know the solution is not ideal). And select your data from memory table directly.

In my experience:

  • "joins" is the killer for performance, the bigger your data is, the more pain you will feel;
  • Try to get rid of joins, not try to improve query performance by keeping joins unless you have to. Usually I will try to fix those issues from "top" to "bottom"
  • The last suggestion is if all above don't work. I will consider "map/reduce + fulltext search", if that worth to do.

(Forgive me I didn't provide solution to improve your query performance.)