order by slows query down massively

2020-08-17 02:36发布

问题:

using sql server 2014; ((SP1-CU3) (KB3094221) Oct 10 2015 x64

I have the following query

SELECT * FROM dbo.table1 t1

                    LEFT JOIN dbo.table2 t2 ON t2.trade_id = t1.tradeNo
                    LEFT JOIN dbo.table3 t3 ON t3.TradeReportID = t1.tradeNo                                                                                                
                    order by t1.tradeNo

there are ~70k, 35k and 73k rows in t1,t2 and t3 respectively.

When I omit the order by this query executes in 3 seconds with 73k rows.

As written the query took 8.5 minutes to return ~50k rows (I since stopped it)

Switching the order of the LEFT JOINs makes a difference:

SELECT * FROM dbo.table1 t1

                    LEFT JOIN dbo.table3 t3 ON t3.TradeReportID = t1.tradeNo                                                                                                
                    LEFT JOIN dbo.table2 t2 ON t2.trade_id = t1.tradeNo                     
                    order by t1.tradeNo

This now runs in 3 seconds.

I dont have any indexes on the tables. Adding indexes t1.tradeNo and t2.trade_id and t3.TradeReportID has no effect. Running the query with only one left join (both scenarios) in combination with the order by is fast.

Its fine for me to swap the order of the LEFT JOINs but this doesnt go far to explaining why this happens and under what scenarios it may happen again

The estimated exectuion plan is: (slow)

(exclamation mark details)

VS

Switching the order of the left joins (fast):

which I note are markedly different but I cannot interpret these to explain the performance difference

UPDATE

It appears the addition of the order by clause results in the execution plan using the Table Spool (lazy spool) vs NOT using this in the fast query. If I turn off the table spool via DBCC RULEOFF ('BuildSpool'); this 'fixes' the speed but according to this post this isnt recommended and cannot do it per query anyway

UPDATE 2

One of the columns returned (table3.Text] has type varchar(max)) - If this is changed to nvarchar(512) then the original (slow) query is now fast - ie the execution plan now decides to not use the Table Spool - also note that even tho the type is varchar(max) the field values are NULL for every one of the rows. This is now fixable but I am none the wiser

UPDATE 3

Warnings in the execution plan stated

Type conversion in expression (CONVERT_IMPLICIT(nvarchar(50),[t2].[trade_id],0)) may affect "CardinalityEstimate" in query plan choice, ...

t1.tradeNo is nvarchar(21) - the other two are varchar(50) - after altering the latter two to the same as the first the problem disappears! (leaving varchar(max) as stated in UPDATE 2 unchanged)

Given this problem goes away when either UPDATE 2 or UPDATE 3 are rectified I would guess that its a combination of the query optimizer using a temp table (table spool) for a column that has an unbounded size - very interesting despite the nvarchar(max) column having no data.

回答1:

Your likely best fix is to make sure both sides of your joins have the same datatype. There's no need for one to be varchar and the other nvarchar.

This is a class of problems that comes up quite frequently in DBs. The database is assuming the wrong thing about the composition of the data it's about to deal with. The costs shown in your estimated execution plan are likely a long way from what you'd get in your actual plan. We all make mistakes and really it would be good if SQL Server learned from its own but currently it doesn't. It will estimate a 2 second return time despite being immediately proven wrong again and again. To be fair, I don't know of any DBMS which has machine-learning to do better.

Where your query is fast

The hardest part is done up front by sorting table3. That means it can do an efficient merge join which in turn means it has no reason to be lazy about spooling.

Where it's slow

Having an ID that refers to the same thing stored as two different types and data lengths shouldn't ever be necessary and will never be a good idea for an ID. In this case nvarchar in one place varchar in another. When that makes it fail to get a cardinality estimate that's the key flaw and here's why:

The optimizer is hoping to require only a few unique rows from table3. Just a handful of options like "Male", "Female", "Other". That would be what is known as "low cardinality". So imagine tradeNo actually contained IDs for genders for some weird reason. Remember, it's you with your human skills of contextualisation, who knows that's very unlikely. The DB is blind to that. So here is what it expects to happen: As it executes the query the first time it encounters the ID for "Male" it will lazily fetch the data associated (like the word "Male") and put it in the spool. Next, because it's sorted it expects just a lot more males and to simply re-use what it has already put in the spool.

Basically, it plans to fetch the data from tables 1 and 2 in a few big chunks stopping once or twice to fetch new details from table 3. In practice the stopping isn't occasional. In fact, it may even be stopping for every single row because there are lots of different IDs here. The lazy spool is like going upstairs to get one small thing at a time. Good if you think you just need your wallet. Not so good if you're moving house, in which case you'll want a big box (the eager spool).

The likely reason that shrinking the size of the field in table3 helped is that it meant it estimated less of a comparative benefit in doing the lazy spool over a full sort up front. With varchar it doesn't know how much data there is, just how much there could potentially be. The bigger the potential chunks of data that need shuffling, the more physical work needs doing.

What you can do to avoid in future

Make your table schema and indexes reflect the real shape of the data.

  • If an ID can be varchar in one table then it's very unlikely to need the extra characters available in nvarchar for another table. Avoid the need for conversions on IDs and also use integers instead of characters where possible.
  • Ask yourself if any of these tables need tradeNo to be filled in for all rows. If so, make it not nullable on that table. Next, ask if the ID should be unique for any of these tables and set it up as such in the appropriate index. Unique is the definition of maximum cardinality so it won't make that mistake again.

Nudge in the right direction with join order.

  • The order you have your joins in the SQL is a signal to the database about how powerful/difficult you expect each join to be. (Sometimes as a human you know more. e.g. if querying for 50 year old astronauts you know that filtering for astronauts would be the first filter to apply but maybe begin with the age when searching for 50 year office workers.) The heavy stuff should come first. It will ignore you if it thinks it has the information to know better but in this case it's relying on your knowledge.

If all else fails

  • A possible fix would be to INCLUDE all the fields you'll need from table3 in the index on TradeReportId. The reason the indexes couldn't help so much already is that they make it easy to identify how to re-sort but it still hasn't been physically done. That is work it was hoping to optimize with a lazy spool but if the data were included it would be already available so no work to optimize.


回答2:

Having indexes on a table are key to speeding up retrieval of data. Start with this and then retry your query to see if the speed is improved using 'ORDER BY'