Unordered results in SQL

2019-01-19 23:21发布

问题:

I have read it over and over again that SQL, at its heart, is an unordered model. That means executing the same SQL query multiple times can return result-set in different order, unless there's an "order by" clause included. Can someone explain why can a SQL query return result-set in different order in different instances of running the query? It may not be the case always, but its certainly possible.

Algorithmically speaking, does query plan not play any role in determining the order of result-set when there is no "order by" clause? I mean when there is query plan for some query, how does the algorithm not always return data in the same order?

Note: Am not questioning the use of order by, am asking why there is no-guarantee, as in, am trying to understand the challenges due to which there cannot be any guarantee.

回答1:

Some SQL Server examples where the exact same execution plan can return differently ordered results are

  1. An unordered index scan might be carried out in either allocation order or key order dependant on the isolation level in effect.
  2. The merry go round scanning feature allows scans to be shared between concurrent queries.
  3. Parallel plans are often non deterministic and order of results might depend on the degree of parallelism selected at runtime and concurrent workload on the server.
  4. If the plan has nested loops with unordered prefetch this allows the inner side of the join to proceed using data from whichever I/Os happened to complete first


回答2:

Martin Smith has some great examples, but the absolute dead simple way to demonstrate when SQL Server will change the plan used (and therefore the ordering that a query without ORDER BY will be used, based on the different plan) is to add a covering index. Take this simple example:

CREATE TABLE dbo.floob
(
  blat INT PRIMARY KEY, 
  x VARCHAR(32)
);

INSERT dbo.floob VALUES(1,'zzz'),(2,'aaa'),(3,'mmm');

This will order by the clustered PK:

SELECT x FROM dbo.floob;

Results:

x
----
zzz
aaa
mmm

Now, let's add an index that happens to cover the query above.

CREATE INDEX x ON dbo.floob(x);

The index causes a recompile of the above query when we run it again; now it orders by the new index, because that index provides a more efficient way for SQL Server to return the results to satisfy the query:

SELECT x FROM dbo.floob;

Results:

x
----
aaa
mmm
zzz

Take a look at the plans - neither has a sort operator, they are just - without any other ordering input - relying on the inherent order of the index, and they are scanning the whole index because they have to (and the cheapest way for SQL Server to scan the index is in order). (Of course even in these simple cases, some of the factors in Martin's answer could influence a different order; but this holds true in the absence of any of those factors.)

As others have stated, the ONLY WAY TO RELY ON ORDER is to SPECIFY AN ORDER BY. Please write that down somewhere. It doesn't matter how many scenarios exist where this belief can break; the fact that there is even one makes it futile to try to find some guidelines for when you can be lazy and not use an ORDER BY clause. Just use it, always, or be prepared for the data to not always come back in the same order.

Some related thoughts on this:

  • Bad habits to kick : relying on undocumented behavior
  • Why people think some SQL Server 2000 behaviors live on… 12 years later


回答3:

Quote from Wikipedia:

"As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan" which may vary between executions, database versions and database software. This functionality is called the "query optimizer" as it is responsible for finding the best possible execution plan for the query, within applicable constraints."

It all depends on what the query optimizer picks as a plan - table scan, index scan, index seek, etc.

Other factors that might influence picking a plan are table/index statistics and parameter sniffing to name a few.

In short, the order is never guaranteed without an ORDER BY clause.



回答4:

It's simple: if you need the data ordered then use an ORDER BY. It's not hard!

It may not cause you a problem today or next week or even next month but one day it will.

I've been on a project where we needed to rewrite dozens (or maybe hundreds) of queries after an upgrade to Oracle 10g caused GROUP BY to be evaluated in a different way than in had on Oracle 9i, meaning that the queries weren't necessarily ordered by the grouped columns anymore. Not fun and simple to avoid.

Remember that SQL is declarative language so you are telling the DBMS what you want and the DBMS is then working out how to get it. It will bring back the same results every time but may evaluate in a different way each time: there are no guarantees.

Just one simple example of where this might cause you problems is that new rows appear at the end of the table if you select from the table.... until they don't because you've deleted some rows and the DBMS decides to fill in the empty space.

There are an unknowable number of ways it can go wrong unless you use ORDER BY.

Why does water boil at 100 degrees C? Because that's the way it's defined.

Why are there no guarantees about result ordering without an ORDER BY? Because that's the way it's defined.

The DBMS will probably use the same query plan the next time and that query plan will probably return the data in the same order: but that is not a guarantee, not even close to a guarantee.



回答5:

If you don't specify an ORDER BY then the order will depend on the plan it uses, for example if the query did a table scan and used no index then the result would be the "natural order" or the order of the PK. However if the plan determines to use IndexA that is based on columnA then the order would be in that order. Make sense?