I used to write my EXISTS checks like this:
IF EXISTS (SELECT * FROM TABLE WHERE Columns=@Filters)
BEGIN
UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END
One of the DBAs in a previous life told me that when I write an EXISTS clause, I should use SELECT 1 instead of SELECT *:
IF EXISTS (SELECT 1 FROM TABLE WHERE Columns=@Filters)
BEGIN
UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END
Does this really make a difference?
There is no difference in SQL Server, and it has never been a problem there. The optimizer knows that they are the same: if you look at the execution plans, you will see that they are identical.
No real difference, although there might be a very small performance hit. As a rule of thumb, you should not ask for more data than you need.
Personally I find it very, very hard to believe that they don't optimize to the same query plan. But the only way to know in your particular situation is to test it. If you do, please report back!
The best way to know is to performance-test both versions and check out the execution plan for each. Pick a table with lots of columns.
No. SQL Server is smart: it knows the subquery is being used in an EXISTS, and returns NO DATA to the system.
Quoth Microsoft: http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4
To check yourself, try running the following:
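Something along these lines, for example (sys.objects is just a convenient table that always has rows):

-- If SQL Server evaluated the SELECT list, 1/0 would raise a
-- divide-by-zero error. It doesn't: EXISTS only probes for rows.
IF EXISTS (SELECT 1/0 FROM sys.objects)
    PRINT 'No divide-by-zero: the SELECT list was never evaluated';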
If it were actually doing anything with the SELECT list, it would throw a divide-by-zero error. It doesn't.
EDIT: Note that the SQL Standard actually talks about this: a SELECT * directly contained in an EXISTS subquery is defined as equivalent to selecting an arbitrary literal.
ANSI SQL-92 Standard, pg 191: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
The reason for this misconception is presumably the belief that the EXISTS will end up reading all the columns. It is easy to see that this is not the case.
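One way to see this for yourself is to look at the textual plan for an EXISTS query. A sketch along those lines, using master..spt_values as a convenient system table to probe:

SET SHOWPLAN_TEXT ON;
GO
-- Not executed: SQL Server returns the plan text instead of the result.
SELECT CASE WHEN EXISTS (SELECT * FROM master..spt_values) THEN 1 ELSE 0 END;
GO
SET SHOWPLAN_TEXT OFF;
GO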
The resulting plan shows that SQL Server was able to use the narrowest index available to check the result, despite the fact that that index does not include all columns. The index access is under a semi join operator, which means it can stop scanning as soon as the first row is returned.
So it is clear the above belief is wrong.
However, Conor Cunningham from the Query Optimiser team explains here that he typically uses SELECT 1 in this case, as it can make a minor performance difference in the compilation of the query.

I tested four possible ways of expressing this query on an empty table with various numbers of columns: SELECT 1 vs. SELECT * vs. SELECT Primary_Key vs. SELECT Other_Not_Null_Column. I ran the queries in a loop using OPTION (RECOMPILE) and measured the average number of executions per second.

The results showed no consistent winner between SELECT 1 and SELECT *, and the difference between the two approaches is negligible. The SELECT Not Null col and SELECT PK variants do appear slightly faster, though. All four of the queries degrade in performance as the number of columns in the table increases.
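For reference, a minimal sketch of the kind of timing loop involved (dbo.SomeWideTable is a hypothetical stand-in for one of the test tables):

DECLARE @i int = 0, @v int, @start datetime2 = SYSUTCDATETIME();
WHILE @i < 10000
BEGIN
    -- OPTION (RECOMPILE) forces a fresh compilation on every iteration,
    -- so compilation cost dominates the measurement.
    SELECT @v = 1 WHERE EXISTS (SELECT 1 FROM dbo.SomeWideTable) OPTION (RECOMPILE);
    SET @i += 1;
END
SELECT DATEDIFF(ms, @start, SYSUTCDATETIME()) AS elapsed_ms;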
As the table is empty, this relationship between column count and performance seems explicable only by the amount of column metadata being loaded. For COUNT(1), it is easy to see that this gets rewritten to COUNT(*) at some point in the process, from the demonstration below.
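A sketch of that demonstration, again using master..spt_values as a handy table to count:

SET SHOWPLAN_TEXT ON;
GO
-- Returns the plan text for the COUNT(1) query instead of running it.
SELECT COUNT(1) FROM master..spt_values;
GO
SET SHOWPLAN_TEXT OFF;
GO

In the plan text, the aggregate operator is defined in terms of Count(*); the literal 1 is nowhere to be seen, so the rewrite has already happened by that point.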
I attached a debugger to the SQL Server process and broke in at random whilst executing a loop like the one below.
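A sketch of that loop; dbo.T1024 stands in for a hypothetical test table with 1,024 columns:

-- Recompile the EXISTS query forever, so that random debugger breaks are
-- likely to land inside query compilation.
DECLARE @V int;
WHILE (1 = 1)
    SELECT @V = 1 WHERE EXISTS (SELECT 1 FROM dbo.T1024) OPTION (RECOMPILE);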
I found that in the cases where the table has 1,024 columns, most of the time the call stack showed that it is indeed spending a large proportion of the time loading column metadata, even when SELECT 1 is used. (For the case where the table has a single column, randomly breaking didn't hit this bit of the call stack in 10 attempts.)

This manual profiling attempt is backed up by the VS 2012 code profiler, which shows a very different selection of functions consuming the compilation time in the two cases (Top 15 Functions, 1,024 columns vs. Top 15 Functions, 1 column).
Both the SELECT 1 and SELECT * versions wind up checking column permissions, and both fail if the user is not granted access to all columns in the table. An example I cribbed from a conversation on The Heap (a sketch; the user and table names are illustrative):
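-- Create a user who may read every column except Z.
CREATE USER blat WITHOUT LOGIN;
GO
CREATE TABLE dbo.T (X int PRIMARY KEY, Y int, Z char(8000));
GRANT SELECT ON dbo.T TO blat;
DENY SELECT ON dbo.T (Z) TO blat;
GO
EXECUTE AS USER = 'blat';
-- Both forms fail with "The SELECT permission was denied on the column 'Z'",
-- even though neither needs to read Z to answer the EXISTS question.
SELECT 1 WHERE EXISTS (SELECT 1 FROM dbo.T);
SELECT 1 WHERE EXISTS (SELECT * FROM dbo.T);
REVERT;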
So one might speculate that the minor apparent difference when using SELECT some_not_null_col is that it only winds up checking permissions on that specific column (though it still loads the metadata for all of them). However, this doesn't seem to fit the facts, as the percentage difference between the two approaches, if anything, gets smaller as the number of columns in the underlying table increases.

In any event, I won't be rushing out to change all my queries to this form, as the difference is very minor and only apparent during query compilation. Removing the OPTION (RECOMPILE), so that subsequent executions can use a cached plan, gave a second set of results. The test script I used can be found here.