UNION versus SELECT DISTINCT and UNION ALL Performance

Posted 2019-04-19 19:43

Question:

Is there any difference between these two performance-wise?

-- eliminate duplicates using UNION
SELECT col1,col2,col3 FROM Table1 
UNION SELECT col1,col2,col3 FROM Table2 
UNION SELECT col1,col2,col3 FROM Table3 
UNION SELECT col1,col2,col3 FROM Table4 
UNION SELECT col1,col2,col3 FROM Table5       
UNION SELECT col1,col2,col3 FROM Table6       
UNION SELECT col1,col2,col3 FROM Table7       
UNION SELECT col1,col2,col3 FROM Table8       

-- eliminate duplicates using DISTINCT    
SELECT DISTINCT * FROM
(     
    SELECT col1,col2,col3 FROM Table1 
    UNION ALL SELECT col1,col2,col3 FROM Table2 
    UNION ALL SELECT col1,col2,col3 FROM Table3 
    UNION ALL SELECT col1,col2,col3 FROM Table4 
    UNION ALL SELECT col1,col2,col3 FROM Table5       
    UNION ALL SELECT col1,col2,col3 FROM Table6       
    UNION ALL SELECT col1,col2,col3 FROM Table7       
    UNION ALL SELECT col1,col2,col3 FROM Table8       
) x   

Answer 1:

The difference between UNION and UNION ALL is that UNION ALL does not eliminate duplicate rows; it simply returns every row from each query in the union and combines them into one result set.

A UNION statement effectively does a SELECT DISTINCT on the result set.

If you apply SELECT DISTINCT to a UNION ALL result set, the output is the same as the UNION result set.
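The equivalence described above can be checked directly. The following is a minimal sketch, assuming SQL Server 2008 or later; the table variables @a and @b and their contents are made up for illustration and do not come from the question. It compares the two forms with EXCEPT in both directions and also compares row counts, since EXCEPT on its own ignores duplicate counts.

```sql
-- Hypothetical sample data; @a, @b and column v are illustrative names only.
DECLARE @a TABLE (v INT);
DECLARE @b TABLE (v INT);

INSERT INTO @a VALUES (1), (1), (2);
INSERT INTO @b VALUES (2), (3), (3);

-- Rows returned by UNION but missing from DISTINCT over UNION ALL.
SELECT v FROM (SELECT v FROM @a UNION SELECT v FROM @b) AS u
EXCEPT
SELECT v FROM (SELECT DISTINCT v
               FROM (SELECT v FROM @a UNION ALL SELECT v FROM @b) AS x) AS d;

-- Rows returned by DISTINCT over UNION ALL but missing from UNION.
SELECT v FROM (SELECT DISTINCT v
               FROM (SELECT v FROM @a UNION ALL SELECT v FROM @b) AS x) AS d
EXCEPT
SELECT v FROM (SELECT v FROM @a UNION SELECT v FROM @b) AS u;

-- Row counts of the two forms, side by side.
SELECT
    (SELECT COUNT(*)
     FROM (SELECT v FROM @a UNION SELECT v FROM @b) AS u) AS union_rows,
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT v
           FROM (SELECT v FROM @a UNION ALL SELECT v FROM @b) AS x) AS d) AS distinct_union_all_rows;
```

With this sample data both EXCEPT queries should come back empty and both counts should be 3, because the distinct values are 1, 2 and 3 in either form.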

Edit:

Performance in terms of CPU cost:

Let me explain with an example.

I have two queries: one uses UNION ALL wrapped in SELECT DISTINCT, and the other uses UNION.

SET STATISTICS TIME ON
GO

select distinct * from (select * from dbo.user_LogTime
union all
select * from dbo.user_LogTime) X 
GO

SET STATISTICS TIME OFF

SET STATISTICS TIME ON
GO

select * from dbo.user_LogTime
union
select * from dbo.user_LogTime
GO

SET STATISTICS TIME OFF

I ran both in the same query window in SSMS and compared their execution plans there.

The plan shows that the query with UNION ALL and DISTINCT has a higher CPU cost than the query with UNION.

Performance in terms of time:

UNION ALL with DISTINCT:

(1172 row(s) affected)

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 39 ms.

UNION:

(1172 row(s) affected)

SQL Server Execution Times:
   CPU time = 10 ms,  elapsed time = 25 ms.

So, in this test, UNION performed better than UNION ALL with DISTINCT: 25 ms versus 39 ms elapsed, although its reported CPU time was 10 ms versus 0 ms.
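The timings above come from one execution of each query over 1172 rows. One way to get steadier numbers is to repeat each query and compare total elapsed time. The following is a rough sketch, not the method used in this answer: the repetition count of 100 and the variable names @runs, @i, @sink and @start are assumptions, and wrapping each query in COUNT(*) (to avoid sending rows to the client on every iteration) may change the plan compared with the plain queries.

```sql
-- Sketch only: @runs, @i, @sink and @start are illustrative names.
DECLARE @runs INT = 100;                      -- assumed repetition count
DECLARE @i INT = 0;
DECLARE @sink INT;                            -- absorbs the result so no rows are returned
DECLARE @start DATETIME2 = SYSDATETIME();

-- Time the UNION form.
WHILE @i < @runs
BEGIN
    SELECT @sink = COUNT(*)
    FROM (SELECT * FROM dbo.user_LogTime
          UNION
          SELECT * FROM dbo.user_LogTime) AS u;
    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS union_total_ms;

-- Reset and time the DISTINCT over UNION ALL form.
SET @i = 0;
SET @start = SYSDATETIME();

WHILE @i < @runs
BEGIN
    SELECT @sink = COUNT(*)
    FROM (SELECT DISTINCT *
          FROM (SELECT * FROM dbo.user_LogTime
                UNION ALL
                SELECT * FROM dbo.user_LogTime) AS x) AS d;
    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS distinct_union_all_total_ms;
```

Running the two loops in the opposite order as well helps show whether the first loop is paying for a cold cache.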



Answer 2:

Another on-point example that illustrates the four possible cases:

/* Expected result set for each case:
(1) DISTINCT UNION {1,2,3,4,5} - redundant with case (2)
(2) UNION {1,2,3,4,5} - more efficient?
(3) DISTINCT UNION ALL {1,2,2,3,3,4,4,5}
(4) UNION ALL {1,1,2,2,2,3,3,4,4,5}
*/

declare @t1 table (c1 varchar(15));
declare @t2 table (c2 varchar(15));

insert into @t1 values ('1'),('1'),('2'),('3'),('4');

insert into @t2 values ('2'),('2'),('3'),('4'),('5');


select DISTINCT * from @t1 --case (1)
UNION
select DISTINCT * from @t2 order by c1

select * from @t1 --case (2)    
UNION
select * from @t2 order by c1

select DISTINCT * from @t1 --case (3)
UNION ALL
select DISTINCT * from @t2 order by c1

select * from @t1 --case (4)   
UNION ALL
select * from @t2 order by c1
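To confirm the expected result sets listed in the comment, the four cases can be reduced to row counts in a single result. This is a small sketch that reuses the same sample data; the column aliases label and row_count and the derived-table alias r are illustrative. Because it redeclares @t1 and @t2, it should be run as its own batch rather than appended to the batch above.

```sql
DECLARE @t1 TABLE (c1 VARCHAR(15));
DECLARE @t2 TABLE (c2 VARCHAR(15));

INSERT INTO @t1 VALUES ('1'),('1'),('2'),('3'),('4');
INSERT INTO @t2 VALUES ('2'),('2'),('3'),('4'),('5');

-- One row per case, with the number of rows that case returns.
SELECT 'case 1: DISTINCT + UNION' AS label, COUNT(*) AS row_count
FROM (SELECT DISTINCT c1 FROM @t1 UNION SELECT DISTINCT c2 FROM @t2) AS r
UNION ALL
SELECT 'case 2: UNION', COUNT(*)
FROM (SELECT c1 FROM @t1 UNION SELECT c2 FROM @t2) AS r
UNION ALL
SELECT 'case 3: DISTINCT + UNION ALL', COUNT(*)
FROM (SELECT DISTINCT c1 FROM @t1 UNION ALL SELECT DISTINCT c2 FROM @t2) AS r
UNION ALL
SELECT 'case 4: UNION ALL', COUNT(*)
FROM (SELECT c1 FROM @t1 UNION ALL SELECT c2 FROM @t2) AS r;
```

The counts should be 5, 5, 8 and 10, matching the sizes of the sets in the comment: cases (1) and (2) return the same rows, case (3) removes duplicates only within each table, and case (4) removes none.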