INSERT INTO SELECT strange order using UNION

2019-07-04 09:26发布

问题:

I had a typical non-normalized table (tempTable) with multiple numbered columns (rep1,rep2,...). So i wrote a script to insert the non-normalized data into a normalized table (myTable):

insert into myTable
select idRep,rep FROM
(
    select idRep, ISNULL(rep1,'') as rep FROM tempTable
    union
    select idRep, ISNULL(rep2,'') as rep FROM tempTable
    union
    select idRep, ISNULL(rep3,'') as rep FROM tempTable
    union
    select idRep, ISNULL(rep4,'') as rep FROM tempTable
    union
    select idRep, ISNULL(rep5,'') as rep FROM tempTable
) as t

Note: The table myTable also contains an auto-incremented IDENTITY column as its PRIMARY KEY.

The order rep1, rep2, rep3, rep4, rep5 is important in my scenario. Strangely, when I executed the script, the data wasn't inserted in the correct order such as the auto-generated id '1000' had the value from 'rep3' and the id '1001' had the value from 'rep1'.

Why is that? How was the script executed?

回答1:

The reason it is not going in the order you expect when using UNION is that union attempts to impose uniquness, so it is processing all of those rows together and bringing them out in the order most convenient for the engine.

If you switch to UNION ALL (which does not try to impose uniqueness) as Parado suggested it will not do the processing and they will go into the table in the order you put them in, almost all the time. This however is not gaurunteed and certain very unusual circumstances going on in other processes (especially ones that somehow touch on your tempTable) can affect it.

If you use an order by as Kash suggests then that will gauruntee the order of the ids (which can matter), but not technically the order that the rows get inserted (which very rarely matters in practice).

There is a good summary of some of this on MSDN.

So, that takes care of the why. As for the how to get what you actually want, I would use Kash's suggestion of adding a column to use with an order by clause, but I would use UNION ALL instead of UNION. Using UNION is like adding and implicit "distinct" requirement, which takes up processor cycles and makes the query plan more complicated.



回答2:

Your outer Select has no order, hence the INSERT is not ordered like it seems.

There are a few Ordering Guarantees in SQL Server and an INSERT of SELECT with ORDER BY guarantees computation of identity values as quoted:

INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

Change your SQL to make it ordered:

insert into myTable
select idRep,rep FROM
(
    select idRep, ISNULL(rep1,'') as rep, 1 as Grp FROM tempTable
    union
    select idRep, ISNULL(rep2,'') as rep, 2 as Grp FROM tempTable
    union
    select idRep, ISNULL(rep3,'') as rep, 3 as Grp FROM tempTable
    union
    select idRep, ISNULL(rep4,'') as rep, 4 as Grp FROM tempTable
    union
    select idRep, ISNULL(rep5,'') as rep, 5 as Grp FROM tempTable
) as t ORDER BY Grp


回答3:

Try with union all. It does't sort the data:

insert into myTable
select idRep,rep FROM
(
    select idRep, ISNULL(rep1,'') as rep FROM tempTable
    union all
    select idRep, ISNULL(rep2,'') as rep FROM tempTable
    union all
    select idRep, ISNULL(rep3,'') as rep FROM tempTable
    union all
    select idRep, ISNULL(rep4,'') as rep FROM tempTable
    union all
    select idRep, ISNULL(rep5,'') as rep FROM tempTable
) as t