Huge gap in increment value of identity column

2020-05-07 02:59发布

I have created a table with an identity column. When I insert values in that table, Identity column shows huge gap of increment in between the values. Identity value jumps from 6 to 10001. This is the output ordered by Department id:

Output Screenshot Here

This is table I have created:

Create Table STG2.Department
(
    DepartmentID int GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1  Cycle),
    Name varchar(100),
    GroupName varchar(100)
)
PRIMARY INDEX (DepartmentID);

This is how I am inserting values into the Department table:

insert into STG2.Department (Name, GroupName)
    Select Distinct
        Department, GroupName 
    from DP_STG.HR;

What am I doing wrong?

2条回答
混吃等死
2楼-- · 2020-05-07 03:38

This is too long for a comment.

What am I doing wrong?

What you are doing wrong is worrying about gaps in the identity column. These are a natural part of using databases. The most typical reasons are due to delete or failed inserts. The only guarantee (I think) is that the numbers are increasing and not duplicated.

In your case, my guess is that Teradata is reserving a bunch of numbers for some good reasons -- for parallelism or some other efficiency (I know SQL Server does this). The gaps will cause no harm, and the order of the inserts should be pretty-well preserved.

Maintaining gapless identity columns is a huge overhead for a database, particularly a powerful, parallel database such as Teradata. In essence, it means that each insert has to complete all queries on the table, lock the table, find the maximum value, add one, and use that. The people who write databases know what a performance killer this is and have looser requirements for such columns.

查看更多
爷的心禁止访问
3楼-- · 2020-05-07 03:39

As already stated the gaps are due to each AMP (the logical Processing Unit of Teradata to have MPP) has each own range of IDs. So it's not wrong to have these gaps but it's by design.

If you rely on IDs without gaps (for any reason), you have to do by your own. Either before loading in your ETL process or after/during loading and define "ID = ROW_NUMBER() + MAX(ID)" (pseudo code).

查看更多
登录 后发表回答