Slow bulk insert for table with many indexes

Posted 2019-01-08 14:25

Question:

I try to insert millions of records into a table that has more than 20 indexes.

In the last run it took more than 4 hours per 100.000 rows, and the query was cancelled after 3½ days...

Do you have any suggestions about how to speed this up?

(I suspect the many indexes are the cause. If you think so too, how can I automatically drop the indexes before the operation and then recreate the same indexes afterwards?)

Extra info:

  • The space used by the indexes is about 4 times the space used by the data alone
  • The inserts are wrapped in a transaction per 100.000 rows.

Update on status:

The accepted answer helped me make it much faster.

Answer 1:

You can disable and enable the indexes. Note that disabling them can have unwanted side effects (such as duplicate primary key or unique index values) which will only be found when re-enabling the indexes.

--Disable Index
ALTER INDEX [IXYourIndex] ON YourTable DISABLE
GO

--Enable Index
ALTER INDEX [IXYourIndex] ON YourTable REBUILD
GO
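
If there are many indexes, you may not want to write each ALTER INDEX statement by hand. A small sketch that generates the DISABLE and REBUILD statements from sys.indexes (YourTable is a placeholder; the clustered index is skipped here, because disabling it would make the table unreadable during the load):

--Generate DISABLE statements for all nonclustered indexes on the table
SELECT 'ALTER INDEX [' + name + '] ON YourTable DISABLE;'
FROM sys.indexes
WHERE object_id = OBJECT_ID('YourTable')
  AND type_desc = 'NONCLUSTERED';

--After the load, generate the matching REBUILD statements
SELECT 'ALTER INDEX [' + name + '] ON YourTable REBUILD;'
FROM sys.indexes
WHERE object_id = OBJECT_ID('YourTable')
  AND type_desc = 'NONCLUSTERED';

Copy the generated statements and run them, or build them into a string and EXEC it.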


Answer 2:

This sounds like a data warehouse operation. It would be normal to drop the indexes before the insert and rebuild them afterwards.

When you rebuild the indexes, build the clustered index first, and conversely drop it last. They should all have fillfactor 100%.

The code should be something like this:

if object_id('IndexList') is not null drop table IndexList
select name into IndexList from dbo.sysindexes where id = object_id('Fact')

if exists (select name from IndexList where name = 'id1') drop index Fact.id1
if exists (select name from IndexList where name = 'id2') drop index Fact.id2
if exists (select name from IndexList where name = 'id3') drop index Fact.id3
.
.
BIG INSERT

RECREATE THE INDEXES
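
For the rebuild step, a minimal sketch of the suggested order (clustered index first, then the nonclustered ones, all with fill factor 100; the Fact column and index names here are only placeholders):

--Recreate the clustered index first, so the nonclustered indexes are built on top of it
create clustered index ixFactDate on Fact (DateKey) with (fillfactor = 100)

--Then recreate the nonclustered indexes
create nonclustered index id1 on Fact (CustomerKey) with (fillfactor = 100)
create nonclustered index id2 on Fact (ProductKey) with (fillfactor = 100)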


Answer 3:

As noted in another answer, disabling the indexes will be a very good start.

4 hours per 100.000 rows [...] The inserts are wrapped in a transaction per 100.000 rows.

You should look at reducing that number. The server has to maintain a huge amount of state while inside a transaction (so that it can be rolled back), and this, along with the indexes, makes adding data very hard work.

Why not wrap each insert statement in its own transaction?

Also look at the nature of the SQL you are using, are you adding one row per statement (and network roundtrip), or adding many?
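
For example, a single statement that inserts many rows avoids one network round trip per row (table and values are purely illustrative; SQL Server caps this row-constructor form at 1000 rows per statement):

--One round trip, many rows
insert into Fact (id, value)
values (1, 'a'),
       (2, 'b'),
       (3, 'c')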



Answer 4:

Disabling and then re-enabling indexes is frequently suggested in these cases. I have my doubts about this approach, though, because:

  1. The application's DB user needs schema alteration privileges, which it normally should not possess.
  2. The chosen insert approach and/or index schema might be less than optimal in the first place; otherwise rebuilding complete index trees should not be faster than some decent batch inserting (e.g. the client issuing one insert statement at a time, causing thousands of server round trips; or a poor choice of clustered index, leading to constant index node splits).

That's why my suggestions look a little bit different:

  • Increase ADO.NET BatchSize
  • Choose the target table's clustered index wisely, so that inserts won't lead to clustered index node splits. Usually an identity column is a good choice
  • Let the client insert into a temporary heap table first (heap tables don't have any clustered index); then, issue one big "insert-into-select" statement to push all that staging table data into the actual target table (see the sketch at the end of this answer)
  • Apply SqlBulkCopy
  • Decrease transaction logging by choosing the bulk-logged recovery model

You might find more detailed information in this article.
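
A rough T-SQL sketch of the staging-heap plus insert-into-select idea from the list above, combined with the bulk-logged recovery model (all object names are placeholders, and changing the recovery model affects your backup strategy, so check that first):

--Reduce transaction logging for the duration of the load
alter database YourDb set recovery bulk_logged

--The staging table is a heap: no clustered index, no other indexes
create table dbo.FactStaging (DateKey int, CustomerKey int, Amount decimal(18,2))

-- ... the client bulk-loads into dbo.FactStaging here (e.g. via SqlBulkCopy) ...

--One big push into the real target table; the TABLOCK hint is what can make this minimally logged under bulk-logged recovery
insert into dbo.Fact with (tablock) (DateKey, CustomerKey, Amount)
select DateKey, CustomerKey, Amount
from dbo.FactStaging

--Clean up and switch back to the normal recovery model
drop table dbo.FactStaging
alter database YourDb set recovery full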