SQL Batched Delete

Posted 2019-04-04 23:50

I have a table in SQL Server 2005 which has approx 4 billion rows in it. I need to delete approximately 2 billion of these rows. If I try to do it in a single transaction, the transaction log fills up and it fails. I don't have any extra space to make the transaction log bigger. I assume the best way forward is to batch up the delete statements (in batches of ~10,000?).

I can probably do this using a cursor, but is there a standard/easy/clever way of doing this?

P.S. This table does not have an identity column as a PK. The PK is made up of an integer foreign key and a date.
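For concreteness, something like this (made-up table and column names) is the kind of batching I had in mind, keyed off the date part of the PK and repeated until nothing qualifies:

DELETE TOP (10000) FROM big_table
WHERE some_date < '2005-01-01'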

9 Answers
#2 · 2019-04-05 00:01

Sounds like this is a one-off operation (I hope so, for your sake) and you don't need to go back to a state that's halfway through this batched delete. If that's the case, why don't you just switch to the SIMPLE recovery model before running, and then back to FULL when you're done?

This way the transaction log won't grow as much. This might not be ideal in most situations, but I don't see anything wrong here (assuming, as above, that you don't need to go back to a state in between your deletes).

You can do this in your script with something like:

ALTER DATABASE myDB SET RECOVERY SIMPLE  -- before the delete
ALTER DATABASE myDB SET RECOVERY FULL    -- after it completes

Alternatively, you can set up a job to shrink the transaction log at a given interval while your delete is running. This is a bit hacky, but I reckon it'd do the trick.
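A minimal sketch of such a job step, assuming the log's logical file name is myDB_log (check sys.database_files for the real name):

-- Hypothetical job step; under the FULL recovery model a log backup
-- has to run first before the shrink can release space.
DBCC SHRINKFILE (myDB_log, 1024)  -- target size in MB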

成全新的幸福 · #3 · 2019-04-05 00:03

I agree with the people who want you to loop over a smaller set of records; this will be faster than trying to do the whole operation in one step. You may have to experiment with the number of records you include in the loop. About 2000 at a time seems to be the sweet spot in most of the tables I do large deletes from, although a few need smaller amounts like 500. It depends on the number of foreign keys, the size of the record, triggers, etc., so it really will take some experimenting to find what you need. It also depends on how heavy the use of the table is. A heavily accessed table will need each iteration of the loop to run for a shorter amount of time. If you can run during off hours, or better yet in single-user mode, then you can delete more records in one loop.

If you don't think you can do this in one night during off hours, it might be best to design the loop with a counter and only do a set number of iterations each night until it is done.

Further, if you use an implicit transaction rather than an explicit one, you can kill the loop query at any time and the records already deleted will stay deleted, except for those in the current round of the loop. Much faster than trying to roll back half a million records because you've brought the system to a halt.
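A minimal sketch of such a capped loop, with placeholder names and the batch size of 2000 suggested above; each DELETE commits on its own, so killing the query only loses the current batch:

DECLARE @batches int
SET @batches = 0
WHILE @batches < 500  -- stop after a set number of iterations tonight
BEGIN
    DELETE TOP (2000) FROM your_table
    WHERE <your_condition>

    IF @@ROWCOUNT = 0 BREAK  -- nothing left to delete
    SET @batches = @batches + 1
END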

It is usually a good idea to backup a database immediately before undertaking an operation of this nature.

我想做一个坏孩纸 · #4 · 2019-04-05 00:04

What distinguishes the rows you want to delete from those you want to keep? Will this work for you:

WHILE EXISTS (SELECT 1 FROM your_table WHERE <your_condition>)
    DELETE TOP (10000) FROM your_table
    WHERE <your_condition>
狗以群分 · #5 · 2019-04-05 00:06

I would do something similar to the temp table suggestions, but I'd SELECT the rows you want to keep INTO a new permanent table, drop the original table, and then rename the new one. This should have a relatively low transaction log impact. Obviously, remember to recreate any required indexes on the new table after you've renamed it.
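A rough sketch of what I mean, with placeholder names (under the SIMPLE or BULK_LOGGED recovery model the SELECT ... INTO is minimally logged):

SELECT *
INTO your_table_keep
FROM your_table
WHERE <rows_to_keep_condition>

DROP TABLE your_table
EXEC sp_rename 'your_table_keep', 'your_table'
-- then recreate the PK, indexes and any constraints on the renamed table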

Just my two p'enneth.

狗以群分 · #6 · 2019-04-05 00:10

The short answer is, you can't delete 2 billion rows without incurring some kind of major database downtime.

Your best option may be to copy the data to a temp table and truncate the original table, but this will fill your tempdb and would use no less logging than deleting the data.

You will need to delete as many rows as you can until the transaction log fills up, then truncate the log each time. The answer provided by Stanislav Kniazev could be modified to do this by increasing the batch size and adding a call to truncate the log file.
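A hedged sketch of that modification, with placeholder names; on SQL Server 2005, BACKUP LOG ... WITH TRUNCATE_ONLY discards the inactive portion of the log (and breaks the log backup chain):

WHILE EXISTS (SELECT 1 FROM your_table WHERE <your_condition>)
BEGIN
    DELETE TOP (100000) FROM your_table
    WHERE <your_condition>

    BACKUP LOG myDB WITH TRUNCATE_ONLY  -- free log space between batches
END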

来,给爷笑一个 · #7 · 2019-04-05 00:17

You can 'nibble' the deletes, which also means that you don't cause a massive load on the database. If your t-log backups run every 10 minutes, then you should be OK to run this once or twice over the same interval. You can schedule it as a SQL Agent job.

Try something like this:

DECLARE @count int
SET @count = 10000

DELETE FROM table1
WHERE table1id IN (
    SELECT TOP (@count) table1id  -- the same key column as in the outer WHERE
    FROM table1
    WHERE x = 'y'
)