How to delete 8,500,000 records from one table on SQL Server 2005

Posted 2020-04-03 04:34

    DELETE FROM activities WHERE unt_uid IS NULL

would be the fastest way, but nobody can access the database/table until the statement has finished, so this is a no-go.

I defined a cursor to get this task done during working hours, but the impact on productivity is still too big. So how can I delete these records while normal use of the database is guaranteed?

It's SQL Server 2005 on 32-bit Windows 2003. Second question: how long would you estimate this job will take (6 hours or 60 hours)? (Yes, I know that depends on the load, but assume a small-business environment.)

6 Answers
Evening l夕情丶 · 2020-04-03 04:54

Perhaps instead of deleting the records from your table, you could create a new identical table, insert the records you want to keep, and then rename the tables so the new one replaces the old one. This would still take some time, but the downtime on your site would be pretty minimal (just while swapping the tables).
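
A minimal sketch of that recipe, using the table and column names from the question (activities_new and activities_old are hypothetical names). Note that SELECT ... INTO copies only the data, so indexes, constraints, triggers, and permissions have to be recreated on the new table before the swap:

    -- Copy the rows to keep into a new table
    SELECT *
    INTO activities_new
    FROM activities
    WHERE unt_uid IS NOT NULL

    -- Swap the tables; this is the only moment other sessions are blocked
    BEGIN TRANSACTION
        EXEC sp_rename 'activities', 'activities_old'
        EXEC sp_rename 'activities_new', 'activities'
    COMMIT TRANSACTION

    -- Once the new table checks out, drop the old one
    DROP TABLE activities_old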

甜甜的少女心 · 2020-04-03 04:56

I'd use the "nibbling delete" technique. From http://sqladvice.com/blogs/repeatableread/archive/2005/09/20/12795.aspx:

    -- Delete in batches of 2,000 rows until a batch comes back short,
    -- pausing between batches so other queries can get at the table
    DECLARE @target int
    SET @target = 2000
    DECLARE @count int
    SET @count = 2000

    WHILE @count = 2000
    BEGIN
        DELETE FROM myBigTable
        WHERE targetID IN
            (SELECT TOP (@target) targetID
             FROM myBigTable WITH (NOLOCK)
             WHERE something = somethingElse)

        -- @@ROWCOUNT drops below 2,000 on the last batch, ending the loop
        SELECT @count = @@ROWCOUNT

        -- Brief pause (200 ms) to let other work through
        WAITFOR DELAY '00:00:00.200'
    END

I've used it for exactly this type of scenario. The WAITFOR is important to keep; it allows other queries to do their work in between the deletes.

甜甜的少女心 · 2020-04-03 04:57

You can do it in chunks. For example, every 10 seconds execute:

    DELETE FROM activities
    WHERE activityid IN
        (SELECT TOP 1000 activityid
         FROM activities
         WHERE unt_uid IS NULL)

Obviously, pick the row count (I arbitrarily chose 1,000) and interval (I chose 10 seconds) that make the most sense for your application.
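
If you want this to run unattended rather than fired by hand every 10 seconds, a sketch of the same idea as a loop (batch size and delay are the arbitrary values from above):

    -- Repeat until no matching rows remain, pausing 10 seconds
    -- between batches so other sessions can work in between
    DECLARE @deleted int
    SET @deleted = 1

    WHILE @deleted > 0
    BEGIN
        DELETE FROM activities
        WHERE activityid IN
            (SELECT TOP 1000 activityid
             FROM activities
             WHERE unt_uid IS NULL)

        SET @deleted = @@ROWCOUNT
        WAITFOR DELAY '00:00:10'
    END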

神经病院院长 · 2020-04-03 04:58

Who can access the table will depend on your transaction isolation level, I'd guess.

However, you're broadly right - deleting lots of rows at once is bad, particularly if the WHERE clause can't use an index: the database probably won't be able to lock only the rows it needs to delete, so it will end up taking a big lock on the whole table.

My best recommendation would be to redesign your application so you don't need to delete these rows, or possibly any rows.

You can do this either by partitioning the table, so that you can simply drop partitions instead, or by using the "copy the rows you want to keep, then drop the table" recipe suggested by others.
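
For the partitioning route, a sketch of how partition switching works on SQL Server 2005 (Enterprise Edition only). The partition function, scheme, and created_date column here are hypothetical, since the question's unt_uid IS NULL criterion would first have to be mapped onto a partitionable column:

    -- Hypothetical setup: partition on a date column, RANGE RIGHT,
    -- so partition 1 holds everything before '2020-04-01'
    CREATE PARTITION FUNCTION pf_by_month (datetime)
        AS RANGE RIGHT FOR VALUES ('2020-04-01')

    CREATE PARTITION SCHEME ps_by_month
        AS PARTITION pf_by_month ALL TO ([PRIMARY])

    CREATE TABLE activities_p (
        activityid int NOT NULL,
        created_date datetime NOT NULL
    ) ON ps_by_month (created_date)

    -- Staging table: identical structure, same filegroup, must be empty
    CREATE TABLE activities_staging (
        activityid int NOT NULL,
        created_date datetime NOT NULL
    ) ON [PRIMARY]

    -- Metadata-only operation: the doomed rows leave the live table
    -- almost instantly, with no row-by-row deletes or long locks
    ALTER TABLE activities_p SWITCH PARTITION 1 TO activities_staging

    DROP TABLE activities_staging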

太酷不给撩 · 2020-04-03 05:06

In a small-business environment, it seems odd that you would need to delete 8,500,000 rows as part of standard operations without affecting any other users. Typically for deletes that large, we make a new table and use TRUNCATE/INSERT or sp_rename to overwrite the old one.
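
A sketch of the TRUNCATE/INSERT variant, using the question's names (activities_keep is hypothetical). This pays off when the keep-set is small, and note that TRUNCATE will refuse to run if the table is referenced by foreign keys:

    -- Park the rows to keep
    SELECT *
    INTO activities_keep
    FROM activities
    WHERE unt_uid IS NOT NULL

    -- Minimally logged and nearly instant, unlike DELETE
    TRUNCATE TABLE activities

    -- Put the keep-set back
    -- (an identity column would need SET IDENTITY_INSERT
    -- and an explicit column list)
    INSERT INTO activities
    SELECT * FROM activities_keep

    DROP TABLE activities_keep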

That said, as a special case, one of my monthly processes regularly deletes 200m rows, in batches of around 3m at a time, if it detects that it needs to re-run the process that generated them. But this is a single-user process in a dedicated data-warehouse database, and I wouldn't call it a small-business scenario.

I second the answers recommending seeking alternative approaches to your design.

Summer.凉城 · 2020-04-03 05:11

I would create a task for this and schedule it to run during off-peak hours. But I would not suggest deleting from the table while it is in use. Move the rows you want to keep to a new table and drop the current table, with all the rows you want to delete, entirely.
