delete activities where unt_uid is null
would be the fastest way but nobody can access the database / table until this statement has finished so this is a no-go.
I defined a cursor to get this task done during working time but anyway the impact to productivity is to big. So how to delete these record so that the normal use of this database is guaranteed?
It's a SQL-2005 Server on a 32-bit Win2003. Second Question is: How Long would you estimate for this job to be done (6 hours or 60 hours)? (Yes, i know that depends on the load but assume that this is a small-business environment)
Perhaps instead of deleting the records from your table, you could create a new identical table, insert the records you want to keep, and then rename the tables so the new one replaces the old one. This would still take some time, but the down-time on your site would be pretty minimal (just when swapping the tables)
I'd use the "nibbling delete" technique. From http://sqladvice.com/blogs/repeatableread/archive/2005/09/20/12795.aspx:
I've used it for exactly this type of scenario. The
WAITFOR
is important to keep, it allows other queries to do their work in between deletes.You can do it in chunks. For example, every 10 seconds execute:
Obviously define the row count (I arbitrarily picked 1000) and interval (I picked 10 seconds) which makes the most sense for your application.
Who can access the table will depend on your transaction isolation mode, I'd guess.
However, you're broadly right - lots of deletes is bad, particularly if your where clause means it cannot use an index - this means the database probably won't be able to lock only the rows it needs to delete, so it will end up taking a big lock on the whole table.
My best recommendation would be to redesign your application so you don't need to delete these rows, or possibly any rows.
You can either do this by partitioning the table such that you can simply drop partitions instead, or use the "copy the rows you want to keep then drop the table" recipe suggested by others.
In a small-business environment, it seems odd that you would need to delete 500,000 rows in standard operational behavior without affecting any other users. Typically for deletes that large, we're making a new table and using TRUNCATE/INSERT or sp_rename to overwrite the old one.
Having said that, in a special case, one of my monthly processes regularly can delete 200m rows in batches of around 3m at a time if it detects that it needs to re-run the process which generated those 200m rows. But this is a single-user process in a dedicated data warehouse database, and I wouldn't call it a small-business scenario.
I second the answers recommending seeking alternative approaches to your design.
i would create a task for this and schedule it to run during offpeak hours. But i would not suggest you to delete in the table being used. Move the rows you want to keep to new table and totally drop the current table with lots of rows you want to delete.