Question:
Can anyone please advise on the best way to achieve the below:
Requirement: drop 5 columns from a VLT (very large table), nearly 400 GB in size.
The moment we try to do this, we face space issues on PRODUCTION and timeout errors (through SSMS).
We tried inserting into a temp table (keeping identity off), but once we had inserted the nearly billion rows of data and tried to switch the identity back on, we again faced timeout errors.
Would doing these operations through PowerShell be any better than doing them in SSMS?
Limitation: limited space on production, and tempdb grows fast because of these operations.
Please advise what the best approach would be to drop columns from a VLT.
Regards
Answer 1:
I would take one of the approaches already mentioned but with some key modifications. Assuming you are on SQL Server 2008, do the following:
Make a zero-length copy of your existing very large table with only the columns you want to keep:
select top 0 {{column subset}} into tbl_tableB from tableA
Be sure to also copy any indexes, constraints, etc. to the new table. Identity columns will be handled appropriately by the SELECT...INTO statement.
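For example, if the original table is clustered on an identity column called Id (a made-up name here), recreating that key on the copy might look like this:
-- Hypothetical: recreate the original clustered primary key on the new table
ALTER TABLE tbl_tableB
ADD CONSTRAINT PK_tbl_tableB PRIMARY KEY CLUSTERED (Id)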
Rename the original table; we will replace it with a view in the next step.
exec sys.sp_rename @objname = 'tableA', @newname = 'tbl_tableA'
Create a view using the original table name and UNION ALL:
create view tableA
as
select {{column subset}} from tbl_tableA
union all
select {{column subset}} from tbl_tableB
This will maintain some level of compatibility with applications querying the data. INSERTs, UPDATEs, and DELETEs will have to be handled via triggers on the view. The UNION ALL will prevent pressure in tempdb since there will be no sorting (versus a straight UNION), and we will never have more than one copy of a row in existence at a time.
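As a rough sketch of what those triggers might look like (the trigger name is made up; {{column subset}} stands for your column list as above), an INSTEAD OF INSERT trigger could simply redirect new rows into the new table:
create trigger trg_tableA_insert on tableA
instead of insert
as
begin
    set nocount on;
    -- New rows only ever land in the new, narrower table
    insert into tbl_tableB ({{column subset}})
    select {{column subset}} from inserted;
end
UPDATE and DELETE triggers are a bit more involved, since they have to apply the change to whichever base table currently holds the affected rows.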
Use a DELETE combined with an OUTPUT clause to delete data in batches from the original table and simultaneously insert it into the new table:
BEGIN TRAN

DELETE TOP (1000) /* or whatever batch size you want */
FROM tbl_tableA
OUTPUT
    DELETED.{{column subset}} /* have to list each column here prefixed by DELETED. */
INTO tbl_tableB (
    {{column subset}} /* again list each column here */
)

/* Check for errors */
/* COMMIT or ROLLBACK */
/* rinse and repeat [n] times */
Once you're done with the DELETEs/INSERTs, drop the view, drop the original table, rename the new table:
drop view tableA
drop table tbl_tableA
exec sys.sp_rename @objname = 'tbl_tableB', @newname = 'tableA'
The overriding merit of this approach is that the DELETE and INSERT happen simultaneously in the same transaction, meaning the data will always be in a consistent state. You can increase the size of the batch by changing the TOP clause, giving you more control over transaction log usage and blocking. I've tested this exact approach on tables with and without identity columns and it works great. On a very large table it will take a while to run; it could be several hours to several days, but it will complete with the desired result.
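For reference, the "rinse and repeat" step could be driven by a loop along these lines (a sketch only; the @rows variable, batch size, and delay are assumptions to tune and test on your system):
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRAN;

    DELETE TOP (1000)
    FROM tbl_tableA
    OUTPUT DELETED.{{column subset}}
    INTO tbl_tableB ({{column subset}});

    SET @rows = @@ROWCOUNT;

    COMMIT TRAN;

    /* Optionally WAITFOR DELAY '00:00:01' between batches, and keep taking
       log backups so the transaction log space can be reused. */
END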
Answer 2:
The ALTER TABLE ... DROP itself is a metadata-only operation; it will complete nearly instantaneously as long as it can obtain an exclusive lock on the table, which implies all queries using the table have to drain (complete). But dropping the columns does not physically remove them; see SQL Server table columns under the hood.
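For instance, the drop itself for the five columns from the question would be just (column names made up):
ALTER TABLE tableA
DROP COLUMN col1, col2, col3, col4, col5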
The next step is to remove the physical columns, if necessary. I call out 'if necessary' because, depending on the column type, it may not be worth the effort. For variable-length columns you can reclaim the space by running DBCC CLEANTABLE. But if you have dropped fixed-size columns on a non-compressed table (no page or row compression), then the only way to reclaim the space is to rebuild the table (the heap or the clustered index). If the table is partitioned, you can try rebuilding it offline one partition at a time (ALTER TABLE ... REBUILD PARTITION = N). If not, your best shot is an online rebuild, provided you have no MAX type columns (this restriction is lifted in SQL Server 2012). An online rebuild generates a lot of log (at least 1.5 times the size of the data), but it commits internally, so log backup maintenance can reclaim the space and you won't end up with 600 GB of log growth. If an online rebuild is not feasible and the table is not partitioned, then I would first revisit the decision to clean up the dropped columns.
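The cleanup options described above look roughly like this (a sketch; the database name, index name, and partition number are placeholders):
-- Reclaim space left by dropped variable-length columns
DBCC CLEANTABLE ('YourDatabase', 'tableA')

-- Rebuild the clustered index online (Enterprise Edition; no MAX columns before SQL Server 2012)
ALTER INDEX PK_tableA ON tableA REBUILD WITH (ONLINE = ON)

-- Or, if the table is partitioned, rebuild it offline one partition at a time
ALTER TABLE tableA REBUILD PARTITION = 1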
If the column cleanup is an absolute must and you have no online alternative, then you really have a world of pain ahead. Such a large operation requires days of preparation and testing and is not at all trivial. You would have to create a new table with the desired structure, start transferring the data into it in batches, and set up some mechanism to track changes made to the data already copied, which are then applied to the copy. When all the data is copied and the changes that occurred since the start of the copy have been applied, you can switch the old and new tables using sp_rename. All in all, you would be much better off if you can use the online option off-the-shelf.
Answer 3:
I'd say a combination of another table and a batch job.
1 - Create a new table with your desired structure. Use the same clustered index key as your old table.
2 - Create a view to union the old and new tables, so you have continual access to both if needed. To limit issues on production, you can name this the same as your original table, and rename the table to _Old or something else. Only include the needed fields in the view, NOT the fields you are dropping, obviously.
3 - Inside a transaction:
- Insert a number of rows into the new table (say 1M at a time or something)
- Delete from the old table, JOINing on the new table (see the sketch below)
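A sketch of one such batch, assuming the old table has been renamed as in step 2 and using placeholders for the new table name, the Id clustered key, and the column list:
BEGIN TRAN

-- Copy the next slice of rows into the new table
INSERT INTO tableA_New (Id, {{column subset}})
SELECT TOP (1000000) o.Id, o.{{column subset}}
FROM tableA_Old AS o
WHERE NOT EXISTS (SELECT 1 FROM tableA_New AS n WHERE n.Id = o.Id)
ORDER BY o.Id

-- Delete the rows that have just been copied, joining on the new table
DELETE o
FROM tableA_Old AS o
JOIN tableA_New AS n ON n.Id = o.Id

COMMIT TRAN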
This has the advantages of low log growth (because you are batching), low database growth (since the number of extra rows never goes above whatever your batch size is), and it's incremental so you can stop if it gets too slow.
The BAD news is, you are deleting records, so once you start you are basically committed to this process. You may also get tempdb pressure from the UNION view, depending on how much sorting needs to happen.
Answer 4:
I would perhaps think about creating a new partitioned table with the schema you want and inserting the data into switch tables and then switching those tables into the new table.
If you're not very familiar with partitioned tables and indexes, I highly recommend this excellent whitepaper by Kimberly Tripp.
When inserting your data into your switch tables, you can force minimal logging by doing the following:
- Your switch table has to be empty.
- Your DB should be in simple recovery mode.
- You need to use trace flag 610, like so:
DBCC TRACEON(610)
- You need to use the TABLOCK hint on your table, like so:
INSERT newtable WITH (TABLOCK)
SELECT col1, col2, col3, col4
FROM oldtable
WHERE col1 BETWEEN @min AND @max
- The switch table needs to have a clustered index.
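Once a switch table is loaded and indexed to match the target partition, the switch itself is a near-instant metadata operation; as a sketch (the switch table name and partition number are placeholders, and source and target must have identical structure and sit on the same filegroup):
ALTER TABLE switchtable1
SWITCH TO newtable PARTITION 1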
Good luck. I hope this is of some use. I work with VLDBs in SQL Server and have found that partitioning is pretty invaluable when it comes to loading and moving data.
Answer 5:
Can't say I have experience with tables that size, but if it were me and I were looking for something to try, I would try BCPing the data (only the columns you want to keep) out to an O/S file, dropping the table, and then BCPing the data back into a new table with just the columns I needed. Of course, this assumes you have the ability to take the server off-line while you do this maintenance (and that you have good backups before you start).