Azure SQL Database size growing out of control DBC

2019-09-15 18:26发布

问题:

Azure SQL database size in portal is 164GB. There are a lot of binary large objects passing through the database, those records are being deleted but the space is not getting reclaimed. DBCC SHRINKDATABASE doesn't help, it reports many more used pages than the sum of used_page_count from sys.dm_db_partition_stats.

DBCC SHRINKDATABASE results

DbId    FileId    CurrentSize    MinimumSize    UsedPages    EstimatedPages

5        1        19877520       2048           19877208     19877208

5        2        17024          128            17024        128

sum of used_page_count from sys.dm_db_partition_stats results: 8292675

This represents a difference of 11584533 pages or about 90GB that is not actually being used and cannot be reclaimed with DBCC SHRINKDATABASE. This difference between the database reported size and actual used page count size has been growing rapidly over the past few weeks and the database will soon hit the size limit of 250GB. What can I do to resolve this issue? Any help is much appreciated - thank you.

Update: per Microsoft support, a deployment to their SQL database servers in April broke the automated ghost record cleanup. A couple weeks ago, somebody was able to manually turn it back on for our server and the database size leveled out at 174GB but did not reclaim the other space consumed by ghost records. Microsoft support recommended scaling up to a Premium tier to minimize the effects of the following I/O intensive process:

declare @db_id int = db_id()
exec ('dbcc forceghostcleanup ('+ @db_id + ', 'visit_all_pages'')')

I scaled up to P15 assuming a quicker turnaround and less down time. Running the command results:

Msg 40518, Level 16, State 1, Line 1
DBCC command 'forceghostcleanup' is not supported in this version of SQL Server.

Unable to run the command, I attempted to scale back down to S3. The scale operation ran for 24 hours, reported that it had succeeded in the activity log, but the database was still P15. The next recommendation was to scale down in stages. I attempted to scale down to P6. The scale operation ran for 24 hours, reported that it had succeeded in the activity log, but the database is still P15. At this point, MS support is going back to product support and I'm waiting to hear back. I hope there's a refund in this somewhere.

回答1:

Defragmenting some indexes will very likely help.

You can use the following query to get the indexes which have the largest differences between the number of used and reserved pages:

select 
    table_name = object_schema_name(i.[object_id]) + '.' + object_name(i.[object_id]), 
    index_name = i.[name], partition_number, reserved_page_count, used_page_count
from 
    sys.dm_db_partition_stats ps
    inner join sys.indexes i
    on ps.[object_id] = i.[object_id] and ps.index_id = i.index_id
order by reserved_page_count - used_page_count desc

Rebuild the indexes from the top of the list one by one until.

Note that if you're running out of space on the entire database or the indexes are particularly large, rebuild may fail or take a very long time. In that case you should fall back to reorganizing.

More information about index defragmentation: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes



回答2:

The update I added explains this issue is a ghost record issue that will hopefully be resolved by Microsoft.