Sql server which is better “Rank” or “Left join wi

2019-09-03 19:16发布

问题:

I have the following table in database with around 10 millions records (it will increase in future may be double in 1 year):

create table PropertyOwners (
    [Key] int not null primary key,
    PropertyKey int not null, 
    BoughtDate DateTime, 
    OwnerKey int not null
)
go

Above table contains all the property owned by an owner at certain time, I want to get the owners which owns more than certain amount of properties at a current time, lets say more then 1000 properties at a time. I have written two different queries one using "Rank" and other using "Left join with own table".

Using Rank (Taking around 4 sec):

select OwnerKey, COUNT(1) PropertyCount 
from (
    select PropertyKey, OwnerKey, BoughtDate,
        RANK() over (partition by PropertyKey order by BoughtDate desc) as [Rank]
    from dbo.PropertyOwners 
) result
where [Rank]=1
group by OwnerKey
having COUNT(1)>1000

Using left join with same table (Taking around 10sec):

select OwnerKey, COUNT(1) PropertyCount 
from (
    select po.PropertyKey, po.OwnerKey, po.BoughtDate
    from dbo.PropertyOwners po
    left join dbo.PropertyOwners lo on lo.PropertyKey = po.PropertyKey
    and lo.BoughtDate > po.BoughtDate
    where lo.PropertyKey is null
) result
group by OwnerKey
having COUNT(1)>1000

Both of the query times are unacceptable as taking so much time, can anyone help me with the query to rewrite. My table has following index:

CREATE NONCLUSTERED INDEX [IX_PropertyKey_BounghtDate] ON [dbo].[PropertyOwners] 
(
    [PropertyKey] ASC,
    [BoughtDate] DESC
)
INCLUDE ( [OwnerKey]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

回答1:

You could rewrite this as (which might improve performance)

select OwnerKey, COUNT(1) PropertyCount 
from (
    select PropertyKey, MAX( BoughtDate) BoughtDate
    from dbo.PropertyOwners 
    Group by PropertyKey
) result INNER JOIN dbo.PropertyOwners po ON po.PropertyKey=result.PropertyKey and PO.boughtDate=result.boughtdate
group by OwnerKey
having COUNT(1)>1000


回答2:

You have a fair amount of data and a lot that has to be counted. This analysis by Aaron Bertrand isn't exactly your problem, but it might help you.

With your supporting index, I would recommend trying the not exists approach:

select OwnerKey, count(*) as PropertyCount
from PropertyOwners po
where not exists (select 1
                  from PropertyOwners po2
                  where po2.PropertyKey = po.PropertyKey and
                        po2.BoughtDate > po.BoughtDate
                 )
group by OwnerKey
having count(*) > 1000;

If you can't get the query to work sufficiently fast, you may need to either upgrade your hardware or use triggers to keep a summary table up-to-date.



回答3:

Grouping is never fast. You may look into indices that SQL Server will suggest if you run your queries sufficient amount of times; google diagnostic queries that involve sys.dm_db_index_usage_stats, they will be of some help.

Another option, already suggested before, is building a summary table. Slightly more lightweight solution will be an indexed view, but you have to understand implications that will come into play if you will create it.