Sql server which is better “Rank” or “Left join wi

2019-09-03 18:59发布

I have the following table in database with around 10 millions records (it will increase in future may be double in 1 year):

create table PropertyOwners (
    [Key] int not null primary key,
    PropertyKey int not null, 
    BoughtDate DateTime, 
    OwnerKey int not null
)
go

Above table contains all the property owned by an owner at certain time, I want to get the owners which owns more than certain amount of properties at a current time, lets say more then 1000 properties at a time. I have written two different queries one using "Rank" and other using "Left join with own table".

Using Rank (Taking around 4 sec):

select OwnerKey, COUNT(1) PropertyCount 
from (
    select PropertyKey, OwnerKey, BoughtDate,
        RANK() over (partition by PropertyKey order by BoughtDate desc) as [Rank]
    from dbo.PropertyOwners 
) result
where [Rank]=1
group by OwnerKey
having COUNT(1)>1000

Using left join with same table (Taking around 10sec):

select OwnerKey, COUNT(1) PropertyCount 
from (
    select po.PropertyKey, po.OwnerKey, po.BoughtDate
    from dbo.PropertyOwners po
    left join dbo.PropertyOwners lo on lo.PropertyKey = po.PropertyKey
    and lo.BoughtDate > po.BoughtDate
    where lo.PropertyKey is null
) result
group by OwnerKey
having COUNT(1)>1000

Both of the query times are unacceptable as taking so much time, can anyone help me with the query to rewrite. My table has following index:

CREATE NONCLUSTERED INDEX [IX_PropertyKey_BounghtDate] ON [dbo].[PropertyOwners] 
(
    [PropertyKey] ASC,
    [BoughtDate] DESC
)
INCLUDE ( [OwnerKey]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

3条回答
再贱就再见
2楼-- · 2019-09-03 19:48

Grouping is never fast. You may look into indices that SQL Server will suggest if you run your queries sufficient amount of times; google diagnostic queries that involve sys.dm_db_index_usage_stats, they will be of some help.

Another option, already suggested before, is building a summary table. Slightly more lightweight solution will be an indexed view, but you have to understand implications that will come into play if you will create it.

查看更多
\"骚年 ilove
3楼-- · 2019-09-03 19:52

You have a fair amount of data and a lot that has to be counted. This analysis by Aaron Bertrand isn't exactly your problem, but it might help you.

With your supporting index, I would recommend trying the not exists approach:

select OwnerKey, count(*) as PropertyCount
from PropertyOwners po
where not exists (select 1
                  from PropertyOwners po2
                  where po2.PropertyKey = po.PropertyKey and
                        po2.BoughtDate > po.BoughtDate
                 )
group by OwnerKey
having count(*) > 1000;

If you can't get the query to work sufficiently fast, you may need to either upgrade your hardware or use triggers to keep a summary table up-to-date.

查看更多
Animai°情兽
4楼-- · 2019-09-03 19:59

You could rewrite this as (which might improve performance)

select OwnerKey, COUNT(1) PropertyCount 
from (
    select PropertyKey, MAX( BoughtDate) BoughtDate
    from dbo.PropertyOwners 
    Group by PropertyKey
) result INNER JOIN dbo.PropertyOwners po ON po.PropertyKey=result.PropertyKey and PO.boughtDate=result.boughtdate
group by OwnerKey
having COUNT(1)>1000
查看更多
登录 后发表回答