I have the following table in database with around 10 millions records (it will increase in future may be double in 1 year):
create table PropertyOwners (
[Key] int not null primary key,
PropertyKey int not null,
BoughtDate DateTime,
OwnerKey int not null
)
go
Above table contains all the property owned by an owner at certain time, I want to get the owners which owns more than certain amount of properties at a current time, lets say more then 1000 properties at a time. I have written two different queries one using "Rank" and other using "Left join with own table".
Using Rank (Taking around 4 sec):
select OwnerKey, COUNT(1) PropertyCount
from (
select PropertyKey, OwnerKey, BoughtDate,
RANK() over (partition by PropertyKey order by BoughtDate desc) as [Rank]
from dbo.PropertyOwners
) result
where [Rank]=1
group by OwnerKey
having COUNT(1)>1000
Using left join with same table (Taking around 10sec):
select OwnerKey, COUNT(1) PropertyCount
from (
select po.PropertyKey, po.OwnerKey, po.BoughtDate
from dbo.PropertyOwners po
left join dbo.PropertyOwners lo on lo.PropertyKey = po.PropertyKey
and lo.BoughtDate > po.BoughtDate
where lo.PropertyKey is null
) result
group by OwnerKey
having COUNT(1)>1000
Both of the query times are unacceptable as taking so much time, can anyone help me with the query to rewrite. My table has following index:
CREATE NONCLUSTERED INDEX [IX_PropertyKey_BounghtDate] ON [dbo].[PropertyOwners]
(
[PropertyKey] ASC,
[BoughtDate] DESC
)
INCLUDE ( [OwnerKey]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Grouping is never fast. You may look into indices that SQL Server will suggest if you run your queries sufficient amount of times; google diagnostic queries that involve
sys.dm_db_index_usage_stats
, they will be of some help.Another option, already suggested before, is building a summary table. Slightly more lightweight solution will be an indexed view, but you have to understand implications that will come into play if you will create it.
You have a fair amount of data and a lot that has to be counted. This analysis by Aaron Bertrand isn't exactly your problem, but it might help you.
With your supporting index, I would recommend trying the
not exists
approach:If you can't get the query to work sufficiently fast, you may need to either upgrade your hardware or use triggers to keep a summary table up-to-date.
You could rewrite this as (which might improve performance)