I have this SQL that works fine.
Want the my filter to return the LATEST unique SessionGuids with the highest UserSessionSequenceID.
Problem is performance sucks - even though I have good indexes.
How can I rewrite this - to omit the ROW_NUMBER line?
SELECT TOP(@resultCount) * FROM
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
,[IP]
,[Url]
,[UrlTitle]
,[SiteID]
,[BrowserWidth]
,[BrowserHeight]
,[Browser]
,[BrowserVersion]
,[Referer]
,[Timestamp]
,ROW_NUMBER() over (PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC) AS sort
FROM [tblSequence]
) AS t
WHERE ([Timestamp] > DATEADD(mi, -@minutes, GETDATE()))
AND (SiteID = @siteID)
AND sort = 1
ORDER BY [UserSessionSequenceID] DESC
Thanks a lot :-)
even though I have good indexes
No offense, but let us be the judge of that. Always post the exact schema for your tables, including all indexes and cardinalities, when asking SQL Server performance questions.
For example, lets consider the following table structure:
create table tblSequence (
[UserSessionSequenceID] int not null
,[SessionGuid] uniqueidentifier not null
,[SiteID] int not null
,[Timestamp] datetime not null
, filler varchar(512));
go
create clustered index cdxSequence on tblSequence (SiteID, [Timestamp]);
go
which is same as yours, but all fields not relevant to the performance problems are aggregated into a generic filler. Lets see, how bad is the performance on, say, 1M rows for about 50k sessions? Lets fill up the table with random data, but we'll simulate what ammounts to 'user activity':
set nocount on;
declare @i int = 0, @sc int = 1;
declare @SessionGuid uniqueidentifier = newid()
, @siteID int = 1
, @Timestamp datetime = dateadd(day, rand()*1000, '20070101')
, @UserSessionSequenceID int = 0;
begin tran;
while @i<1000000
begin
insert into tblSequence (
[UserSessionSequenceID]
,[SessionGuid]
,[SiteID]
,[Timestamp]
, filler)
values (
@UserSessionSequenceID
, @SessionGuid
, @siteID
, @timestamp
, replicate('X', rand()*512));
if rand()*100 < 5
begin
set @SessionGuid = newid();
set @siteID = rand() * 10;
set @Timestamp = dateadd(day, rand()*1000, '20070101');
set @UserSessionSequenceID = 0;
set @sc += 1;
end
else
begin
set @timestamp = dateadd(second, rand()*300, @timestamp);
set @UserSessionSequenceID += 1;
end
set @i += 1;
if (@i % 1000) = 0
begin
raiserror(N'Inserted %i rows, %i sessions', 0, 1, @i, @sc);
commit;
begin tran;
end
end
commit;
This takes about 1 minute to fill up. Now lets query the same query you asked: what is the last action of any user session on site X in the last Y minutes? I'll have to use a specific date for @now instead of GETDATE() becaus emy dtaa is simulated, not real, so I'm using whatever max timestamp was filled in randomly for SiteId 1:
set statistics time on;
set statistics io on;
declare @resultCount int = 30;
declare @minutes int = 60*24;
declare @siteID int = 1;
declare @now datetime = '2009-09-26 02:08:27.000';
SELECT TOP(@resultCount) * FROM
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
, SiteID
, Filler
,[Timestamp]
,ROW_NUMBER() over (PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC) AS sort
FROM [tblSequence]
where SiteID = @siteID
and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS t
WHERE sort = 1
ORDER BY [UserSessionSequenceID] DESC ;
This is same query as yours, but the restrictive filters are moved inside the ROW_NUMBER() part subquery. The results come back in:
Table 'tblSequence'. Scan count 1, logical reads 12, physical reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 31 ms.
31 ms response time on a warm cache, 12 pages read out of the nearly 60k pages of the table.
Updated
After reading again original query I realize my modified query is different. You only need new sessions. I still believe that the filtering out by SiteID and Timestmap is the only way to get the necessary performance, so the solution is to validate the candidate finds with a NOT EXISTS condition:
SELECT TOP(@resultCount) * FROM
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
, SiteID
, Filler
,[Timestamp]
,ROW_NUMBER() over (
PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC)
AS sort
FROM [tblSequence]
where SiteID = @siteID
and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS new
WHERE sort = 1
and not exists (
select SessionGuid
from tblSequence
where SiteID = @siteID
and SessionGuid = new.SessionGuid
and [TimeStamp] < DATEADD(mi, -@minutes, @now)
)
ORDER BY [UserSessionSequenceID] DESC
This returns on my laptop, for 1M rows over 400k sessions in 40 ms from a warm cache:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'tblSequence'. Scan count 2, logical reads 709, physical reads 0
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 40 ms.
Try these - should be equivalent query but you'll have to compare the query plans:
Using a JOIN
SELECT DISTINCT TOP(@resultCount)
s.usersessionsequenceid,
s.sessionguid,
s.ip,
s.url,
s.urltitle,
s.siteid,
s.browserwidth,
s.browserheight,
s.browser,
s.browserversion,
s.referer,
s.timestamp
FROM tblsequence s
JOIN (SELECT t.sessionquid,
MAX(t.timestamp) AS max_ts
FROM tblsequence t
GROUP BY t.sessionguid) x ON x.sessionguid = s.sessionguid
AND x.max_ts = s.timestamp
WHERE s.siteid = @SiteID
AND s.timestamp > DATEADD(mi, -@minutes, GETDATE())
ORDER BY s.usersessionsequenceid DESC
Using EXISTS
SELECT TOP(@resultCount)
s.usersessionsequenceid,
s.sessionguid,
s.ip,
s.url,
s.urltitle,
s.siteid,
s.browserwidth,
s.browserheight,
s.browser,
s.browserversion,
s.referer,
s.timestamp
FROM tblsequence s
WHERE s.siteid = @SiteID
AND s.timestamp > DATEADD(mi, -@minutes, GETDATE())
AND EXISTS(SELECT NULL
FROM tblsequence t
WHERE t.sessionguid = s.sessionguid
GROUP BY t.sessionguid
HAVING MAX(t.timestamp) = s.timestamp
ORDER BY s.usersessionsequenceid DESC
But if you want to get where the value is 2 or more, you'll have to stick with your ROW_NUMBER query.