I'm trying to write the following in order to get a running total of distinct NumUsers, like so:
NumUsers = COUNT(DISTINCT [UserAccountKey]) OVER (PARTITION BY [Mth])
Management studio doesn't seem too happy about this. The error disappears when I remove the DISTINCT
keyword, but then it won't be a distinct count.
DISTINCT
does not appear to be possible within the partition functions.
How do I go about finding the distinct count? Do I use a more traditional method such as a correlated subquery?
Looking into this a bit further, maybe these OVER
functions work differently to Oracle in the way that they cannot be used in SQL-Server
to calculate running totals.
I've added a live example here on SQLfiddle where I attempt to use a partition function to calculate a running total.
There is a very simple solution using dense_rank()
dense_rank() over (partition by [Mth] order by [UserAccountKey])
+ dense_rank() over (partition by [Mth] order by [UserAccountKey] desc)
- 1
This will give you exactly what you were asking for: The number of distinct UserAccountKeys within each month.
I think the only way of doing this in SQL-Server 2008R2 is to use a correlated subquery, or an outer apply:
SELECT datekey,
COALESCE(RunningTotal, 0) AS RunningTotal,
COALESCE(RunningCount, 0) AS RunningCount,
COALESCE(RunningDistinctCount, 0) AS RunningDistinctCount
FROM document
OUTER APPLY
( SELECT SUM(Amount) AS RunningTotal,
COUNT(1) AS RunningCount,
COUNT(DISTINCT d2.dateKey) AS RunningDistinctCount
FROM Document d2
WHERE d2.DateKey <= document.DateKey
) rt;
This can be done in SQL-Server 2012 using the syntax you have suggested:
SELECT datekey,
SUM(Amount) OVER(ORDER BY DateKey) AS RunningTotal
FROM document
However, use of DISTINCT
is still not allowed, so if DISTINCT is required and/or if upgrading isn't an option then I think OUTER APPLY
is your best option
I use a solution that is similar to that of David above, but with an additional twist if some rows should be excluded from the count. This assumes that [UserAccountKey] is never null.
-- subtract an extra 1 if null was ranked within the partition,
-- which only happens if there were rows where [Include] <> 'Y'
dense_rank() over (
partition by [Mth]
order by case when [Include] = 'Y' then [UserAccountKey] else null end asc
)
+ dense_rank() over (
partition by [Mth]
order by case when [Include] = 'Y' then [UserAccountKey] else null end desc
)
- max(case when [Include] = 'Y' then 0 else 1 end) over (partition by [Mth])
- 1
An SQL Fiddle with an extended example can be found here.
Necromancing:
It's relativiely simple to emulate a COUNT DISTINCT over PARTITION BY with MAX via DENSE_RANK:
;WITH baseTable AS
(
SELECT 'RM1' AS RM, 'ADR1' AS ADR
UNION ALL SELECT 'RM1' AS RM, 'ADR1' AS ADR
UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
UNION ALL SELECT 'RM2' AS RM, 'ADR3' AS ADR
UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
UNION ALL SELECT 'RM3' AS RM, 'ADR2' AS ADR
)
,CTE AS
(
SELECT RM, ADR, DENSE_RANK() OVER(PARTITION BY RM ORDER BY ADR) AS dr
FROM baseTable
)
SELECT
RM
,ADR
,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY ADR) AS cnt1
,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM) AS cnt2
-- Not supported
--,COUNT(DISTINCT CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY CTE.ADR) AS cntDist
,MAX(CTE.dr) OVER (PARTITION BY CTE.RM ORDER BY CTE.RM) AS cntDistEmu
FROM CTE