I have a table with customers, users and revenue similar to below (in reality thousands of records):
Customer User Revenue
001 James 500
002 James 750
003 James 450
004 Sarah 100
005 Sarah 500
006 Sarah 150
007 Sarah 600
008 James 150
009 James 100
What I want to do is to return only the highest spending customers that make up 80% of the total revenue for the user.
To do this manually I would order James' customers by their revenue, work out the percentage of total and a running total percentage, then only return records up to the point that the running total hits 80%:
Customer User Revenue % of total Running Total %
002 James 750 0.38 0.38
001 James 500 0.26 0.64
003 James 450 0.23 0.87 <- Greater than 80%, last record
008 James 150 0.08 0.95
009 James 100 0.05 1.00
I've tried using a CTE but so far have come up blank. Is there any way to do this via a single query rather than manually in an Excel sheet?
In SQL Server 2012+, you would use the cumulative sum -- much more efficient. In SQL Server 2008, you can do this using a correlated subquery or
cross apply
:Note: The
*1.0
is just in caseRevenue
is stored as an integer. SQL Server does integer division, which would return0
for both columns on almost all rows.EDIT:
Add
where user = 'James'
if you want results only for James.SQL Server 2012+
onlyYou could use windowed
SUM
:LiveDemo
SQL Server 2008:
LiveDemo2
Output:
EDIT 2:
LiveDemo3
Output:
SQL Server 2008
does not support everything inOVER()
clause, butROW_NUMBER
does.First cte just calculate position within a group:
Second cte:
c2
subquery calculate running total based on rank fromROW_NUMBER
c3
calculate full sum per userIn final query
s
subquery finds the lowestrunning
total that exceeds 80%.EDIT 3:
Using
ROW_NUMBER
is actually redundant.LiveDemo4