I am trying to select the first n rowid values from the following table variable that will get me as close to a sum(itemcount) of 200,000 without crossing that threshhold. If I was looking at this manually, I would just take the top 3 rows. I do not want to use a cursor unless there is no pure-set-based way.
What is a good set-based way to get all of the rowid values "sum while/until" I get to a running total of 200,000?
I looked at "running totals" at http://www.1keydata.com/sql/sql-running-totals.html but that did not seem like it would work out because the real table has around 500k rows.
Here is what I have tried so far:
declare @agestuff table ( rowid int primary key , itemcount int , itemage datetime )
insert into @agestuff values ( 1 , 175000 , '2013-01-24 17:21:40' )
insert into @agestuff values ( 2 , 300 , '2013-01-24 17:22:11' )
insert into @agestuff values ( 3 , 10000 , '2013-01-24 17:22:11' )
insert into @agestuff values ( 4 , 19000 , '2013-01-24 17:22:19' )
insert into @agestuff values ( 5 , 16000 , '2013-01-24 17:22:22' )
insert into @agestuff values ( 6 , 400 , '2013-01-24 17:23:06' )
insert into @agestuff values ( 7 , 25000 , '2013-01-24 17:23:06' )
select sum(itemcount) from @agestuff -- 245700 which is too many
select sum(itemcount) from @agestuff
where rowid in (1,2,3) -- 185300 which gets me as close as possible
Using SQL Server 2008. I'll switch to 2012 if necessary.
Windowing Functions - SQL Server 2012 only
DECLARE @point INT = 200000;
;WITH x(rowid, ic, r, s) AS
(
SELECT
rowid, itemcount, ROW_NUMBER() OVER (ORDER BY itemage, rowid),
SUM(itemcount) OVER (ORDER BY [itemage], rowid RANGE UNBOUNDED PRECEDING)
FROM @agestuff
)
SELECT x.rowid, x.ic, x.s
FROM x WHERE x.s <= @point
ORDER BY x.rowid;
Results:
rowid ic sum
----- ------ ------
1 175000 175000
2 300 175300
3 10000 185300
SQL fiddle demo
If you can't use SQL Server 2012 for some reason, then on SQL Server 2008 you can use a couple of alternatives:
Quirky Update
Note that this behavior is not documented, nor is it guaranteed to calculate your running totals in the correct order. So please use at your own risk.
DECLARE @st TABLE
(
rowid INT PRIMARY KEY,
itemcount INT,
s INT
);
DECLARE @RunningTotal INT = 0;
INSERT @st(rowid, itemcount, s)
SELECT rowid, itemcount, 0
FROM @agestuff
ORDER BY rowid;
UPDATE @st
SET @RunningTotal = s = @RunningTotal + itemcount
FROM @st;
SELECT rowid, itemcount, s
FROM @st
WHERE s < @point
ORDER BY rowid;
Cursor
DECLARE @st TABLE
(
rowid INT PRIMARY KEY, itemcount INT, s INT
);
DECLARE
@rowid INT, @itemcount INT, @RunningTotal INT = 0;
DECLARE c CURSOR LOCAL FAST_FORWARD
FOR SELECT rowid, itemcount
FROM @agestuff ORDER BY rowid;
OPEN c;
FETCH c INTO @rowid, @itemcount;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal = @RunningTotal + @itemcount;
IF @RunningTotal > @point
BREAK;
INSERT @st(rowid, itemcount, s)
SELECT @rowid, @itemcount, @RunningTotal;
FETCH c INTO @rowid, @itemcount;
END
CLOSE c;
DEALLOCATE c;
SELECT rowid, itemcount, s
FROM @st
ORDER BY rowid;
I chose only two alternatives because others are even less desirable (mostly from a performance perspective). You can see them in the following blog post, with some background on how they perform and more information about potential gotchas. Don't paint yourself into a corner because you're stuck on the idea that cursors are bad - sometimes, like in this case, they can be the most efficient supported and reliable option:
http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals