可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Lets imagine you have the following table called Table1 of Orders in chronological order returned from an In-line UDF. Please note that the OrderID may be out of sync so I have intentionally created an anomaly there (i.e. I have not included the Date field but I have access to the column if easier for you).
OrderID BuySell FilledSize ExecutionPrice RunningTotal AverageBookCost RealisedPnL
339 Buy 2 24.5 NULL NULL NULL
375 Sell 3 23.5 NULL NULL NULL
396 Sell 3 20.5 NULL NULL NULL
416 Sell 1 16.4 NULL NULL NULL
405 Buy 4 18.2 NULL NULL NULL
421 Sell 1 16.7 NULL NULL NULL
432 Buy 3 18.6 NULL NULL NULL
I have a function that I would like to apply recursively from the top to the bottom that will calculate the 3 NULL columns, however the imputs into the function will be the outputs from the previous call. The function I have created is called mfCalc_RunningTotalBookCostPnL and I have attached this below
CREATE FUNCTION [fMath].[mfCalc_RunningTotalBookCostPnL](
@BuySell VARCHAR(4),
@FilledSize DECIMAL(31,15),
@ExecutionPrice DECIMAL(31,15),
@OldRunningTotal DECIMAL(31,15),
@OldBookCost DECIMAL(31,15)
)
RETURNS @ReturnTable TABLE(
NewRunningTotal DECIMAL(31,15),
NewBookCost DECIMAL(31,15),
PreMultRealisedPnL DECIMAL(31,15)
)
AS
BEGIN
DECLARE @SignedFilledSize DECIMAL(31,15),
@NewRunningTotal DECIMAL(31,15),
@NewBookCost DECIMAL(31,15),
@PreMultRealisedPnL DECIMAL(31,15)
SET @SignedFilledSize = fMath.sfSignedSize(@BuySell, @FilledSize)
SET @NewRunningTotal = @OldRunningTotal + @SignedFilledSize
SET @PreMultRealisedPnL = 0
IF SIGN(@SignedFilledSize) = SIGN(@OldRunningTotal)
-- This Trade is adding to the existing position.
SET @NewBookCost = (@SignedFilledSize * @ExecutionPrice +
@OldRunningTotal * @OldBookCost) / (@NewRunningTotal)
ELSE
BEGIN
-- This trade is reversing the existing position.
-- This could be buying when short or selling when long.
DECLARE @AbsClosedSize DECIMAL(31,15)
SET @AbsClosedSize = fMath.sfMin(ABS(@SignedFilledSize), ABS(@OldRunningTotal));
-- There must be Crystalising of PnL.
SET @PreMultRealisedPnL = (@ExecutionPrice - @OldBookCost) * @AbsClosedSize * SIGN(-@SignedFilledSize)
-- Work out the NewBookCost
SET @NewBookCost = CASE
WHEN ABS(@SignedFilledSize) < ABS(@OldRunningTotal) THEN @OldBookCost
WHEN ABS(@SignedFilledSize) = ABS(@OldRunningTotal) THEN 0
WHEN ABS(@SignedFilledSize) > ABS(@OldRunningTotal) THEN @ExecutionPrice
END
END
-- Insert values into Return Table
INSERT INTO @ReturnTable
VALUES (@NewRunningTotal, @NewBookCost, @PreMultRealisedPnL)
-- Return
RETURN
END
So the t-SQL command I am looking for (I dont mind if someone can creates an Outer Apply too) would generate the following Result/Solution set:
OrderID BuySell FilledSize ExecutionPrice RunningTotal AverageBookCost RealisedPnL
339 Buy 2 24.5 2 24.5 0
375 Sell 3 23.5 -1 23.5 -2
396 Sell 3 20.5 -4 21.25 0
416 Sell 1 16.4 -5 20.28 0
405 Buy 4 18.2 -1 20.28 8.32
421 Sell 1 16.7 -2 18.49 0
432 Buy 3 18.6 1 18.6 -0.29
A few notes, the above stored procedure calls a trivial function fMath.sfSignedSize which just makes ('Sell',3) = -3. Also, for the avoidance of doubt, I would see the solution making these calls in this order assuming I am correct in my calculations! (Note that I start off assuming the OldRunningTotal and OldBookCost are both zero):
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',2,24.5,0,0)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',3,23.5,2,24.5)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',3,20.5,-1,23.5)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',1,16.4,-4,21.25)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',4,18.2,-5,20.28)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',1,16.7,-1,20.28)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',3,18.6,-2,18.49)
Obviously, the [fMath].[mfCalc_RunningTotalBookCostPnL] may need to be tweaked so that it can start off with NULL entries as the OldRunningTotal and OldBookCost but this is trivially done. The SQL Set theory of applying the resursive nature is a little harder.
Many thanks,
Bertie.
回答1:
This is a bit of a stab in the dark without fully functioning [fMath].[mfCalc_RunningTotalBookCostPnL] to test with. My track record with getting recursive CTE's right the first time before testing is only about 50%, but even if not perfect it should be enough to get you started, if I understand your requirements correctly:
-- First, cache Table1 into #temp to improve recursive CTE performance
select
RowNum=ROW_NUMBER()OVER(ORDER BY OrderID)
, *
INTO #temp
FROM Table1;
GO
; WITH CTE (RowNum,OrderID, BuySell, FilledSize, ExecutionPrice, RunningTotal, AverageBookCost, RealisedPnL) AS (
SELECT RowNum,OrderID, BuySell, FilledSize, ExecutionPrice, RunningTotal=0, AverageBookCost=0, RealisedPnL=0
FROM #temp
WHERE RowNum=1
UNION ALL
SELECT t.RowNum, t.OrderID, t.BuySell, t.FilledSize, t.ExecutionPrice
, RunningTotal=c.NewRunningTotal, AverageBookCost=c.NewBookCost, RealisedPnL=c.PreMultRealisedPnL
FROM #temp t
INNER JOIN CTE ON CTE.RowNum+1 = t.RowNum
CROSS APPLY [fMath].[mfCalc_RunningTotalBookCostPnL](t.BuySell, t.FilledSize, t.ExecutionPrice, CTE.RunningTotal, CTE.AverageBookCost) AS c
)
SELECT OrderID, BuySell, FilledSize, ExecutionPrice, RunningTotal, AverageBookCost, RealisedPnL
FROM CTE
/* Replace the above SELECT with the following after testing ok
UPDATE tab
SET RunningTotal=CTE.RunningTotal
, AverageBookCost=CTE.AverageBookCost
, RealisedPnL=CTE.RealisedPnL
FROM Table1 tab
INNER JOIN CTE on CTE.OrderID=tab.OrderID
*/
OPTION (MAXRECURSION 32767);
GO
-- clean up
DROP TABLE #temp
GO
One more disclaimer - recursive CTEs are good for a max depth of 32767. If this is too restrictive, you'll need to explore either a different method, or some sort of windowing on the data set.
回答2:
Running total. UPDATE temp table vs CTE
create table Test(
OrderID int primary key,
Qty int not null
);
declare @i int = 1;
while @i <= 5000 begin
insert into Test(OrderID, Qty) values (@i * 2,rand() * 10);
set @i = @i + 1;
end;
Recursive solution takes 9 seconds:
with T AS
(
select ROW_NUMBER() over(order by OrderID) as rn, * from test
)
,R(Rn, OrderId, Qty, RunningTotal) as
(
select Rn, OrderID, Qty, Qty
from t
where rn = 1
union all
select t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
from t t
join r p on t.rn = p.rn + 1
)
select R.OrderId, R.Qty, R.RunningTotal from r
option(maxrecursion 0);
UPDATE table takes 0 second:
create function TestRunningTotal()
returns @ReturnTable table(
OrderId int, Qty int, RunningTotal int
)
as begin
insert into @ReturnTable(OrderID, Qty, RunningTotal)
select OrderID, Qty, 0 from Test
order by OrderID;
declare @RunningTotal int = 0;
update @ReturnTable set
RunningTotal = @RunningTotal,
@RunningTotal = @RunningTotal + Qty;
return;
end;
Those two approaches could at least give you a framework to build your query upon.
BTW in SQL Server, unlike in MySQL, the order of variable assignment doesn't matter. This:
update @ReturnTable set
RunningTotal = @RunningTotal,
@RunningTotal = @RunningTotal + Qty;
And the following:
update @ReturnTable set
@RunningTotal = @RunningTotal + Qty,
RunningTotal = @RunningTotal;
They both execute the same way, i.e. the variable assignments happen first, regardless of variable assignment's position in the statement. Both queries have these same output:
OrderId Qty RunningTotal
----------- ----------- ------------
2 4 4
4 8 12
6 4 16
8 5 21
10 3 24
12 8 32
14 2 34
16 9 43
18 1 44
20 2 46
22 0 46
24 2 48
26 6 54
On your exact table, just detect Buy/Sell, you can either multiply it by 1 and -1 respectively, or you merely sign the fields, e.g. :
update @ReturnTable set
@RunningTotal = @RunningTotal +
CASE WHEN BuySell = 'Buy' THEN Qty ELSE -Qty END,
RunningTotal = @RunningTotal;
If you happen to upgrade to SQL Server 2012, here's the straightforward implementation of running total:
select OrderID, Qty, sum(Qty) over(order by OrderID) as RunningTotal
from Test
On your exact problem:
select OrderID, Qty,
sum(CASE WHEN BuySell = 'Buy' THEN Qty ELSE -Qty END)
over(order by OrderID) as RunningTotal
from Test;
UPDATE
If you feel uneasy with quirky update, you can put a guard clause to check if the order of to-be-updated rows matches the original order(aided by identity(1,1)):
create function TestRunningTotalGuarded()
returns @ReturnTable table(
OrderId int, Qty int,
RunningTotal int not null,
RN int identity(1,1) not null
)
as begin
insert into @ReturnTable(OrderID, Qty, RunningTotal)
select OrderID, Qty, 0 from Test
order by OrderID;
declare @RunningTotal int = 0;
declare @RN_check INT = 0;
update @ReturnTable set
@RN_check = @RN_check + 1,
@RunningTotal =
(case when RN = @RN_check then @RunningTotal + Qty else 1/0 end),
RunningTotal = @RunningTotal;
return;
end;
If UPDATE really update rows in unpredictable order (or by any chance it will), the @RN_Check will not be equal to RN(identity order) anymore, the code will raise a divide-by-zero error then. Using guard clause, unpredictable update order will fail fast; if this happen then, it will be the time to file a bug petition to Microsoft to make the quirky update not so quirky :-)
The guard clause hedge on the inherently imperative operation(variable assignment) is really sequential.
回答3:
I remake the running total queries to include a partition (on Customer)
CTE approach:
with T AS
(
select
ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, *
from test
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
select CustomerCode, Rn, OrderID, Qty, Qty
from t
where rn = 1
union all
select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
from t t
join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId
option(maxrecursion 0);
Quirky update approach:
create function TestRunningTotalGuarded()
returns @ReturnTable table(
CustomerCode varchar(50), OrderId int, Qty int,
RunningTotal int not null, RN int identity(1,1) not null
)
as begin
insert into @ReturnTable(CustomerCode, OrderID, Qty, RunningTotal)
select CustomerCode, OrderID, Qty, 0 from Test
order by CustomerCode, OrderID;
declare @RunningTotal int;
declare @RN_check INT = 0;
declare @PrevCustomerCode varchar(50) = NULL;
update @ReturnTable set
@RN_check = @RN_check + 1,
@RunningTotal =
(case when RN = @RN_check then
case when @PrevCustomerCode = CustomerCode then
@RunningTotal + Qty
else
Qty
end
else
1/0
end),
@PrevCustomerCode = CustomerCode,
RunningTotal = @RunningTotal;
return;
end;
Cursor approach (code compacted to remove the scrollbars)
create function TestRunningTotalCursor()
returns @ReturnTable table(CustomerCode varchar(50), OrderId int,
Qty int, RunningTotal int not null) as
begin
declare @c_CustomerCode varchar(50);
declare @c_OrderID int;
declare @c_qty int;
declare @PrevCustomerCode varchar(50) = null;
declare @RunningTotal int = 0;
declare o_cur cursor for
select CustomerCode, OrderID, Qty from Test order by CustomerCode, OrderID;
open o_cur;
fetch next from o_cur into @c_CustomerCode, @c_OrderID, @c_Qty;
while @@FETCH_STATUS = 0 begin
if @c_CustomerCode = @PrevCustomerCode begin
set @RunningTotal = @RunningTotal + @c_qty;
end else begin
set @RunningTotal = @c_Qty;
end;
set @PrevCustomerCode = @c_CustomerCode;
insert into @ReturnTable(CustomerCode, OrderId, Qty, RunningTotal)
values(@c_CustomerCode, @c_OrderID, @c_Qty, @RunningTotal);
fetch next from o_cur into @c_CustomerCode, @c_OrderID, @c_Qty;
end;
close o_cur; deallocate o_cur; return;
end;
Metrics on 5,000 rows:
* Recursive CTE : 49 seconds
* Quirky Update : 0 second
* Cursor : 0 second
Those 0 seconds are not meaningful. After I bumped the rows to 50,000, here are the metrics:
* Quirky Update : 1 second
* Cursor : 3 second
* Recursive CTE : An hour
Caveat, I found out that quirky update is really quirky, sometimes it works, sometime it don't(indicated by the presence of divide-by-zero error on one out five running of the query).
Here's the DDL for data:
create table Test(
OrderID int primary key,
CustomerCode varchar(50),
Qty int not null
);
declare @i int = 1;
while @i <= 20 begin
insert into Test(OrderID, CustomerCode, Qty) values (
@i * 2
,case @i % 4
when 0 then 'JOHN'
when 1 then 'PAUL'
when 2 then 'GEORGE'
when 3 then 'RINGO'
end
,rand() * 10);
set @i = @i + 1;
end;
UPDATE
Apparently, the pure CTE approach is not good. Must use a hybrid approach. When the row numbering is materialized to an actual table, the speed goes up
select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * into #xxx
from test;
with T AS
(
select * from #xxx
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
select CustomerCode, Rn, OrderID, Qty, Qty
from t
where rn = 1
union all
select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
from t t
join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId
option(maxrecursion 0);
drop table #xxx;
To recap, here are the metrics prior to converting the pure CTE to use materialized row numbering(row numbered results is in actual table, i.e. in temporary table)
* Quirky Update : 1 second
* Cursor : 3 second
* Recursive CTE(Pure) : An hour
After materializing the row numbering to temporary table:
* Quirky Update : 1 second
* Cursor : 3 second
* Recursive CTE(Hybrid) : 2 second (inclusive of row numbering table materialization)
Hybrid recursive CTE approach is actually faster than cursor approach.
Another UPDATE
Just by putting a clustered primary key on sequential column, the UPDATE update rows on its physical ordering. There's no more divide-by-zero(guard clause to detect non-sequential update) occurring. e.g.
alter function TestRunningTotalGuarded()
returns @ReturnTable table(
CustomerCode varchar(50), OrderId int, Qty int,
RunningTotal int not null,
RN int identity(1,1) not null primary key clustered
)
I tried running the quirky update(with clustered primary key in place) 100 times if there could be corner cases, I found none so far. I haven't encounter any divide-by-zero error. Read the conclusion at the bottom of this blog post: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
And it's still fast even with clustered primary key in place.
Here's the metric for 100,000 rows:
Quirky Update : 3 seconds
Hybrid Recursive CTE : 5 seconds
Cursor : 6 seconds
Quirky update(which is not so quirky after all) is still fast. It's faster than hybrid recursive CTE.