Let's imagine you have the following table of Orders, called Table1, in chronological order, returned from an inline UDF. Please note that the OrderIDs may be out of sequence; I have intentionally created an anomaly there. (I have not included the Date field, but I do have access to that column if it is easier for you to use.)
OrderID BuySell FilledSize ExecutionPrice RunningTotal AverageBookCost RealisedPnL
339 Buy 2 24.5 NULL NULL NULL
375 Sell 3 23.5 NULL NULL NULL
396 Sell 3 20.5 NULL NULL NULL
416 Sell 1 16.4 NULL NULL NULL
405 Buy 4 18.2 NULL NULL NULL
421 Sell 1 16.7 NULL NULL NULL
432 Buy 3 18.6 NULL NULL NULL
I have a function that I would like to apply recursively from top to bottom to calculate the three NULL columns; however, the inputs into each call will be the outputs from the previous call. The function, called mfCalc_RunningTotalBookCostPnL, is attached below.
CREATE FUNCTION [fMath].[mfCalc_RunningTotalBookCostPnL](
    @BuySell VARCHAR(4),
    @FilledSize DECIMAL(31,15),
    @ExecutionPrice DECIMAL(31,15),
    @OldRunningTotal DECIMAL(31,15),
    @OldBookCost DECIMAL(31,15)
)
RETURNS @ReturnTable TABLE(
    NewRunningTotal DECIMAL(31,15),
    NewBookCost DECIMAL(31,15),
    PreMultRealisedPnL DECIMAL(31,15)
)
AS
BEGIN
    DECLARE @SignedFilledSize DECIMAL(31,15),
            @NewRunningTotal DECIMAL(31,15),
            @NewBookCost DECIMAL(31,15),
            @PreMultRealisedPnL DECIMAL(31,15)

    SET @SignedFilledSize = fMath.sfSignedSize(@BuySell, @FilledSize)
    SET @NewRunningTotal = @OldRunningTotal + @SignedFilledSize
    SET @PreMultRealisedPnL = 0

    IF SIGN(@SignedFilledSize) = SIGN(@OldRunningTotal)
        -- This trade is adding to the existing position.
        SET @NewBookCost = (@SignedFilledSize * @ExecutionPrice +
            @OldRunningTotal * @OldBookCost) / (@NewRunningTotal)
    ELSE
    BEGIN
        -- This trade is reversing the existing position.
        -- This could be buying when short or selling when long.
        DECLARE @AbsClosedSize DECIMAL(31,15)
        SET @AbsClosedSize = fMath.sfMin(ABS(@SignedFilledSize), ABS(@OldRunningTotal));

        -- There must be crystallising of PnL.
        SET @PreMultRealisedPnL = (@ExecutionPrice - @OldBookCost) * @AbsClosedSize * SIGN(-@SignedFilledSize)

        -- Work out the NewBookCost.
        SET @NewBookCost = CASE
            WHEN ABS(@SignedFilledSize) < ABS(@OldRunningTotal) THEN @OldBookCost
            WHEN ABS(@SignedFilledSize) = ABS(@OldRunningTotal) THEN 0
            WHEN ABS(@SignedFilledSize) > ABS(@OldRunningTotal) THEN @ExecutionPrice
        END
    END

    -- Insert values into the return table.
    INSERT INTO @ReturnTable
    VALUES (@NewRunningTotal, @NewBookCost, @PreMultRealisedPnL)

    RETURN
END
So the T-SQL command I am looking for (I don't mind if someone creates an OUTER APPLY version too) would generate the following result/solution set:
OrderID BuySell FilledSize ExecutionPrice RunningTotal AverageBookCost RealisedPnL
339 Buy 2 24.5 2 24.5 0
375 Sell 3 23.5 -1 23.5 -2
396 Sell 3 20.5 -4 21.25 0
416 Sell 1 16.4 -5 20.28 0
405 Buy 4 18.2 -1 20.28 8.32
421 Sell 1 16.7 -2 18.49 0
432 Buy 3 18.6 1 18.6 -0.22
A few notes: the above function calls a trivial helper, fMath.sfSignedSize, which simply turns ('Sell', 3) into -3. Also, for the avoidance of doubt, I would see the solution making the following calls in this order, assuming I am correct in my calculations! (Note that I start off assuming the OldRunningTotal and OldBookCost are both zero.)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',2,24.5,0,0)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',3,23.5,2,24.5)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',3,20.5,-1,23.5)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',1,16.4,-4,21.25)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',4,18.2,-5,20.28)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',1,16.7,-1,20.28)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',3,18.6,-2,18.49)
Obviously, [fMath].[mfCalc_RunningTotalBookCostPnL] may need to be tweaked so that it can start off with NULL entries for OldRunningTotal and OldBookCost, but this is trivially done. The SQL set theory for applying the recursive nature is the harder part.
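For reference, the trivial helpers and the NULL tweak could look something like this (sketches only; the real definitions are not shown here):

CREATE FUNCTION fMath.sfSignedSize(@BuySell VARCHAR(4), @Size DECIMAL(31,15))
RETURNS DECIMAL(31,15)
AS
BEGIN
    -- ('Sell', 3) => -3; ('Buy', 3) => 3
    RETURN CASE WHEN @BuySell = 'Sell' THEN -@Size ELSE @Size END
END
GO

CREATE FUNCTION fMath.sfMin(@A DECIMAL(31,15), @B DECIMAL(31,15))
RETURNS DECIMAL(31,15)
AS
BEGIN
    -- Smaller of two values.
    RETURN CASE WHEN @A < @B THEN @A ELSE @B END
END
GO

-- NULL-start tweak, at the top of mfCalc_RunningTotalBookCostPnL:
-- SET @OldRunningTotal = ISNULL(@OldRunningTotal, 0)
-- SET @OldBookCost     = ISNULL(@OldBookCost, 0)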
Many thanks, Bertie.
Running total. UPDATE temp table vs CTE
The recursive solution takes 9 seconds (sketched below against an assumed #RunningTotal(ID, Amount, RunningTotal) test table):
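-- Recursive CTE running total (sketch; table and column names are assumed).
;WITH Numbered AS (
    SELECT ID, Amount,
           RN = ROW_NUMBER() OVER (ORDER BY ID)
    FROM #RunningTotal
),
rt AS (
    SELECT ID, Amount, RN,
           RunningTotal = CAST(Amount AS DECIMAL(18,2))
    FROM Numbered
    WHERE RN = 1

    UNION ALL

    SELECT n.ID, n.Amount, n.RN,
           RunningTotal = CAST(r.RunningTotal + n.Amount AS DECIMAL(18,2))
    FROM rt r
    JOIN Numbered n ON n.RN = r.RN + 1
)
SELECT ID, Amount, RunningTotal
FROM rt
ORDER BY RN
OPTION (MAXRECURSION 0);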
The UPDATE of a temp table takes 0 seconds (same assumed table, updated in clustered-index order):
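-- Quirky-update running total (sketch).
DECLARE @rt DECIMAL(18,2) = 0;

UPDATE #RunningTotal
SET @rt = RunningTotal = @rt + Amount;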
Those two approaches could at least give you a framework to build your query upon.
BTW in SQL Server, unlike in MySQL, the order of variable assignment doesn't matter. This:
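(A sketch to illustrate the claim; #t(ID, Amount, RunningTotal) is an assumed test table.)

-- Variable assignment listed first, column assignment second.
DECLARE @total DECIMAL(18,2) = 0;

UPDATE #t
SET @total = @total + Amount,
    RunningTotal = @total;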
And the following:
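-- Same assumed #t table; only the assignment order differs.
DECLARE @total DECIMAL(18,2) = 0;

UPDATE #t
SET RunningTotal = @total,
    @total = @total + Amount;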
They both execute the same way, i.e. the variable assignment happens first regardless of its position in the statement. Both queries produce the same output:
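With Amount values of 10, 20 and 30, for instance, both forms would (per the behaviour described above) fill RunningTotal as:

ID  Amount  RunningTotal
--  ------  ------------
1   10.00   10.00
2   20.00   30.00
3   30.00   60.00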
On your exact table, you just detect Buy/Sell: either multiply by 1 and -1 respectively, or simply flip the sign of the field, e.g. (a sketch against your Table1):
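-- Sketch: computing the signed size inline instead of calling sfSignedSize.
SELECT OrderID, BuySell, FilledSize,
       SignedFilledSize = CASE BuySell WHEN 'Sell' THEN -FilledSize
                                       ELSE FilledSize END
FROM Table1;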
If you happen to upgrade to SQL Server 2012, here's the straightforward implementation of running total:
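(A sketch, using the assumed #RunningTotal test table:)

SELECT ID, Amount,
       RunningTotal = SUM(Amount) OVER (ORDER BY ID
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM #RunningTotal   -- assumed test table
ORDER BY ID;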
On your exact problem:
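(A sketch; note this covers only the RunningTotal column, since AverageBookCost and RealisedPnL depend on the previous row's outputs and are not a plain windowed SUM:)

SELECT OrderID, BuySell, FilledSize, ExecutionPrice,
       RunningTotal = SUM(CASE BuySell WHEN 'Sell' THEN -FilledSize
                                       ELSE FilledSize END)
                      OVER (ORDER BY [Date]   -- chronological; OrderID can be out of sequence
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM Table1;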
UPDATE
If you feel uneasy with the quirky update, you can put in a guard clause that checks whether the order of the to-be-updated rows matches the original order (aided by an IDENTITY(1,1) column); a sketch:
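-- Guarded quirky update (sketch); RN is an IDENTITY(1,1) column on the assumed table.
DECLARE @rt DECIMAL(18,2) = 0;
DECLARE @RN_Check INT = 0;

UPDATE #RunningTotal
SET @RN_Check = CASE WHEN RN = @RN_Check + 1
                     THEN @RN_Check + 1
                     ELSE 1/0 END,   -- raises divide-by-zero if rows are visited out of order
    @rt = RunningTotal = @rt + Amount;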
If UPDATE really updates rows in an unpredictable order (or by any chance ever will), @RN_Check will no longer equal RN (the identity order), and the code will raise a divide-by-zero error. With the guard clause, an unpredictable update order fails fast; if that ever happens, it will be the time to file a bug/petition to Microsoft to make the quirky update not so quirky :-) The guard clause is a hedge on whether the inherently imperative operation (variable assignment) really is sequential.
This is a bit of a stab in the dark without a fully functioning [fMath].[mfCalc_RunningTotalBookCostPnL] to test with. My track record for getting recursive CTEs right the first time, before testing, is only about 50%, but even if it's not perfect it should be enough to get you started, if I understand your requirements correctly:
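-- Sketch: seed the first row with 0, 0 and feed each row's outputs into the next call.
-- The ROW_NUMBER ordering on [Date] and the rn join are my assumptions.
;WITH OrderedOrders AS (
    SELECT OrderID, BuySell, FilledSize, ExecutionPrice,
           rn = ROW_NUMBER() OVER (ORDER BY [Date])   -- chronological; OrderID can be out of sequence
    FROM Table1
),
Running AS (
    -- Anchor: first order, starting from a flat position (0, 0).
    SELECT o.rn, o.OrderID, o.BuySell, o.FilledSize, o.ExecutionPrice,
           f.NewRunningTotal, f.NewBookCost, f.PreMultRealisedPnL
    FROM OrderedOrders o
    CROSS APPLY fMath.mfCalc_RunningTotalBookCostPnL
         (o.BuySell, o.FilledSize, o.ExecutionPrice, 0, 0) f
    WHERE o.rn = 1

    UNION ALL

    -- Recursive step: the previous row's outputs become this row's inputs.
    SELECT o.rn, o.OrderID, o.BuySell, o.FilledSize, o.ExecutionPrice,
           f.NewRunningTotal, f.NewBookCost, f.PreMultRealisedPnL
    FROM Running r
    JOIN OrderedOrders o ON o.rn = r.rn + 1
    CROSS APPLY fMath.mfCalc_RunningTotalBookCostPnL
         (o.BuySell, o.FilledSize, o.ExecutionPrice,
          r.NewRunningTotal, r.NewBookCost) f
)
SELECT OrderID, BuySell, FilledSize, ExecutionPrice,
       RunningTotal = NewRunningTotal,
       AverageBookCost = NewBookCost,
       RealisedPnL = PreMultRealisedPnL
FROM Running
ORDER BY rn
OPTION (MAXRECURSION 32767);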
One more disclaimer: recursive CTEs are good for a maximum depth of 32,767, the largest value OPTION (MAXRECURSION) accepts (MAXRECURSION 0 removes the check entirely, at your own risk). If this is too restrictive, you'll need to explore either a different method or some sort of windowing on the data set.
I remade the running total queries to include a partition (on Customer).
CTE approach:
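(Sketch; assumes a #RunningTotal(ID, Customer, Amount, RunningTotal) test table.)

;WITH Numbered AS (
    SELECT ID, Customer, Amount,
           RN = ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY ID)
    FROM #RunningTotal
),
rt AS (
    SELECT ID, Customer, Amount, RN,
           RunningTotal = CAST(Amount AS DECIMAL(18,2))
    FROM Numbered
    WHERE RN = 1

    UNION ALL

    SELECT n.ID, n.Customer, n.Amount, n.RN,
           RunningTotal = CAST(r.RunningTotal + n.Amount AS DECIMAL(18,2))
    FROM rt r
    JOIN Numbered n ON n.Customer = r.Customer AND n.RN = r.RN + 1
)
SELECT ID, Customer, Amount, RunningTotal
FROM rt
ORDER BY Customer, RN
OPTION (MAXRECURSION 0);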
Quirky update approach:
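(Sketch; relies on rows being visited in (Customer, ID) order.)

DECLARE @rt DECIMAL(18,2) = 0;
DECLARE @prevCustomer INT = NULL;

UPDATE #RunningTotal
SET @rt = RunningTotal = CASE WHEN Customer = @prevCustomer
                              THEN @rt + Amount
                              ELSE Amount END,   -- restart the total on a new customer
    @prevCustomer = Customer;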
Cursor approach (code compacted to remove the scrollbars):
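-- Sketch; same assumed #RunningTotal table, updated row by row via its key.
DECLARE @ID INT, @Customer INT, @Amount DECIMAL(18,2),
        @rt DECIMAL(18,2), @prevCustomer INT;

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID, Customer, Amount FROM #RunningTotal ORDER BY Customer, ID;

OPEN cur;
FETCH NEXT FROM cur INTO @ID, @Customer, @Amount;
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @prevCustomer IS NULL OR @prevCustomer <> @Customer SET @rt = 0;
    SET @rt = @rt + @Amount;
    UPDATE #RunningTotal SET RunningTotal = @rt WHERE ID = @ID;
    SET @prevCustomer = @Customer;
    FETCH NEXT FROM cur INTO @ID, @Customer, @Amount;
END
CLOSE cur;
DEALLOCATE cur;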
Metrics on 5,000 rows:
Those 0 seconds are not meaningful. After I bumped the rows to 50,000, here are the metrics:
A caveat: I found out that the quirky update is really quirky; sometimes it works, sometimes it doesn't (indicated by a divide-by-zero error on one out of five runs of the query).
Here's the DDL for data:
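(Reconstructed sketch; the exact shape of the table and the random fill are assumptions.)

CREATE TABLE #RunningTotal (
    RN INT IDENTITY(1,1) NOT NULL,       -- original insert order (the guard clause uses this)
    ID INT NOT NULL,                     -- sequential key
    Customer INT NOT NULL,
    Amount DECIMAL(18,2) NOT NULL,
    RunningTotal DECIMAL(18,2) NULL
);

-- Fill with sample rows (50,000 here).
INSERT INTO #RunningTotal (ID, Customer, Amount)
SELECT TOP (50000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       ABS(CHECKSUM(NEWID())) % 100,          -- random customer
       ABS(CHECKSUM(NEWID())) % 1000 / 10.0   -- random amount
FROM sys.all_objects a CROSS JOIN sys.all_objects b;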
UPDATE
Apparently, the pure CTE approach is not good; you must use a hybrid approach. When the row numbering is materialized to an actual table, the speed goes up:
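-- Hybrid approach (sketch): materialize the row numbering first, then recurse over it.
SELECT ID, Customer, Amount,
       RN = ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY ID)
INTO #Numbered
FROM #RunningTotal;

CREATE UNIQUE CLUSTERED INDEX IX_Numbered ON #Numbered (Customer, RN);

;WITH rt AS (
    SELECT ID, Customer, Amount, RN,
           RunningTotal = CAST(Amount AS DECIMAL(18,2))
    FROM #Numbered
    WHERE RN = 1

    UNION ALL

    SELECT n.ID, n.Customer, n.Amount, n.RN,
           RunningTotal = CAST(r.RunningTotal + n.Amount AS DECIMAL(18,2))
    FROM rt r
    JOIN #Numbered n ON n.Customer = r.Customer AND n.RN = r.RN + 1
)
SELECT ID, Customer, Amount, RunningTotal
FROM rt
ORDER BY Customer, RN
OPTION (MAXRECURSION 0);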
To recap, here are the metrics prior to converting the pure CTE to use materialized row numbering (i.e. putting the row-numbered results into an actual table, a temporary table):
After materializing the row numbering to a temporary table:
The hybrid recursive CTE approach is actually faster than the cursor approach.
Another UPDATE
Just by putting a clustered primary key on the sequential column, the UPDATE visits rows in their physical order. The divide-by-zero (from the guard clause that detects a non-sequential update) no longer occurs, e.g.:
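-- Sketch: the same assumed table, now with a clustered primary key on the
-- sequential column, so the quirky update visits rows in physical (RN) order.
CREATE TABLE #RunningTotal (
    RN INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Customer INT NOT NULL,
    Amount DECIMAL(18,2) NOT NULL,
    RunningTotal DECIMAL(18,2) NULL
);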
I tried running the quirky update (with the clustered primary key in place) 100 times to see if there could be corner cases; I have found none so far, and I haven't encountered any divide-by-zero error. Read the conclusion at the bottom of this blog post: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
And it's still fast, even with the clustered primary key in place.
Here are the metrics for 100,000 rows:
The quirky update (which is not so quirky after all) is still fast; it's faster than the hybrid recursive CTE.