I need help understanding the below code. I have never seen reset when used in Teradata. What does RESET WHEN do in Teradata? I understand the partition and order by part. I was also unsure why this wasn't partitioned by PARTITION BY A.ACCT_DIM_NB, A.DAY_TIME_DIM_NB ORDER BY A.TXN_POSTING_SEQ . Also, is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW just using the whole partitioned window?
Removed
I was also unsure why this wasn't partitioned by PARTITION BY Y.ACCT_DIM_NB, Y.DAY_TIME_DIM_NB ORDER BY Y.DAY_TIME_DIM_NB, Y.TXN_POSTING_SEQ
Don't know, but this would return a different result (and Y.DAY_TIME_DIM_NB
is not needed in ORDER BY
because it's already partitioned by it)
Also, is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW just using the whole partitioned window?
It's exactly the same as ROWS UNBOUNDED PRECEDING
, i.e. a syntax variation for a Cumulative Max. The lpartition is ROWS UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
What does RESET WHEN do in Teradata?
The RESET WHEN
is a Teradata extension for dynamically adding partitions, it's a shorter syntax for two (in your case) or three nested OLAP functions:
-- using RESET WHEN
MAX(A.RUN_BAL_AM)
OVER (PARTITION BY A.ACCT_DIM_NB
ORDER BY A.DAY_TIME_DIM_NB, A.TXN_POSTING_SEQ
RESET WHEN A.CS_TXN_CD NOT IN ('072','075','079','107','111','112','139','181','318')
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS EOD_BAL_AM
-- Same result using Standard SQL
SELECT
Max(A.RUN_BAL_AM)
Over (PARTITION BY A.ACCT_DIM_NB, dynamic_partition
ORDER BY A.DAY_TIME_DIM_NB, A.TXN_POSTING_SEQ
ROWS BETWEEN Unbounded Preceding AND CURRENT ROW) AS EOD_BAL_AM
FROM
(
SELECT
-- this cumulative sum over 0/1 assigns a new value for each series of rows based on the CASE
Sum(CASE WHEN A.CS_TXN_CD NOT IN ('072','075','079','107','111','112','139','181','318') THEN 1 ELSE 0 end)
Over (PARTITION BY A.ACCT_DIM_NB, dynamic_partition
ORDER BY A.DAY_TIME_DIM_NB, A.TXN_POSTING_SEQ
ROWS Unbounded Preceding) AS dynamic_partition
FROM ...
) AS dt
What does RESET WHEN do in Teradata?
Resets the window accumulation, when the clause is true. Plenty of examples of this in action around the web but in your case I'd imagine(never seen it used with max) it effectively defines a point since which the max is calculated, and each time a txid not in the given list is encountered it causes the max to only be calculated from that point
I was also unsure why this wasn't partitioned by PARTITION BY Y.ACCT_DIM_NB, Y.DAY_TIME_DIM_NB ORDER BY Y.DAY_TIME_DIM_NB, Y.TXN_POSTING_SEQ .
Why do you think it should? Partition and order are very different. If you had a banking system you might partition by account but order transactions by date, if you're preparing a bank statement.
Also, is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW just using the whole partitioned window?
It defines the segment of records the accumulator should look in to derive its answer. In your case the max is calculated only over previous rows. Unbounded preceding means (all rows since the start of the partition). Current row means just that. Other valid examples might be:
ROWS BETWEEN 200 preceding and current row
ROWS BETWEEN 10 preceding and 20 following
ROWS BETWEEN current row and unbounded following
Because your window is defined as only previous rows, the max will stick at any given max value as row order increases until a new max occurs in the data. For example:
Data,max
3,3
2,3
1,3
4,4
1,4
3,4
1,4
5,5
4,5
2,5
4,5
9,9
5,9
As you proceed from top to bottom as soon as a larger max than the known max, is found on the current row it becomes the new max. Without the restriction of the previous rows only, if the entire dataset were maxed then the reported max would be 9 for every row