Sessionize a column of numbers into groups of 30 o

Posted 2019-02-25 10:28

Question:

Consider a column that represents "time between events":

(5, 40, 3, 6, 0, 9, 0, 4, 5, 18, 2, 4, 3, 2)

I would like to group these into buckets of 30, but buckets that reset. Desired outcome:

(0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

This is because once the cumulative sum reaches 30, we "reset" and begin counting again. Since 5 + 40 > 30, we drop back to zero and resume the cumulative sum (3 + 6 + 0 + ...) until it reaches 30 again, which happens at the 10th element (18).

This can be implemented via a Reduce function (see this answer), but I cannot figure out how to implement it in Teradata. It's as if I need to refer to the same OVER (PARTITION BY ...) expression from within the same call.


Breaking down the logic, here's an example in Excel:

Here, B2 has the formula =IF(B1<30, B1+A2, A2), which is dragged down the column. Column C then checks whether column B is >= 30, and column D is a simple cumulative sum of column C.

Answer 1:

The only way I know of to do this in Teradata is with a recursive CTE. Because I'm lazy, let's simplify this down and say you want to reset once your running sum reaches 5. Creating and populating a really simple volatile table for this:

CREATE VOLATILE TABLE vt1
(
    foo VARCHAR(10)
    , counter INTEGER
    , bar INTEGER
)
ON COMMIT PRESERVE ROWS;

INSERT INTO vt1 VALUES ('a', 1, 1);
INSERT INTO vt1 VALUES ('a', 2, 2);
INSERT INTO vt1 VALUES ('a', 3, 2);
INSERT INTO vt1 VALUES ('a', 4, 4);
INSERT INTO vt1 VALUES ('a', 5, 1);
INSERT INTO vt1 VALUES ('b', 1, 3);
INSERT INTO vt1 VALUES ('b', 2, 1);
INSERT INTO vt1 VALUES ('b', 3, 1);
INSERT INTO vt1 VALUES ('b', 4, 2);

Here's the actual select:

WITH RECURSIVE cte (foo, counter, bar, rsum) AS
(
SELECT
    foo
  , counter
  , bar
  , bar AS rsum
FROM 
    vt1
QUALIFY ROW_NUMBER() OVER (PARTITION BY foo ORDER BY counter) = 1

UNION ALL

SELECT
    t.foo
  , t.counter
  , t.bar
  -- keep accumulating until the previous running sum has reached 5, then start over
  , CASE WHEN cte.rsum < 5 THEN t.bar + cte.rsum ELSE t.bar END
FROM
    vt1 t JOIN cte ON t.foo = cte.foo AND t.counter = cte.counter + 1
)

SELECT 
    cte.*
  , CASE WHEN rsum < 5 THEN 0 ELSE 1 END AS tester
FROM 
    cte
ORDER BY 
    foo
    , counter
;

Which will finally give us:

╔═════╦═════════╦═════╦══════╦════════╗
║ foo ║ counter ║ bar ║ rsum ║ tester ║
╠═════╬═════════╬═════╬══════╬════════╣
║ a   ║       1 ║   1 ║    1 ║      0 ║
║ a   ║       2 ║   2 ║    3 ║      0 ║
║ a   ║       3 ║   2 ║    5 ║      1 ║
║ a   ║       4 ║   4 ║    4 ║      0 ║
║ a   ║       5 ║   1 ║    5 ║      1 ║
║ b   ║       1 ║   3 ║    3 ║      0 ║
║ b   ║       2 ║   1 ║    4 ║      0 ║
║ b   ║       3 ║   1 ║    5 ║      1 ║
║ b   ║       4 ║   2 ║    2 ║      0 ║
╚═════╩═════════╩═════╩══════╩════════╝

The CASE expression in the recursive part handles the reset for us, and tester flags the rows where the threshold was reached.

It's kind of ugly, but I've never been able to get this to work any other way.
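
To get from the tester flag to the bucket number the question asks for, one more running sum over tester is needed. Below is a rough Python sketch that replays the recursion on the sample rows, assuming the threshold of 5 that the output table reflects. The names rows, replay and bucket are illustrative only and not part of the query.

```python
from itertools import groupby

# Sample rows from vt1: (foo, counter, bar)
rows = [("a", 1, 1), ("a", 2, 2), ("a", 3, 2), ("a", 4, 4), ("a", 5, 1),
        ("b", 1, 3), ("b", 2, 1), ("b", 3, 1), ("b", 4, 2)]


def replay(rows, limit=5):
    """Return (foo, counter, bar, rsum, tester, bucket) per row, restarting for each foo."""
    out = []
    # sorted() orders by foo, then counter, like PARTITION BY foo ORDER BY counter
    for foo, group in groupby(sorted(rows), key=lambda r: r[0]):
        rsum, bucket = 0, 0
        for _, counter, bar in group:
            rsum = rsum + bar if rsum < limit else bar  # CASE in the recursive part
            tester = 0 if rsum < limit else 1           # CASE in the final SELECT
            bucket += tester                            # running sum of tester
            out.append((foo, counter, bar, rsum, tester, bucket))
    return out


for row in replay(rows):
    print(row)
```

The rsum and tester values match the table above, and bucket comes out as 0, 0, 1, 1, 2 for a and 0, 0, 1, 1 for b. In SQL the same last step would presumably be a windowed running sum of tester, partitioned by foo and ordered by counter.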