Running total over repeating group by items based

2019-03-04 00:38发布

My first post, so bear with me. I want to sum based upon a value that is broken by dates but only want the sum for the dates, not for the the group by item in total. Have been working on this for days, trying to avoid using a cursor but may have to.

Here's an example of the data I'm looking at. BTW, this is in Oracle 11g.

 Key     Time               Amt
------ ------------------ ------
 Null    1-1-2016  00:00    50
 Null    1-1-2016  02:00    50
 Key1    1-1-2016  04:00    30
 Null    1-1-2016  06:00    30
 Null    1-1-2016  08:00    30
 Key2    1-1-2016  10:00    40
 Null    1-1-2016  12:00    40
 Key1    1-1-2016  14:00    30
 Null    1-2-2016  00:00    30
 Key2    1-2-2016  02:00    35

The final result should look like this:

 Key    Start            Stop             Amt
------ ---------------- ---------------- -----
 Null   1-1-2016 00:00   1-1-2016 02:00   100
 Key1   1-1-2016 04:00   1-1-2016 08:00    90
 Key2   1-1-2016 10:00   1-1-2016 12:00    80
 Key1   1-1-2016 14:00   1-2-2016 00:00    60
 key2   1-2-2016 02:00   1-2-2016 02:00    35

I've been able to get the Key to fill in the Nulls. The key isn't always entered in but is assumed to be the value until actually changed.

SELECT key ,time ,amt
FROM (
    SELECT DISTINCT amt, time, 
        ,last_value(amt ignore nulls) OVER (
            ORDER BY time
            ) key
    FROM sample
    ORDER BY time, amt
    )
WHERE amt > 0
ORDER BY time, key NULLS first;

But when I try to get just a running total, it sums on the key even with the breaks. I cannot figure out how to get it break on the key. Here's my best shot at it which isn't very good and doesn't work correctly.

SELECT key,time, amt 
     , sum(amt) OVER (PARTITION BY key ORDER BY time) AS running_total
  FROM (SELECT key, time, amt
          FROM (SELECT DISTINCT
                         amt,
                         time, 
                         last_value(amt ignore nulls) OVER (ORDER BY time) key
                  FROM sample
                 ORDER BY time, amt
               )
         WHERE amt > 0
         ORDER BY time, key NULLS first
       )
ORDER BY time, key NULLS first;

Any help would be appreciated. Maybe using cursor is the only way.

Match sample data.

3条回答
Bombasti
2楼-- · 2019-03-04 00:45

I'm not sure what your example data has to do with the queries (your sample data is one table, for instance and the sample queries have many queries). But, for assigning the keys, you can use LAG() with the IGNORE NULLS option:

select s.*,
       lag(key ignore nulls) over (order by start) as new_key
from sample s;

Then, you want to group groups of the same key together. One method is a difference of row numbers. The final step is aggregation:

select new_key, min(time), max(time), sum(amount)
from (select s.*,
             (row_number() over (order by start) -
              row_number() over (partition by new_key order by start)
             ) as grp
      from (select s.*,
                   lag(key ignore nulls) over (order by start) as new_key
            from sample s
           ) s
     ) s
group by new_key, grp;
查看更多
别忘想泡老子
3楼-- · 2019-03-04 00:53

Assigning a group number whenever Key is not NULL can easily be calculated together with the LAG:

LAG(key ignore NULLS) OVER (ORDER BY time) AS new_key,
COUNT(key) OVER (ORDER BY time ROWS UNBOUNDED PRECEDING) AS grp
查看更多
冷血范
4楼-- · 2019-03-04 01:08

In order to get the sums you are looking for you need a way to group the values you are interested in. You can generate a grouping ID by using the a couple of ROW_NUMBER analytic functions, one partitioned by the key value. However due to your need to duplicate the KEY column values this will need to be done in a couple of stages:

WITH t1 AS (
  SELECT dta.*
       , last_value(KEY IGNORE NULLS)          -- Fill in the missing
               OVER (ORDER BY TIME ASC) key2   -- key values
    FROM your_data dta
), t2 AS (
  SELECT t1.*
       , row_number() OVER (ORDER BY TIME)     -- Generate a
       - row_number() OVER (PARTITION BY key2  -- grouping ID
                                ORDER BY TIME) gp
    FROM t1
)
SELECT t2.*
     , sum(amt) OVER (PARTITION BY gp, key2
                          ORDER BY TIME) running_sums
  FROM t2;

The above query creates a running sum of AMT that restarts every time the key value changes. Whereas the following query used in place of the last select statement above gives the requested results, which I wouldn't term a running sum.

SELECT key2
     , MIN(TIME) start_time
     , MAX(TIME) stop_time
     , sum(amt) amt
  FROM t2
 GROUP BY key2, gp;

To see the full time values you may want to either alter your session NLS_DATE_FORMAT as below:

ALTER SESSION SET NLS_DATE_FORMAT='DD-MM-RRRR HH24:MI:SS';

Or wrap each date column in a TO_CHAR function for output purposes.

查看更多
登录 后发表回答