Cumulative sum over a set of rows in mysql

2019-01-05 05:25发布

I have a complex query(containing multiple joins, unions) that returns a set of rows containing id, day, hr, amount. The output of the query looks like this:

id day    hr  amount 
1   1      1   10       
1   1      2   25       
1   1      3   30        
1   2      1   10       
1   2      2   40       
1   2      2   30        
2   1      1   10       
2   1      2   15        
2   1      3   30       
2   2      1   10       
2   2      2   20      
2   2      2   30  

I need to find cumulative total for each id, for every hour of the day. The output should be like this:

id day    hr  amount cumulative total
1   1      1   10       10
1   1      2   25       35
1   1      3   30       65 
1   2      1   10       10
1   2      2   40       50
1   2      2   30       80 
2   1      1   10       10
2   1      2   15       25 
2   1      3   30       55
2   2      1   10       10
2   2      2   20       30
2   2      2   30       60

My initial query that produces the first output looks like this:

select id, day, hr, amount from
( //multiple joins on multiple tables)a
left join
(//unions on multiple tables)b
on a.id=b.id;

What's sql query to get the cumulative sum as described in the second output? SET should not be used in the solution.

Thanks.

3条回答
来,给爷笑一个
2楼-- · 2019-01-05 05:49

MySQL doesn't provide the type analytic function you would use to get a running "cumulative sum", like the analytic functions available in other DBMS (like Oracle or SQL Server.)

But, it is possible to emulate some analytic functions, using MySQL.

There are (at least) two workable approaches:

One is to use a correlated subquery to get the subtotal. This approach can be expensive on large sets, and complicated if the predicates on the outer query are complicated. It really depends on how complicated that "multiple joins on multiple tables" is. (Unfortunately, MySQL also does not not support CTEs either.)

The other approach is to make use of MySQL user variables, to do some control break processing. The "trick" here is to the results from your query sorted (using an ORDER BY) and then wrapping your query in another query.

I'll give an example of the latter approach.

Because of the order that MySQL performs operations, the cumulative_total column needs to be computed before the value from id and day from the current row are saved into user variables. It's just easiest to put this column first.

The inline view aliased as i (in the query below) is just there to initialize the user variables, just in case these are already set in the session. If those already have values assigned, we want to ignore their current values, and the easiest way to do that is to initialize them.

Your original query gets wrapped in parenthesis, and is given an alias, c in the example below. The only change to your original query is the addition of an ORDER BY clause, so we can be sure that we process the rows from the query in sequence.

The outer select checks whether the id and day value from the current row "match" the previous row. If they do, we add the amount from the current row to the cumulative subtotal. If they don't match, then we reset the the cumulative subtotal to zero, and add the amount from the current row (or, more simply, just assign the amount from the current row).

After we have done the computation of the cumulative total, we save the id and day values from the current row into user variables, so they are available when we process the next row.

For example:

SELECT IF(@prev_id = c.id AND @prev_day = c.day
         ,@cumtotal := @cumtotal + c.amount
         ,@cumtotal := c.amount) AS cumulative_total
     , @prev_id  := c.id  AS `id`
     , @prev_day := c.day AS `day`
     , c.hr
     , c.amount AS `amount'
  FROM ( SELECT @prev_id  := NULL
              , @prev_day := NULL
              , @subtotal := 0
       ) i
  JOIN (

         select id, day, hr, amount from
         ( //multiple joins on multiple tables)a
         left join
         (//unions on multiple tables)b
         on a.id=b.id

         ORDER BY 1,2,3
       ) c

If it's necessary to return the columns in a different order, with cumulative total as the last column, then one option is to wrap that whole statement in a set of parens, and use that query as an inline view:

SELECT d.id
     , d.day
     , d.hr
     , d.amount
     , d.cumulative_total
FROM (
       // query from above
     ) d
查看更多
3楼-- · 2019-01-05 06:03

here you go, here is your culminative total...

select f1.id, f1.day, f1.hr, f1.amount, sum(f2.amount) as culminative_total from foo f1
 inner join foo f2 on (f1.day = f2.day and f1.id=f2.id)
 where f2.hr <= f1.hr
 group by f1.id, f1.day, f1.hour;
查看更多
狗以群分
4楼-- · 2019-01-05 06:10

If you're on MySQL 8 or later, you should use window functions for this. Your query would read:

SELECT
  id, day, hr, amount,
  SUM (amount) OVER (PARTITION BY id, day ORDER BY hr) AS `cumulative total`
FROM t

Where t is your table b left joined to a. Some notes:

  • The PARTITION BY clause guarantees that you get a cumulative sum per id and day, so each day, we start summing afresh
  • The ORDER BY clause defines by what ordering the cumulation should happen
查看更多
登录 后发表回答