SQL moving average

2020-02-28 02:48发布

How do you create a moving average in SQL?

Current table:

Date             Clicks 
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520
2012-05-04       1,330
2012-05-05       2,260
2012-05-06       3,540
2012-05-07       2,330

Desired table or output:

Date             Clicks    3 day Moving Average
2012-05-01       2,230
2012-05-02       3,150
2012-05-03       5,520          4,360
2012-05-04       1,330          3,330
2012-05-05       2,260          3,120
2012-05-06       3,540          3,320
2012-05-07       2,330          3,010

13条回答
我命由我不由天
2楼-- · 2020-02-28 03:14

assume x is the value to be averaged and xDate is the date value:

SELECT avg(x) from myTable WHERE xDate BETWEEN dateadd(d, -2, xDate) and xDate

查看更多
仙女界的扛把子
3楼-- · 2020-02-28 03:17

For the purpose, I'd like to create an auxiliary/dimensional date table like

create table date_dim(date date, date_1 date, dates_2 date, dates_3 dates ...)

while date is the key, date_1 for this day, date_2 contains this day and the day before; date_3...

Then you can do the equal join in hive.

Using a view like:

select date, date               from date_dim
union all
select date, date_add(date, -1) from date_dim
union all
select date, date_add(date, -2) from date_dim
union all
select date, date_add(date, -3) from date_dim
查看更多
啃猪蹄的小仙女
4楼-- · 2020-02-28 03:19

NOTE: THIS IS NOT AN ANSWER but an enhanced code sample of Diego Scaravaggi's answer. I am posting it as answer as the comment section is insufficient. Note that I have parameter-ized the period for Moving aveage.

declare @p int = 3
declare @t table(d int, bal float)
insert into @t values
(1,94),
(2,99),
(3,76),
(4,74),
(5,48),
(6,55),
(7,90),
(8,77),
(9,16),
(10,19),
(11,66),
(12,47)

select a.d, avg(b.bal)
from
       @t a
       left join @t b on b.d between a.d-(@p-1) and a.d
group by a.d
查看更多
beautiful°
5楼-- · 2020-02-28 03:19

In hive, maybe you could try

select date, clicks, avg(clicks) over (order by date rows between 2 preceding and current row) as moving_avg from clicktable;
查看更多
可以哭但决不认输i
6楼-- · 2020-02-28 03:20

This is an Evergreen Joe Celko question. I ignore which DBMS platform is used. But in any case Joe was able to answer more than 10 years ago with standard SQL.

Joe Celko SQL Puzzles and Answers citation: "That last update attempt suggests that we could use the predicate to construct a query that would give us a moving average:"

SELECT S1.sample_time, AVG(S2.load) AS avg_prev_hour_load
FROM Samples AS S1, Samples AS S2
WHERE S2.sample_time
BETWEEN (S1.sample_time - INTERVAL 1 HOUR)
AND S1.sample_time
GROUP BY S1.sample_time;

Is the extra column or the query approach better? The query is technically better because the UPDATE approach will denormalize the database. However, if the historical data being recorded is not going to change and computing the moving average is expensive, you might consider using the column approach.

MS SQL Example:

CREATE TABLE #TestDW
( Date1 datetime,
  LoadValue Numeric(13,6)
);

INSERT INTO #TestDW VALUES('2012-06-09' , '3.540' );
INSERT INTO #TestDW VALUES('2012-06-08' , '2.260' );
INSERT INTO #TestDW VALUES('2012-06-07' , '1.330' );
INSERT INTO #TestDW VALUES('2012-06-06' , '5.520' );
INSERT INTO #TestDW VALUES('2012-06-05' , '3.150' );
INSERT INTO #TestDW VALUES('2012-06-04' , '2.230' );

SQL Puzzle query:

SELECT S1.date1,  AVG(S2.LoadValue) AS avg_prev_3_days
FROM #TestDW AS S1, #TestDW AS S2
WHERE S2.date1
    BETWEEN DATEADD(d, -2, S1.date1 )
    AND S1.date1
GROUP BY S1.date1
order by 1;
查看更多
混吃等死
7楼-- · 2020-02-28 03:23

One way to do this is to join on the same table a few times.

select
 (Current.Clicks 
  + isnull(P1.Clicks, 0)
  + isnull(P2.Clicks, 0)
  + isnull(P3.Clicks, 0)) / 4 as MovingAvg3
from
 MyTable as Current
 left join MyTable as P1 on P1.Date = DateAdd(day, -1, Current.Date)
 left join MyTable as P2 on P2.Date = DateAdd(day, -2, Current.Date)
 left join MyTable as P3 on P3.Date = DateAdd(day, -3, Current.Date)

Adjust the DateAdd component of the ON-Clauses to match whether you want your moving average to be strictly from the past-through-now or days-ago through days-ahead.

  • This works nicely for situations where you need a moving average over only a few data points.
  • This is not an optimal solution for moving averages with more than a few data points.
查看更多
登录 后发表回答