SQL Select Statement For Calculating A Running Ave

2019-01-19 16:26发布

问题:

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.

Let me explain

Id        Number       Average
 1             1          NULL
 2             3          NULL
 3             2          NULL
 4             4             2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
 5             6             3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
 .             .             .
 .             .             .

The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.

I need some help trying to construct a SQL Select statement that will do this.

回答1:

This should do it:

--Test Data
CREATE TABLE    RowsToAverage
    (
    ID int NOT NULL,
    Number int NOT NULL
    )

INSERT  RowsToAverage(ID, Number)
SELECT  1, 1
UNION ALL
SELECT  2, 3
UNION ALL
SELECT  3, 2
UNION ALL
SELECT  4, 4
UNION ALL
SELECT  5, 6
UNION ALL
SELECT  6, 8
UNION ALL
SELECT  7, 10

--The query
;WITH   NumberedRows
AS
(
SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM    RowsToAverage rta
)

SELECT  nr.ID, nr.Number,
        CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                    FROM    NumberedRows 
                    WHERE   RowNumber < nr.RowNumber
                    AND     RowNumber >= nr.RowNumber - 3
                )
        END AS MovingAverage
FROM    NumberedRows nr


回答2:

Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":

SELECT 
    b.Id,
    b.Number,
    (
      SELECT 
       AVG(a.Number) 
      FROM 
       MyTable a 
     WHERE 
       a.id >= (b.Id - 3) 
       AND a.id < b.Id
       AND b.Id > 3 
     ) as Average
FROM 
    MyTable b;


回答3:

A simple self join would seem to perform much better than a row referencing subquery

Generate 10k rows of test data:

drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))

;WITH digits AS (
    SELECT 0 as Number
    UNION SELECT 1
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
)
,numbers as (
    SELECT 
        (thousands.Number * 1000) 
        + (hundreds.Number * 100) 
        + (tens.Number * 10) 
        + ones.Number AS Number
    FROM digits AS ones 
    CROSS JOIN digits AS tens
    CROSS JOIN digits AS hundreds
    CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers 

I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)

SELECT  nr.ID, nr.Number,
    avg(trailing.Number) as MovingAverage
FROM    NumberedRows nr
    join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.Number > 3
group by nr.id, nr.Number

On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)
SELECT  nr.ID, nr.Number,
    CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                            FROM    NumberedRows 
                            WHERE   RowNumber < nr.RowNumber
                            AND             RowNumber >= nr.RowNumber - 3
                    )
    END AS MovingAverage
FROM    NumberedRows nr

If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.



回答4:

Edit: I missed the point that it should average the three previous records...

For a general running average, I think something like this would work:

SELECT
    id, number, 
    SUM(number) OVER (ORDER BY ID) / 
       ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID


回答5:

Check out some solutions here. I'm sure that you could adapt one of them easily enough.



回答6:

If you want this to be truly performant, and arn't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.



回答7:

Alternatively you can denormalize and store precalculated running values. Described here:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Performance of selects is as fast as it goes. Of course, modifications are slower.