Using Over(Partition By) when calculating median m

2019-07-15 05:24发布

问题:

Good morning, I am trying to calculate a 12 month moving average cost (MAUC) for each item in a particular warehouse. I am using the 2012_B – paging trick to calculate the median price (http://sqlperformance.com/2012/08/t-sql-queries/median) instead of using AVG in order to remove the potential for outliers to skew the result.

The following code works, however it only calculates the MAUC for one item or all items - depending on whether I remove or retain "AND t_item = 'xxxxx'


WITH Emily AS 

(SELECT 

 t_item AS [Item Code]
,t_mauc_1 AS [MAUC]

FROM twhina113100
WHERE t_cwar = '11'

AND t_item = '         TNC-C2050NP-G'

AND t_trdt > GETDATE()-365)


(SELECT 

AVG(1.0 * [Valuation Table].[MAUC])

FROM (
    SELECT [MAUC] FROM Emily
     ORDER BY [Emily].[MAUC]

     OFFSET ((SELECT COUNT(*) FROM Emily) - 1) / 2 ROWS
     FETCH NEXT 1 + (1 - (SELECT COUNT(*) FROM Emily) % 2) ROWS ONLY

)  AS [Valuation Table] )

I believe that using Over(Partition By) may help me to partition by t_item however I am at a loss as to where to insert it into the code. I am quite new to SQL and my lack of formal training is starting to show.

If you have any other suggestions please share.

Any help would be much appreciated!

回答1:

This one caught my attention, so I'm posting two options:

The first is a straight cte approach, and the second uses temp tables. The cte approach is fine for smaller data sets, but performance suffers as the series expands.

Both options will calculate the RUNNING Min, Max, Mean, Median, and Mode for a data series

Just a couple of items before we get into it. The normalized structure is ID and Measure.
- The ID could be a date or identity.
- The Measure is any numeric value - Median is the mid-value of the sorted series. If an even number of observations we return the average of the two middle records - Mode is represented as ModeR1 and ModeR2. If no repeated values, we show the min/max range

OK, let's take a look at the cte Approach

Declare @Table table (ID Int,Measure decimal(9,2))
Insert into @Table (ID,Measure) values
(1,25),
(2,75),
(3,50),
(4,25),
(5,12),
(6,66),
(7,45)

;with cteBase as (Select *,RowNr = Row_Number() over (Order By ID) From  @Table),
      cteExpd as (Select A.*,Measure2 = B.Measure,ExtRowNr = Row_Number() over (Partition By A.ID Order By B.Measure) From cteBase A Join cteBase B on (B.RowNr<=A.RowNr)),
      cteMean as (Select ID,Mean=Avg(Measure2),Rows=Count(*) From cteExpd Group By ID),
      cteMedn as (Select ID,MedRow1=ceiling(Rows/2.0),MedRow2=ceiling((Rows+1)/2.0) From cteMean),
      cteMode as (Select ID,Mode=Measure2,ModeHits=count(*),ModeRowNr=Row_Number() over (Partition By ID Order By Count(*) Desc) From cteExpd Group By ID,Measure2)
 Select A.ID
       ,A.Measure
       ,MinVal  = min(Measure2)
       ,MaxVal  = max(Measure2)
       ,Mean    = max(B.Mean)
       ,Median  = isnull(Avg(IIF(ExtRowNr between MedRow1 and MedRow2,Measure2,null)),A.Measure)
       ,ModeR1  = isnull(max(IIf(ModeHits>1,D.Mode,null)),min(Measure2))
       ,ModeR2  = isnull(max(IIf(ModeHits>1,D.Mode,null)),max(Measure2))
  From  cteExpd A
  Join  cteMean B on (A.ID=B.ID)
  Join  cteMedn C on (A.ID=C.ID)
  Join  cteMode D on (A.ID=D.ID and ModeRowNr=1)
  Group By A.ID
          ,A.Measure
  Order By A.ID

Returns

ID  Measure MinVal  MaxVal  Mean        Median      ModeR1  ModeR2
1   25.00   25.00   25.00   25.000000   25.000000   25.00   25.00
2   75.00   25.00   75.00   50.000000   50.000000   25.00   75.00
3   50.00   25.00   75.00   50.000000   50.000000   25.00   75.00
4   25.00   25.00   75.00   43.750000   37.500000   25.00   25.00
5   12.00   12.00   75.00   37.400000   25.000000   25.00   25.00
6   66.00   12.00   75.00   42.166666   37.500000   25.00   25.00
7   45.00   12.00   75.00   42.571428   45.000000   25.00   25.00

This cte approach is very light and fast for smaller data series

Now the Temp Table Approach

-- Generate Base Data -- Key ID and Key Measure
Select ID     =TR_Date
      ,Measure=TR_Y10,RowNr = Row_Number() over (Order By TR_Date)
 Into  #Base
 From [Chinrus-Series].[dbo].[DS_Treasury_Rates] 
 Where Year(TR_Date)>=2013

-- Extend Base Data one-to-many
Select A.*,Measure2 = B.Measure,ExtRowNr = Row_Number() over (Partition By A.ID Order By B.Measure) into #Expd From #Base A Join #Base B on (B.RowNr<=A.RowNr) 
Create Index idx on #Expd (ID)

-- Generate Mean for Series
Select ID,Mean=Avg(Measure2),Rows=Count(*) into #Mean From #Expd Group By ID
Create Index idx on #Mean (ID)

-- Calculate Median Row Number(s)  -- If even(avg of middle two rows)
Select ID,MednRow1=ceiling(Rows/2.0),MednRow2=ceiling((Rows+1)/2.0) into #Medn From #Mean
Create Index idx on #Medn (ID)

-- Calculate Mode
Select * into #Mode from (Select ID,Mode=Measure2,ModeHits=count(*),ModeRowNr=Row_Number() over (Partition By ID Order By Count(*) Desc,Measure2 Desc) From #Expd Group By ID,Measure2) A where ModeRowNr=1
Create Index idx on #Mode (ID)

-- Generate Final Results
 Select A.ID
       ,A.Measure
       ,MinVal  = min(Measure2)
       ,MaxVal  = max(Measure2)
       ,Mean    = max(B.Mean)
       ,Median  = isnull(Avg(IIF(ExtRowNr between MednRow1 and MednRow2,Measure2,null)),A.Measure)
       ,ModeR1  = isnull(max(IIf(ModeHits>1,D.Mode,null)),min(Measure2))
       ,ModeR2  = isnull(max(IIf(ModeHits>1,D.Mode,null)),max(Measure2))
  From  #Expd A
  Join  #Mean B on (A.ID=B.ID)
  Join  #Medn C on (A.ID=C.ID)
  Join  #Mode D on (A.ID=D.ID and ModeRowNr=1)
  Group By A.ID
          ,A.Measure
  Order By A.ID

Returns

ID          Measure MinVal  MaxVal  Mean    Median  ModeR1  ModeR2
2013-01-02  1.86    1.86    1.86    1.86    1.86    1.86    1.86
2013-01-03  1.92    1.86    1.92    1.89    1.89    1.86    1.92
2013-01-04  1.93    1.86    1.93    1.9033  1.92    1.86    1.93
2013-01-07  1.92    1.86    1.93    1.9075  1.92    1.92    1.92
2013-01-08  1.89    1.86    1.93    1.904   1.92    1.92    1.92
...
2016-07-20  1.59    1.37    3.04    2.2578  2.24    2.20    2.20
2016-07-21  1.57    1.37    3.04    2.257   2.235   2.61    2.61
2016-07-22  1.57    1.37    3.04    2.2562  2.23    2.20    2.20

Both approaches where validated in Excel

I should add that in the final query, you could certainly add/remove items like STD, Total