SAS: standard deviation on unfixed rolling window

2019-02-20 04:19发布

I think I posted similar question before. But this time I am struggling with data ID.

My data looks like

 date        Stock    value    standard_deviation
01/01/2015    VOD      18       ...
01/01/2015    VOD      15       ...
01/01/2015    VOD       5       ...
03/01/2015    VOD      66       ...
03/01/2015    VOD       7       ...
04/01/2015    VOD      19       ...
04/01/2015    VOD       7       ...
05/01/2015    VOD      3        ...
06/01/2015    VOD      7        ...
 .....          ...     ...     ...
01/01/2015    RBS      58       ...
01/01/2015    RBS      445      ...
01/01/2015    RBS      44       ...
03/01/2015    RBS      57       ...

I need to work out the moving average/std deviation for each stock based on (-3,+3) trading days.

Since those are trading days (not calendar days), and there are different number of trades in each day, I created a sub-query and applied the following code.

data want;
set input;
by   date;
retain gdate;
if first.date then gdate+1;
run;

proc sort data=want; by stock gdate ; run;


proc sql;
create table want1 as
select   
h.stock,
h.date,
h.value,
( select std(s.value) from want s
where h.gdate between s.gdate-2 and s.gdate+2) as std
from
 want h 
 group  by  stock;
  quit;

I tried group by stock. However, the code ignored the stock group and only gave me the moving std of the whole period. I need the moving std for different stocks.

Anyone can give me some idea ? Thanks !

1条回答
淡お忘
2楼-- · 2019-02-20 05:12

Let's get you familiarized with PROC EXPAND! It's going to be your new best friend in time series.

PROC EXPAND allows you to do basically all of the common transformations (and even ones you didn't know existed) on your data.

First, to answer your question:

Step 1: Combine all the values into one trading day per stock

proc sql noprint;
    create table have2 as
        select date, stock, sum(value) as total_value
        from have
        group by stock, date
        order by stock, date;
quit;

Step 2: Use PROC EXPAND to compute a +/- 3 day centered moving standard deviation

proc expand data=have2 
            out=want;
    id date;
    by stock;

    convert total_value = standard_deviation / transform=(cmovstd 7);
run;

Step 3: Merge back to the original table

proc sort data=have;
    by stock date;
run;

data want2;
    merge have
          want;
    by stock date;
run;

Explanation

We are exploiting the use of by-group processing and an existing procedure to do the bulk of the work for us. SAS likes doesn't normally like to look forward due to how the language was designed, and PROC EXPAND is one of the very few procedures that is able to look forward in data without a lot of extra work. Another bonus of this procedure is that it doesn't break if there are gaps in the time series, so you can perform operations on any kind of sequential data.

One of the transformation operations, cmovstd, will apply a centered moving standard deviation on the data for us in order to achieve gathering future data for the moving standard deviation. Note that we chose a window of 7 to get a +/- 3 day centered moving standard deviation. That is because we need:

3 past days       | +3
current day       | +1
3 future days     | +3
                  | 7 = window size

Or, a total of 7 days in our window. If you wanted a +/- 2 day centered moving standard deviation, your window would be 5:

2 past days       | +2
current day       | +1
2 future days     | +2
                  | 5 = window size

If you choose an even number, you will have 1 or more lagged days to make the window choice valid. For example, a window of 4 will yield:

2 past days       | +2
current day       | +1
1 future day      | +1
                  | 4 = window size

PROC EXPAND is like the Swiss Army knife for time series. It will interpolate, extrapolate, transform, and convert between time periods all in one step. You may find it most useful in the following situations:

1. Applying a moving (average, std, etc.)

proc expand data=have 
                 out=want;
    <by variable(s)>;
    id <date variable>;
    convert <numeric variable(s)> = <new variable name> / transform=(<operation> <window>);
 run;

2. Filling in time gaps

proc expand data=have 
            out=want
            to=<day, month, year, etc.>;                 
    <by variable(s)>;
    id date;
    convert <numeric variable(s)> </method=<interpolation method> >;
run;
查看更多
登录 后发表回答