Reading next k observation from current observatio

2019-05-28 21:37发布

问题:

Here's a very similar question

My question is a bit different from the one in the above link.

Background

I have a data set contains hourly data. So each object has 24 records per day. Now I want to create K new columns represents next 1,2,...K hourly records for each object. If not exist, replace them with missing values.

K is dynamic and is defined by users.

The original order must be preserved. No matter it's guaranteed in the data steps or by using sorting in the end.

I'm looking for an efficient way to achieve this.

Example

Original data:

Object Hour Value 
A      1    2.3
A      2    2.3
A      3    4.0
A      4    1.3

Given K = 2, desired output is

Object Hour Value Value1 Value2
A      1    2.3   2.3    4.0
A      2    2.3   4.0    1.3
A      3    4.0   1.3    .
A      4    1.3   .      .

Possible solutions

  1. sort in reverse order -> obtain previous k records -> sort them back.

When the no. of observation is large, this shouldn't be an ideal way.

  1. proc expand. I don't familiar with it cause it's never licensed on my pc.

  2. Using point in data step.

  3. retain statement inside data step. I'm not sure how this works.

回答1:

Assuming this is provided as a macro variable, this is pretty easily done with a side to side merge-ahead. Certainly faster than a transpose for K much larger than the total record count, and probably faster than looping POINTs.

Basically you merge the original dataset to itself, and use FIRSTOBS to push the starting point down one for each successive merge iteration. This needs a bit of extra work if you have BY groups that need protecting, but that's usually not too hard to manage.

Here's an example using SASHELP.CLASS:

%let K=5;

%macro makemergesets(k=, datain=, varin=, keepin=);
  %do _i = 2 %to &k;
    &datain (firstobs=&_i rename=&varin.=&varin._&_i. keep=&keepin. &varin.)
  %end;
%mend makemregesets;

data class_all;
  merge sashelp.class
    %makemergesets(k=&k,datain=sashelp.class, varin=age,keepin=)
  ;
run;


回答2:

You could transpose the hours and then freely access the hours ahead within each object. Just to set the value of K and generate some dummy data:

* Assign K ;
%let K=3 ;
%let Kn=value&k;

* Generate test objects each containing 24 hourly records ;
data time ;
  do object=1 to 10 ;
      do hour=1 to 24 ;
        value=round(ranuni(1)*10,0.1) ;
        output ;
      end ;
  end ;
run ;

EDIT: I updated the below step as realised the transpose isn't needed. Doing it all in one step gives ~20% improvement in CPU time

Use an array of the 24 hour values and loop through do i=1 to &k for each hour:

* Populate K variables ;
data output(keep=object hour value value1-&kn ) ;
  set time ;
  by object ;
  retain k1-k24 . ;
  array k(2,24) k1-k24 value1-value24 ;

  k(1,hour)=value ;

  if last.object then do hour=1 to 24 ;
    value=k(1,hour) ;
    do i=1 to &k ;
      if hour+i <=24 then k(2,i)=k(1,hour+i) ;
      else k(2,i)=.;
    end ;
    output ;
  end ;
run ;


标签: sas