SAS: Using the lag function without a set statement

Posted 2019-08-07 10:17

Question:

Could someone explain why the following two pieces of code give different results? I would like to simulate some simple time series processes in SAS, but I'm struggling with the lag function.

Specifically, in program 1, the variable b contains no data, which is unexpected. In program 2, the lag function works as expected.

/*Program 1*/
data lagtest;
a = 1;
b=lag(a);
output;

a = 2;
b= lag(a);
output;

a = 3;
b= lag(a);
output;
run;


/*Program 2*/
data lagtest2;
input a;
datalines;
1
2
3
;
run;

data lagtest2;
set lagtest2;
b= lag(a);
run;

I've been reading about the lag function, but can't find references to its use in a datastep that does not take an input dataset.

Thanks very much for any help.

Answer 1:

Keith's roughly correct in that the approach he shows is the right one, but the reasoning isn't accurate. LAG works on data; whether that data is input or output is irrelevant (and not really a meaningful distinction). It is, in fact, quite possible to make this work with only programmatically provided data.

data lagtest;
do a=1 to 3;
  b=lag(a);
  output;
end;
run;

Similarly, it's possible to make the second example not work, with a somewhat absurd example:

data lagtest2;
 p=1;
 set lagtest2 point=p;
 b= lag(a);
 output;
 p=2;
 set lagtest2 point=p;
 b=lag(a);
 output;
 p=3;
 set lagtest2 point=p;
 b=lag(a);
 output;
 stop;  
run;

The reason the first example doesn't work isn't the source of the data; it's the number of lag calls. One of the most common mistakes is to believe that lag retrieves a value from the previous record; that isn't true. Each call to lag in the code creates its own queue. Each time that particular lag call is executed, the current value of its argument is pushed onto the queue, and if the queue is then at least the defined length+1 long, the value at the front of the queue is popped off and returned; otherwise a missing value is returned. (For lag or lag1, the queue must be 2 long; for lag2 it must be 3 long; etc. That is, the number of the function plus the value just pushed on.)
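
As a rough illustration of the queue behaviour described above, the sketch below runs one lag call and one lag2 call inside the same loop. It is not part of the original answer; the dataset name lagdemo and the variables b1 and b2 are made up for this example. Each call keeps its own queue, so b1 trails a by one iteration and b2 by two.

```sas
/* Illustrative sketch: lagdemo, b1 and b2 are hypothetical names */
data lagdemo;
  do a = 1 to 5;
    b1 = lag(a);    /* one call site, one queue: value from the previous execution */
    b2 = lag2(a);   /* separate call site, separate queue: value from two executions ago */
    output;
  end;
run;

/* Resulting rows (a, b1, b2):
   1 . .
   2 1 .
   3 2 1
   4 3 2
   5 4 3  */
```

The missing values in the first rows are the iterations where the relevant queue has not yet filled up.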

In your first example, the code contains three lag calls, so three separate queues are created, and none of them is ever executed a second time; that is why b is always missing. In your second example, the code contains one lag call, so one queue is created, and it is executed three times.
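
If the assignments in Program 1 need to stay as separate statements, one possible way to keep a single lag call (and therefore a single queue) is to route every assignment through the same piece of code with LINK. This is only a sketch under that assumption, not something from the original answer; the label name emit is hypothetical.

```sas
/* Illustrative sketch: the label emit is a hypothetical name */
data lagtest;
  a = 1; link emit;
  a = 2; link emit;
  a = 3; link emit;
  return;            /* end the iteration before falling into the subroutine */

emit:
  b = lag(a);        /* the only lag call site, executed three times */
  output;
  return;            /* jump back to the statement after the calling LINK */
run;

/* Resulting rows (a, b):
   1 .
   2 1
   3 2  */
```

Because the same lag statement runs for every value of a, each execution pushes onto and pops from the one queue, which is the same mechanism that makes the do loop version work.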



Answer 2:

The LAG function works on input data, not output data. In your first example there is no input data, just output, therefore the lag value is always blank. In your second example you don't need the two sections of code; you could just write:

data lagtest2;
input a;
b= lag(a);
datalines;
1
2
3
;
run;


Tags: sas