About “data merge” in SAS

Posted 2019-04-12 00:13

I am studying data merging in SAS and found the following example:

data newdata;
merge yourdata (in=a) otherdata (in=b);
by permno date;
run;

I do not understand what "(in=a)" and "(in=b)" mean. Thanks.

Answer 1:

yourdata(in=a) creates a flag variable called 'a' in the program data vector. It contains 1 if yourdata contributed to the current observation and 0 if it did not. The variable is temporary, so it is not written to newdata. You can then use these variables to perform conditional operations based on the source of each record.
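To see the flag values directly, here is a minimal sketch with made-up sample data (the variables x and y and all values are hypothetical). Because a and b are dropped from the output automatically, the step copies them into ordinary variables. Both inputs must be sorted by permno and date before a BY merge; the sample rows already are.

```sas
/* Hypothetical sample data: permno, date, x and y values are made up. */
/* Both inputs are already sorted by the BY variables permno date.     */
data yourdata;
  input permno date :yymmdd10. x;
  format date yymmdd10.;
  datalines;
10001 2019-01-02 1
10002 2019-01-02 2
;
run;

data otherdata;
  input permno date :yymmdd10. y;
  format date yymmdd10.;
  datalines;
10001 2019-01-02 10
10003 2019-01-02 30
;
run;

data newdata;
  merge yourdata(in=a) otherdata(in=b);
  by permno date;
  /* a and b are temporary and are not written to newdata, */
  /* so copy them into ordinary variables to inspect them.  */
  from_yourdata = a;
  from_otherdata = b;
run;

proc print data=newdata;
run;
```

With this sample data, newdata should contain three observations: permno 10001 with a=1 and b=1 (found in both inputs), 10002 with a=1 and b=0 (so y is missing), and 10003 with a=0 and b=1 (so x is missing).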

It might be easier to understand with more descriptive flag names:

data newdata;
merge yourdata(in=ThisRecordIsFromYourData) otherdata(in=ThisRecordIsFromOtherData);
by permno date;
run;

Suppose records from yourdata need to be manipulated in this step but records from otherdata do not. You could then do something like this:

data newdata;
merge yourdata(in=ThisRecordIsFromYourData) otherdata(in=ThisRecordIsFromOtherData);
by permno date;
if ThisRecordIsFromYourData then do;
  * some operation here for yourdata records only ;
end;
run;

An obvious use for these variables is to control what kind of 'merge' occurs, using IF statements. For example, the subsetting IF statement

if ThisRecordIsFromYourData and ThisRecordIsFromOtherData;

makes SAS keep only the observations whose BY values appear in both input data sets (like an inner join).
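Other kinds of merge can be built from the same flags. The sketch below uses made-up sample data, repeated so the block runs on its own; the output data set names both, only_yours, only_other and left_join are hypothetical. The comparison with SQL joins holds for one-to-one and one-to-many matches. When a BY group has several observations in both inputs, MERGE pairs them up in order rather than forming every combination the way an SQL join would.

```sas
/* Hypothetical sample data (made-up values), already sorted by permno date. */
data yourdata;
  input permno date :yymmdd10. x;
  format date yymmdd10.;
  datalines;
10001 2019-01-02 1
10002 2019-01-02 2
;
run;

data otherdata;
  input permno date :yymmdd10. y;
  format date yymmdd10.;
  datalines;
10001 2019-01-02 10
10003 2019-01-02 30
;
run;

/* Route each merged observation by which inputs contributed to it. */
data both only_yours only_other;
  merge yourdata(in=a) otherdata(in=b);
  by permno date;
  if a and b then output both;       /* matched: like an inner join */
  else if a then output only_yours;  /* found in yourdata only      */
  else output only_other;            /* found in otherdata only     */
run;

/* Keep every yourdata observation, matched or not: like a left join. */
data left_join;
  merge yourdata(in=a) otherdata(in=b);
  by permno date;
  if a;
run;
```

With this sample data, both, only_yours and only_other should each hold one observation, and left_join should hold two (permno 10001 and 10002). Because the first step uses explicit OUTPUT statements, SAS does not also write every observation implicitly at the end of the step.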