Is it possible to loop over SAS datasets?

2019-02-05 01:47发布

问题:

I have 60 sas datasets that contain data on consumers individual characteristics such as id, gender, age, amountSpent, .... Each dataset shows data only for one time period (data1 is Jan, data2 is Feb...). I cannot merge them because of the size and some other issues.

How can I write a multiple loop to go through each of the datasets, do some manipulations and save the estimated values to a temporary file.

SAS does not have a for loop. How can I use do?

回答1:

But sas does have a do while macro loop. So basically you need 3 things: 1. In some way, a listing of your datasets. 2. A macro that loops over this listing. 3. The stuff you want to do.

E.g., let us presume you have a dataset WORK.DATASET_LIST that contains a variable library (libname) and a variable member (dataset name) for every dataset you want to loop across.

Then you could do:

%macro loopOverDatasets();
    /*imho good practice to declare macro variables of a macro locally*/
    %local datasetCount iter inLibref inMember;

    /*get number of datasets*/
    proc sql noprint;
        select count(*)
         into :datasetCount
        from WORK.DATASET_LIST;
    quit;

    /*initiate loop*/
    %let iter=1;
    %do %while (&iter.<= &datasetCount.);
        /*get libref and dataset name for dataset you will work on during this iteration*/
        data _NULL_;
            set WORK.DATASET_LIST (firstobs=&iter. obs=&iter.); *only read 1 record;
            *write the libname and dataset name to the macro variables;
            call symput("inLibref",strip(libname));
            call symput("inMember",strip(member));
            *NOTE: i am always mortified by the chance of trailing blanks torpedoing my code, hence the strip function;
        run;

        /*now you can apply your logic to the dataset*/
        data &inLibref..&inMember.; *assuming you want to apply the changes to the dataset itself;
            set &inLibref..&inMember.;
            /*** INSERT YOUR LOGIC HERE ***/
        run;

        /*** ANY OTHER PROCS/DATA STEPS ***/
        /*just remember to use &inLibref..&inMember. to refer to the current dataset*/

        /*increment the iterator of the loop*/
        %let iter=%eval(&iter.+1);
    %end;
%mend;

/*call the macro*/
%loopOverDatasets()

That is the idea. Maybe you want to gather the list of your datasets in a different way. e.g., a macro variable containing them all. In that case you'll have to use the %scan function in the loop to pick a dataset. Or maybe there is logic in the naming, e.g., dataset1, dataset2, dataset3..., in which case you could simply make use of the &iter. macro variable.



回答2:

Another option is to create a view that combines all the datasets, you won't get any data size problems with this approach (although I don't know if the other issues you refer to would be a problem here). You'll need a list of the relevant datasets, which may be obtained from DICTIONARY.TABLES in PROC SQL.

proc sql noprint;
select memname into :ds_list separated by ' '
from dictionary.tables
where libname='XXXXX';
quit;

data combined / view=combined;
set &ds_list;
run;

Then just run your summary against the view, so there's no need to loop through each dataset. I assume your datasets have a date variable, otherwise you'll need to add in some extra functionality (this applies to any solution). It would be interesting to see how this performs compared to the other solutions here.



回答3:

This is how I solve this problem in my day to day programming, when it is necessary to call a macro repeatedly based on data. This approach works equally well with many datasets or many variables from one dataset or many different macro calls from one dataset - whichever it is, simply create a dataset with the information that varies and call it this way.

This approach combines elements of Shorack's solution with user2337871's and Neil's. Why do it differently?

  • Macro should be called with parameters, not contain its parameter definitions inside of it. That makes it more flexible for future use (where for example the dataset_list dataset might be something different).
  • Having flexibility to call based on dataset of names rather than requiring macro to call macro
  • Removing code into a macro (instead of inside the call execute or other calling method) makes it easier to read.
  • call execute may have some drawbacks depending on the manipulation you are doing (related to macro variable timing)

Let's say you are doing a PROC MEANS and then appending that to a master dataset. While this is actually a very slow and annoying way to do that (as opposed to combining them together and using BY, or even using ODS OUTPUT with noncombined datasets), we'll assume your actual task is more complicated.

%macro do_my_stuff(dataset=);
proc means data=&dataset noprint;
var count;
output out=dsn_&dataset. mean=;
run;

proc append base=results data=dsn_&dataset. force;
run;
%mend do_my_Stuff;

proc sql;
select cats('%do_my_stuff(dataset=',name,')') into :stufflist separated by ' '
from dictionary.tables
where memname='WORK';
quit;

&stufflist;

You can add additional criteria to the where statement in the proc sql, or call that using CALL EXECUTE, or a number of different options. You can also use a self-made dataset with the dataset names (and even the variables as another column and macro parameter, if the variable(s) of interest vary by dataset).



回答4:

My go-to answer is macro.

%MACRO process_datasets(mdataset);
     data &mdataset.;
     set &mdataset.;
     if age >= '50' then discountRate = .2;
     *whatever else you need here;
     run;

     data _null_;
       file 'tmp.csv' mod dsd dlm=',';  *I'm assuming you're saving everything to the same file;
       set &mdataset.;
       put (_all_) (+0);
     run;

%MEND process_datasets;

then you can call it from another macro loop...

%MACRO loop_through_all;
    %DO i = 1 to 60;
       %process_datasets(data&i.);
     %END;
%MEND loop_through_all;

%loop_through_all;


回答5:

No need for macros, CALL EXECUTE will process every dataset in a libname or multiple libnames by simply building a query on the SASHELP.VTABLE and then executing a data step for each instance. I typically strip or compress the dataset names to ensure blanks don't cause problems, but will let you add that yourself. You can also make the relevant changes to subset and append the results to a single temp dataset.

DATA WANT;
    SET SASHELP.VTABLE (KEEP = LIBNAME MEMNAME  WHERE = (LIBNAME = "MAPSSAS")) END=EOF;
    STR = COMPBL("DATA " || MEMNAME || "; SET " || LIBNAME || "." || MEMNAME ||";" );
    STR1
    CALL EXECUTE (STR);

    IF EOF THEN DO;
        STR = 'RUN;';
        CALL EXECUTE (STR);
    END;
RUN;


标签: sas sas-macro