Automating readins in SAS to avoid truncation and

2019-09-15 05:18发布

问题:

I've run into issues with proc import and large files, so I've been trying to develop a way to automate the readin process myself. Basically, I start with a file, read in all variables as character variables with a gratuitous length, run through the data set to determine the max length the variable actually takes on, and then alters the readin to cut down the lengths. Then, it tries to determine which variables should be numeric/datetime, and then converts them. For simplicity, I'm just posting the datetime part.

I have the following dataset:

data test;
    do i=1 to 10;
        j="01JAN2015:3:48:00";
        k="23SEP1999:3:23:00";
        l="22FEB1992:2:22:12";
        m="Hello";
        output;
    end;
    drop i;
run;

I want to run through it and determine that I should convert each variable. What I do is count the number of times the input function is successful, then decide on a threshold (in this case, 90%) that it is successful. I'm assuming none of the observations are missing, but in the general case I consider that too. My code looks something like this:

proc contents data=test noprint out=test_c; run;

data test_numobs;
    set test_c nobs=temp;
    call symput('nobs',strip(temp));
run;


data test2;
    set test nobs=lastobs;
    array vars (*) $ _ALL_;
    length statement $1000;
    array tempnum(&nobs.) tempnum1-tempnum&nobs.;
    do i=1 to dim(vars);
        if input(vars(i),anydtdtm.) ne . then tempnum(i)+1;
    end;
    statement="";
    if _N_=lastobs then do i=1 to dim(vars);
        if tempnum(i)/lastobs >=.9 then 
statement=strip(statement)||" "||strip(vname(vars(i)))||'1=input('||strip(vname(vars(i)))||",anydtdtm.); format "||
strip(vname(vars(i)))||"1 datetime22.; drop "||strip(vname(vars(i)))||"; rename "||strip(vname(vars(i)))||"1="||strip(vname(vars(i)))||"; ";
    ds="test2";
    end;
    if _N_=lastobs then output;
run;

I only output the last row, which contains the line I want,

j1=input(j,anydtdtm.); format j1 datetime22.; drop j; rename j1=j; k1=input(k,anydtdtm.); format k1 datetime22.; drop k; rename k1=k; l1=input(l,anydtdtm.); format l1 datetime22.; drop l; rename l1=l;

And then send that into a macro to reformat the dataset.

This is a pretty roundabout program. I didn't include a lot of steps but I use the same idea in how to determine the proper variable lengths via generating length and input statements. My question is, does anyone have any better solutions for this type of problem?

Thanks!