How to detect all empty columns in a dataset and d

2019-06-27 09:46发布

As suggested in the title, I'd like to drop all empty columns\variables(where all records are empty or equal null or ""), so as to reduce time cost in later execution.

Detailed scenario:

I have a dataset() with 1000 columns, some\lots of which are empty. Now I want to create a new dataset in which I need to add columns under some conditions of previous dataset.

data new;

   set old;

   if oldcol1 ne "" then newcol1='<a>'||strip(oldcol1)||'</a>';

   end;

   if oldcol2 ne "" then newcol2='<a>'||strip(oldcol2)||'</a>';

   end;

   ...

   ...;

   drop oldcol1 oldcol2.....oldcol1000;
   run;

It takes quite some time to execute given the following reason:

  1. number of old columns is huge

  2. in fact I need to do a loop in another dataset to set the number after oldcol

ColNumber

 1

 2

 3

...

1000

So you can imagine how many times to be executed in terms of searching, finding and setting values.

Hence one way I could think of to reduce time cost is drop all empty columns first. But any inputs regarding optimizing the algorithm is highly welcomed as well.

Thanks

标签: sql sas
3条回答
你好瞎i
2楼-- · 2019-06-27 10:20

I agree that proc transpose is a good idea:

proc transpose data=old out=temp; 
var _ALL_;
run;

data _NULL_;
set temp end=eof;
    array cols {*} COL: ;
do i = 1 to dim(cols);
    cols[i]=ifn((strip(cols[i])=" " or strip(cols[i])="."),0,1);
end;
if sum(of COL:)=0 then 
call symput("dropvars", catx(" ",symget("dropvars"),_NAME_));
run;

    data new; set old (drop=&dropvars); run;
查看更多
在下西门庆
3楼-- · 2019-06-27 10:35

Something like this?

data work.temp1;
  attrib idcol length=8;
  set work.old;

  idcol=_n_;
run;

proc transpose data=work.temp1 out=work.temp2 name=varname;
  var oldcol1-oldcol1000;
  by idcol;
run;

proc sql;
  create table work.temp3 as 
    select distinct varname from work.temp2 where not missing(col1);
quit;

data _null_;
  set work.temp3 end=lastrec;

  attrib nvarname length=$32;

  if _n_=1 then do;
    call execute('data work.new;');
    call execute('set work.old;');
  end;

  nvarname = 'newcol' || strip(input(substr(varname,4),4.));
  call execute('attrib ' || strip(nvarname) || ' length=$250;');

  call execute(strip(nvarname) || '= "<a>" || strip(' || strip(varname) || ') || "</a>";' );

  if lastrec then do;
    call execute('drop oldcol1-oldcol1000;');
    call execute('run;');
  end;
run;
查看更多
Root(大扎)
4楼-- · 2019-06-27 10:43

Here's a generic macro that you can use to generate a list of the empty columns in the source data set, which you can then pass to a drop statement. It uses proc format and proc freq so it is relatively fast.

%macro findmiss(ds,macvar);
%local noteopt;
%let noteopt=%sysfunc(getoption(notes));
option nonotes;
*ds is the data set to parse for missing values;
*macvar is the macro variable that will store the list of empty columns;
%global &macvar; 
proc format;
  value nmis  .-.z =' ' other='1';
  value $nmis ' '=' ' other='1';
run;
ods listing close;
ods output OneWayFreqs=OneValue(
  where=(frequency=cumfrequency 
  AND CumPercent=100));

proc freq data=&ds;
  table _All_ / Missing ;
  format _numeric_ nmis. 
        _character_ $nmis.;
  run;
ods listing;
data missing(keep=var);
  length var $32.;
  set OneValue end=eof;
    if percent eq 100 AND sum(of F_:) < 1 ;
    var = scan(Table,-1,' ');
run;
proc sql noprint;
  select var into: &macvar separated by " "
  from missing;quit;
option &noteopt.;
%mend;

Here is how you might use it:

%findmiss(old,droplist); /*generate the list of empty columns */
data new;
  set old(drop=&droplist);
run;
查看更多
登录 后发表回答