As suggested in the title, I'd like to drop all empty columns\variables(where all records are empty or equal null or ""), so as to reduce time cost in later execution.
Detailed scenario:
I have a dataset() with 1000 columns, some\lots of which are empty. Now I want to create a new dataset in which I need to add columns under some conditions of previous dataset.
data new;
set old;
if oldcol1 ne "" then newcol1='<a>'||strip(oldcol1)||'</a>';
end;
if oldcol2 ne "" then newcol2='<a>'||strip(oldcol2)||'</a>';
end;
...
...;
drop oldcol1 oldcol2.....oldcol1000;
run;
It takes quite some time to execute given the following reason:
number of old columns is huge
in fact I need to do a loop in another dataset to set the number after oldcol
ColNumber
1
2
3
...
1000
So you can imagine how many times to be executed in terms of searching, finding and setting values.
Hence one way I could think of to reduce time cost is drop all empty columns first. But any inputs regarding optimizing the algorithm is highly welcomed as well.
Thanks
I agree that proc transpose is a good idea:
Something like this?
Here's a generic macro that you can use to generate a list of the empty columns in the source data set, which you can then pass to a drop statement. It uses proc format and proc freq so it is relatively fast.
Here is how you might use it: