I have a very large data set (18 million observations) that I would like to transpose by subsetting based on one variable and creating 900 new variables out of those sub/ets. Example code and desired output format below:
Example data:
data long1 ;
input famid year faminc ;
cards ;
var1 96 40000
var1 97 40500
var1 98 41000
var2 96 45000
var2 97 45400
var2 98 45800
var3 96 75000
var3 97 76000
var3 98 77000
;
run;
Code:
proc transpose data = data; by famid; var faminc; run;
Desired output format:
Because of the size of my data set, I'm wondering if I have use a do loop to find first and last observations of each var and then iteratively subset. Does anyone know a sql or proc to achieve the desired output format?
I'm not sure what your subsetting would be based on, but if it's only the id, it is rather straightforward.
Using the example from this ucla's page on proc sort, your example would work fine using slightly modified input data:
data long;
input id year faminc ;
datalines ;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
proc sort data=long;
by year;
run;
proc transpose data=long out=wide(drop=_name_) prefix=var;
by year;
var faminc;
run;
Results
year var1 var2 var3
96 40000 45000 75000
97 40500 45400 76000
98 41000 45800 77000
UCLA's SAS help pages are much clearer than SAS's own as regards to using proc transpose
in both directions. Here are 4 valuable links...
Long to Wide with Proc Transpose
Wide to Long with Proc Transpose
Long to Wide with Data Step
Wide to Long with Data Step
Transpose steps like this are easily done with the array
statement. Example for your code would be something like:
data new (keep = year var1-var3);
set data;
by year;
array vars {3} var1-var3;
retain var1-var3;
if first.year then i=1;
else i+1;
vars{i} = faminc;
if last.year then output;
run;
But I don't have SAS to double check this. Also see here.
There are no good transpose procedures in PROC SQL.Too limited.
Here are links to well tested in battle macros from the book "Data Preparation for Analytics"
From the sas community wiki you can download macro definitions.
Goto http://support.sas.com/kb/32/121.html for transposing from WIDE to LONG and http://support.sas.com/kb/32/122.html for transposing from LONG to WIDE.
More Details on "Data Preparation for Analytics" can be found under http://www.sascommunity.org/wiki/Data_Preparation_for_Analytics, where you can download all macros, programs and datasets for the book.
Hope this help