I am trying to transpose data in SAS from a long format to a wide format. The problem I'm having is that I have multiple columns that I'm trying to transpose. I have a few example datasets below to demonstrate what I'm trying to do. The actual dataset I'm doing this on is going to be very large, I think one way to handle this could be to tranpose individual columns and then merge at the end, but the dataset I'm going to be doing this on is going to be significantly larger (tens of thousands of columns), so this will be pretty unfeasible.
Below is the data I'm starting with:
data current_state;
input id $ att_1 $ att_2 $ att_3 $ att_4 $ att_5 $ Dollars;
datalines;
1 d234 d463 d213 d678 d435 50
2 d213 d690 d360 d145 d269 25
3 d409 d231 d463 d690 d609 10
;
Below is what I would want the outcome of the transpose to be:
data desired_state;
input id $ d145 $ d213 $ d231 $ d234 $ d269 $ d360 $ d409 $ d435 $ d463 $ d609 $ d678 $ d690;
datalines;
1 0 50 0 50 0 0 0 50 0 0 50 0
2 25 25 0 0 25 25 0 0 0 0 0 25
3 0 0 10 0 0 0 10 0 10 10 0 10
;
I have attempted the following, which isn't giving me the desired output.
proc transpose data=current_state out=test1;
by id;
id att_1 att_2 att_3 att_4 att_5;
var Dollars;
run;
Here's a fully dynamic way, but it's probably not the most efficient.
Convert 1 to dollar amount - left for you to do.
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Variables/ta-p/308484
Proc TRANSPOSE
does not have syntax for 'parallel pivoting' into a single row -- multiple ids are concatenated to become the resultant id for column name.A parallel pivot result can be achieved done using serial pivoting - first pivot to a an even taller vector layout, and then to the wide layout.
The column order in
want
is based on thecol1
value appearance in the data in the vector layout. You can force a sorted column order by creating a dummy record where thecol1
values are in the appearance order you want.The requirement for replacing missing values with zeroes should be considered carefully, especially if you want to count the number of ids with a d* value (Proc MEANS etc). Regardless, the replacement can be done in a post-transpose step.