I have a given data set:
Policy_Number,var1,var2,var3,Exposure
1,B,H,J,191
2,B,F,Unknown,174
3,C,Unknown,I,153
4,B,G,L,192
5,Unknown,E,Unknown,184
6,D,E,K,113
7,C,Unknown,I,140
8,A,H,I,133
9,C,F,I,194
10,Unknown,G,Unknown,105
11,B,H,L,172
12,A,Unknown,I,198
13,D,E,K,155
14,Unknown,G,K,177
15,B,H,Unknown,100
16,D,Unknown,J,176
17,B,E,I,112
18,Unknown,E,J,192
19,C,Unknown,K,146
20,C,G,Unknown,187
i want to convert the given data into following form by using PROC Means or Summary:
Variables Levels Tot_Exposures
Var1 A 331
Var1 B 941
Var1 C ...
Var1 D ...
Var1 Unknown ...
Var2 E ...
Var2 F ...
Var2 G ...
Var2 H ...
Var2 Unknown ...
Var3 I ...
Var3 J ...
Var3 K ...
Var3 L ...
Var3 Unknown ...
The Tot_Exposure returns the total exposure of each VariableName I want this summarized table. Please help me out here.
EDIT: i have tried the proc means method but i want it to carry out in one step. i did it in 3 steps. and got an output like the image.
The code is like:
data try2;
infile 'complex.csv' dsd dlm = ',' FIRSTOBS = 2;
Length Policy_Number Var1 $ 10 Var2 $ 10 Var3 $ 10 Exposure 3;
input Policy_Number $ Var1 $ Var2 $ Var3 $ Exposure;
run;
proc sort data = try2;
by Exposure;
run;
proc means data = try2 SUM;
class Var1;
var exposure;
output out = want;
title ' Var1';
run;
proc means data = try2 SUM;
class Var2;
var exposure;
output out = want2;
title 'Var2';
run;
proc means data = try2 SUM;
class Var3;
var exposure;
output out = want3;
title 'Var3';
run;
This will easily handle 350 variables 45 million records will take a while but PROC SUMMARY can easily handle that too. The variable LEVELS needs to be defined with a length equal to or greater than the longest formatted value of all class variables. You can ask how that is done in another question.
Sorry for carrying my old answer to a new post but dependent on your need for specifically proc summary my old method would calculate this for you.
If you swap out the last SQL part (I just added it in) to:
You can get all three variables summarized in one PROC SUMMARY step but the output is not exactly what you specify. But that can achieved with a data step manipulation of the PROC SUMMARY output. I used the MLF option on the CLASS statement to "convert" all the CLASS variable to character. You don't have but it is useful as class variable can be either character of numeric.