SAS : Summarise the data

2019-08-13 16:33发布

I have a given data set:

Policy_Number,var1,var2,var3,Exposure
1,B,H,J,191
2,B,F,Unknown,174
3,C,Unknown,I,153
4,B,G,L,192
5,Unknown,E,Unknown,184
6,D,E,K,113
7,C,Unknown,I,140
8,A,H,I,133
9,C,F,I,194
10,Unknown,G,Unknown,105
11,B,H,L,172
12,A,Unknown,I,198
13,D,E,K,155
14,Unknown,G,K,177
15,B,H,Unknown,100
16,D,Unknown,J,176
17,B,E,I,112
18,Unknown,E,J,192
19,C,Unknown,K,146
20,C,G,Unknown,187

i want to convert the given data into following form by using PROC Means or Summary:

Variables   Levels   Tot_Exposures
  Var1        A           331
  Var1        B           941
  Var1        C           ...
  Var1        D           ...
  Var1     Unknown        ...   
  Var2        E           ...
  Var2        F           ...
  Var2        G           ...
  Var2        H           ...
  Var2     Unknown        ...   
  Var3        I           ...
  Var3        J           ...
  Var3        K           ...
  Var3        L           ...
  Var3     Unknown        ...

The Tot_Exposure returns the total exposure of each VariableName I want this summarized table. Please help me out here.

EDIT: i have tried the proc means method but i want it to carry out in one step. i did it in 3 steps. and got an output like the image.

Output

The code is like:

data try2;
infile 'complex.csv' dsd dlm = ',' FIRSTOBS = 2;
Length Policy_Number Var1 $ 10 Var2 $ 10 Var3 $ 10 Exposure 3;
input Policy_Number $ Var1 $ Var2 $ Var3 $ Exposure;
run;
proc sort data = try2;
by Exposure;
run;
proc means data = try2 SUM;
class Var1;
var exposure;
output out = want;
title ' Var1';
run;
proc means data = try2 SUM;
class Var2;
var exposure;
output out = want2;
title 'Var2';
run;
proc means data = try2 SUM;
class Var3;
var exposure;
output out = want3;
title 'Var3';
run;

标签: csv sas
3条回答
Fickle 薄情
2楼-- · 2019-08-13 17:11

This will easily handle 350 variables 45 million records will take a while but PROC SUMMARY can easily handle that too. The variable LEVELS needs to be defined with a length equal to or greater than the longest formatted value of all class variables. You can ask how that is done in another question.

data exp;
   infile cards dsd firstobs=2;
   input Policy_Number (var1-var3) ($) Exposure;
   arbitraryname243 = rank(first(var1));
   arbitraryname4 = rantbl(123,.4);
   arbitraryname36 = rank(first(var3));
   cards;
Policy_Number,var1,var2,var3,Exposure
1,B,H,J,191
2,B,F,Unknown,174
3,C,Unknown,I,153
4,B,G,L,192
5,Unknown,E,Unknown,184
6,D,E,K,113
7,C,Unknown,I,140
8,A,H,I,133
9,C,F,I,194
10,Unknown,G,Unknown,105
11,B,H,L,172
12,A,Unknown,I,198
13,D,E,K,155
14,Unknown,G,K,177
15,B,H,Unknown,100
16,D,Unknown,J,176
17,B,E,I,112
18,Unknown,E,J,192
19,C,Unknown,K,146
20,C,G,Unknown,187
;;;;
   run;
proc transpose data=exp(obs=0 drop=policy_number exposure) out=varlist;
   var _all_;
   run;
Proc sql noprint; 
   select nliteral(_name_) into :classvars separated by ' ' from varlist;
   quit;
%put NOTE: &=classvars;

proc summary data=exp descendtypes chartype;
   class &classvars / mlf;
   ways 1;
   freq Exposure;
   output out=test(rename=(_freq_=TotExposures));
   run;
data want(keep=Variable levels totexposures);
   length variable $32 levels $8;
   set test;
   array v[*] &classvars;
   i = indexc(_type_,'1');
   variable = vname(v[i]);
   levels = v[i];
   run;
查看更多
时光不老,我们不散
3楼-- · 2019-08-13 17:16

Sorry for carrying my old answer to a new post but dependent on your need for specifically proc summary my old method would calculate this for you.

If you swap out the last SQL part (I just added it in) to:

proc sql;
create table OUT as
  select VARIABLENAME
            , VARIABLEVALUE
            , sum(EXPOSURE)
    from
        GET_MAX
    group by 1,2
;quit;
查看更多
▲ chillily
4楼-- · 2019-08-13 17:25

You can get all three variables summarized in one PROC SUMMARY step but the output is not exactly what you specify. But that can achieved with a data step manipulation of the PROC SUMMARY output. I used the MLF option on the CLASS statement to "convert" all the CLASS variable to character. You don't have but it is useful as class variable can be either character of numeric.

data exp;
   infile cards dsd firstobs=2;
   input Policy_Number (var1-var3) ($) Exposure;
   cards;
Policy_Number,var1,var2,var3,Exposure
1,B,H,J,191
2,B,F,Unknown,174
3,C,Unknown,I,153
4,B,G,L,192
5,Unknown,E,Unknown,184
6,D,E,K,113
7,C,Unknown,I,140
8,A,H,I,133
9,C,F,I,194
10,Unknown,G,Unknown,105
11,B,H,L,172
12,A,Unknown,I,198
13,D,E,K,155
14,Unknown,G,K,177
15,B,H,Unknown,100
16,D,Unknown,J,176
17,B,E,I,112
18,Unknown,E,J,192
19,C,Unknown,K,146
20,C,G,Unknown,187
;;;;
   run;
proc summary data=exp descendtypes chartype;
   class var: / mlf;
   ways 1;
   freq Exposure;
   output out=test(rename=(_freq_=TotExposures));
   run;
data want;
   length variable $32 levels $8;
   set test;
   array v[*] var1-var3;
   drop var1-var3 i _type_;
   i = indexc(_type_,'1');
   variable = vname(v[i]);
   levels = v[i];
   run;

enter image description here

查看更多
登录 后发表回答