Count rows number by group and subgroup when some

2019-08-01 07:19发布

问题:

I know how to count group and subgroup numbers through proc freq or sql. My question is when some factor in the subgroup is missing, and I still want to show missing factor as 0. How can I do that? For example, the data set is:

group1 group2
1      A
1      A
1      A
1      A
2      A
2      B
2      B

I want a result as:

group1 group2 N
1      A      4
1      B      0
2      A      1
2      B      2

If I only use the default SAS setting, it will usually show as

group1 group2 N
1      A      4
2      A      1
2      B      2

But I still want to the second line in the result tell to me that there are 0 observations in that category.

回答1:

Reeza's sparse option works as long as each group is represented in your data at least once. Suppose there were a group1 3 that is not represented in your data, and you would still want them to show up in the frequency table. If that is the case, the solution is to create a reference table with all of your categories then right join your frequency table to it.

Create a reference table:

data ref;
  do group1 = 1 to 3;
    group2 = 'A';
    output;
    group2 = 'B';
    output;
  end;
run;

Create the frequency table with proc sql, right joining to the reference table:

proc sql;
select
  r.group1,
  r.group2,
  count(h.group1) as freq
from
  have h
  right join ref r
  on h.group1 = r.group1
  and h.group2 = r.group2
group by
  r.group1,
  r.group2
order by
  r.group1,
  r.group2
;
quit;


回答2:

Use the SPARSE option within proc freq. Consider it a cross join between all options from GROUP1 and GROUP2.

data have;
input group1 group2 $;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;

proc freq data=have;
table group1*group2/out=want sparse;
run;

proc print data=want;
run;


回答3:

Another option that's a cross between DWal's issue of "what if the data isn't in the data" and Reeza's One Proc, One Solution, is proc tabulate. If the format contains all possible values, even if the values don't appear, it works, with printmiss.

proc format;
value groupformat
    1='Group 1'
    2='Group 2'
    3='Group 3'
    ;
quit;

data have;
input group1 group2 $;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;

proc tabulate data=have;
  class group1 group2/preloadfmt;
  format group1 groupformat.;
  tables group1*group2,n/printmiss misstext='0';
run;


回答4:

How to do this via proc summary, using DWal's reference table to specify which combinations of values to use:

data ref;
  do group1 = 1 to 3;
    group2 = 'A';
    output;
    group2 = 'B';
    output;
  end;
run;

data have;
input group1 group2 $1.;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;

proc summary nway data = have classdata=ref;
    class group1 group2;
    output out = summary (drop = _TYPE_);
run;

N.B. I had to tweak the have dataset slightly to make sure that group2 has length 1 in both datasets. If you use variables with the same name but different lengths in your classdata= and data= datasets, SAS will complain.



标签: sas