I know how to count group and subgroup numbers through proc freq
or sql. My question is when some factor in the subgroup is missing, and I still want to show missing factor as 0. How can I do that? For example,
the data set is:
group1 group2
1 A
1 A
1 A
1 A
2 A
2 B
2 B
I want a result as:
group1 group2 N
1 A 4
1 B 0
2 A 1
2 B 2
If I only use the default SAS setting, it will usually show as
group1 group2 N
1 A 4
2 A 1
2 B 2
But I still want to the second line in the result tell to me that there are 0 observations in that category.
Reeza's sparse
option works as long as each group is represented in your data at least once. Suppose there were a group1 3 that is not represented in your data, and you would still want them to show up in the frequency table. If that is the case, the solution is to create a reference table with all of your categories then right join your frequency table to it.
Create a reference table:
data ref;
do group1 = 1 to 3;
group2 = 'A';
output;
group2 = 'B';
output;
end;
run;
Create the frequency table with proc sql, right joining to the reference table:
proc sql;
select
r.group1,
r.group2,
count(h.group1) as freq
from
have h
right join ref r
on h.group1 = r.group1
and h.group2 = r.group2
group by
r.group1,
r.group2
order by
r.group1,
r.group2
;
quit;
Use the SPARSE option within proc freq. Consider it a cross join between all options from GROUP1 and GROUP2.
data have;
input group1 group2 $;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;
proc freq data=have;
table group1*group2/out=want sparse;
run;
proc print data=want;
run;
Another option that's a cross between DWal's issue of "what if the data isn't in the data" and Reeza's One Proc, One Solution, is proc tabulate
. If the format contains all possible values, even if the values don't appear, it works, with printmiss
.
proc format;
value groupformat
1='Group 1'
2='Group 2'
3='Group 3'
;
quit;
data have;
input group1 group2 $;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;
proc tabulate data=have;
class group1 group2/preloadfmt;
format group1 groupformat.;
tables group1*group2,n/printmiss misstext='0';
run;
How to do this via proc summary, using DWal's reference table to specify which combinations of values to use:
data ref;
do group1 = 1 to 3;
group2 = 'A';
output;
group2 = 'B';
output;
end;
run;
data have;
input group1 group2 $1.;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;
proc summary nway data = have classdata=ref;
class group1 group2;
output out = summary (drop = _TYPE_);
run;
N.B. I had to tweak the have
dataset slightly to make sure that group2 has length 1 in both datasets. If you use variables with the same name but different lengths in your classdata=
and data=
datasets, SAS will complain.