Summing vertically across rows under conditions (s

2019-03-04 06:37发布

County...AgeGrp...Population

A.............1..........200

A.............2..........100

A.............3..........100

A............All.........400

B.............1..........200

So, I have a list of counties and I'd like to find the under 18 population as a percent of the population for each county, so as an example from the table above I'd like to add only the population of agegrp 1 and 2 and divide by the 'all' population. In this case it would be 300/400. I'm wondering if this can be done for every county.

标签: sas
2条回答
萌系小妹纸
2楼-- · 2019-03-04 07:30

Let's call your SAS data set "HAVE" and say it has two character variables (County and AgeGrp) and one numeric variable (Population). And let's say you always have one observation in your data set for a each County with AgeGrp='All' on which the value of Population is the total for the county.

To be safe, let's sort the data set by County and process it in another data step to, creating a new data set named "WANT" with new variables for the county population (TOT_POP), the sum of the two Age Group values you want (TOT_GRP) and calculate the proportion (AgeGrpPct):

proc sort data=HAVE;
   by County;
run;
data WANT;
   retain TOT_POP TOT_GRP 0;
   set HAVE;
      by County;

   if first.County then do;
      TOT_POP = 0;
      TOT_GRP = 0;
      end;

   if AgeGrp in ('1','2') then TOT_GRP + Population;
   else if AgeGrp = 'All' then TOT_POP = Population;

   if last.County;
   AgeGrpPct = TOT_GRP / TOT_POP;

   keep County TOT_POP TOT_GRP AgeGrpPct;
   output;
run;

Notice that the observation containing AgeGrp='All' is not really needed; you could just as well have created another variable to collect a running total for all age groups.

查看更多
Evening l夕情丶
3楼-- · 2019-03-04 07:42

If you want a procedural approach, create a format for the under 18's, then use PROC FREQ to calculate the percentage. It is necessary to exclude the 'All' values from the dataset with this method (it's generally bad practice to include summary rows in the source data). PROC TABULATE could also be used for this.

data have;
input County $ AgeGrp $ Population;
datalines;
A 1 200
A 2 100
A 3 100
A All 400
B 1 200
B 2 300
B 3 500
B All 1000
;
run;

proc format;
value $age_fmt '1','2' = '<18'
                other   = '18+';
run;

proc sort data=have;
by county;
run;

proc freq data=have (where=(agegrp ne 'All')) noprint;
by county;
table agegrp / out=want (drop=COUNT where=(agegrp in ('1','2')));
format agegrp $age_fmt.;
weight population;
run;
查看更多
登录 后发表回答