This question already has an answer here:
Using R, I would like to create a new column (MaxAct) showing the maximum numbers of a different column (ActNo) while grouping by two factors (HHID and PERID)
For example, I have this data set:
UID HHID PERID ActNo
1 1000 1 1
2 1000 1 2
3 1000 1 3
4 1000 2 1
5 1000 2 2
6 2000 1 1
7 2000 1 2
8 2000 1 3
9 2000 1 4
10 2000 2 1
11 2000 2 2
Then I want to add the new column (MaxAct) as follows:
UID HHID PERID ActNo MaxAct
1 1000 1 1 3
2 1000 1 2 3
3 1000 1 3 3
4 1000 2 1 2
5 1000 2 2 2
6 2000 1 1 4
7 2000 1 2 4
8 2000 1 3 4
9 2000 1 4 4
10 2000 2 1 2
11 2000 2 2 2
For problems involving single vectors and grouping where you want the length of the result to equal the row count,
ave
is your function of choice. For more complicated problems, thelapply(split(dat, fac), FUN)
approach may be needed or usedo.call(rbind, by( ...))
If you have missing values:
This is standard fare for
plyr
withmutate
ortransform
, base Rave
ordata.table
(which might be considered a sledgehammer for a peanuts here).The
plyr
andave
approaches has been addressed sodata.table
Given the size of the data the memory efficient and fast nature of
data.table
is perhaps not required.having read your previous question How to Create a Column of Ranks While Grouping in R, so we know that
max(ActNo)
is simply the number of rows in each group thenwill work, and be marginally quicker.
There are several approaches in R to do achieve this task. For me, the easiest way to do this is to use the
plyr
package