how to create a column including the maximum value

2019-08-20 14:52发布

问题:

This question already has an answer here:

  • Calculate group mean (or other summary stats) and assign to original data 4 answers

Using R, I would like to create a new column (MaxAct) showing the maximum numbers of a different column (ActNo) while grouping by two factors (HHID and PERID)

For example, I have this data set:

UID HHID PERID ActNo
1   1000 1     1
2   1000 1     2
3   1000 1     3
4   1000 2     1
5   1000 2     2
6   2000 1     1
7   2000 1     2
8   2000 1     3
9   2000 1     4
10  2000 2     1
11  2000 2     2

Then I want to add the new column (MaxAct) as follows:

UID HHID PERID ActNo MaxAct
1   1000 1     1     3
2   1000 1     2     3
3   1000 1     3     3
4   1000 2     1     2
5   1000 2     2     2
6   2000 1     1     4
7   2000 1     2     4
8   2000 1     3     4
9   2000 1     4     4
10  2000 2     1     2
11  2000 2     2     2

回答1:

dat$MaxAct <- with(dat, ave(ActNo, HHID, PERID, FUN=max) )

For problems involving single vectors and grouping where you want the length of the result to equal the row count, ave is your function of choice. For more complicated problems, the lapply(split(dat, fac), FUN) approach may be needed or use do.call(rbind, by( ...))

If you have missing values:

dat$MaxAct <- with(dat, ave(ActNo, HHID, PERID, FUN=function(x) max(x, na.rm=TRUE) )  )


回答2:

This is standard fare for plyr with mutate or transform, base R ave or data.table (which might be considered a sledgehammer for a peanuts here).

The plyr and ave approaches has been addressed so

data.table

library(data.table)
DT <- data.table(DF)
DT[,MaxAct := max(ActNo), by = list(HHID, PERID)]

Given the size of the data the memory efficient and fast nature of data.table is perhaps not required.

having read your previous question How to Create a Column of Ranks While Grouping in R, so we know that max(ActNo) is simply the number of rows in each group then

DT[,MaxAct := .N, by = list(HHID, PERID)]

will work, and be marginally quicker.



回答3:

There are several approaches in R to do achieve this task. For me, the easiest way to do this is to use the plyr package

require(plyr)
ddply(dat, .(HHID, PERID), transform, MaxAct = max(ActNo))

   UID HHID PERID ActNo MaxAct
1    1 1000     1     1      3
2    2 1000     1     2      3
3    3 1000     1     3      3
4    4 1000     2     1      2
5    5 1000     2     2      2
6    6 2000     1     1      4
7    7 2000     1     2      4
8    8 2000     1     3      4
9    9 2000     1     4      4
10  10 2000     2     1      2
11  11 2000     2     2      2


回答4:

df <- read.table(textConnection("UID HHID PERID ActNo
1   1000 1     1
2   1000 1     2
3   1000 1     3
4   1000 2     1
5   1000 2     2
6   2000 1     1
7   2000 1     2
8   2000 1     3
9   2000 1     4
10  2000 2     1
11  2000 2     2"), header=T)


> ddply(df, .(HHID, PERID), transform, MaxAct = length(unique(ActNo)) )
   UID HHID PERID ActNo MaxAct
1    1 1000     1     1      3
2    2 1000     1     2      3
3    3 1000     1     3      3
4    4 1000     2     1      2
5    5 1000     2     2      2
6    6 2000     1     1      4
7    7 2000     1     2      4
8    8 2000     1     3      4
9    9 2000     1     4      4
10  10 2000     2     1      2
11  11 2000     2     2      2