how to create a column including the maximum value

2019-08-20 14:22发布

This question already has an answer here:

Using R, I would like to create a new column (MaxAct) showing the maximum numbers of a different column (ActNo) while grouping by two factors (HHID and PERID)

For example, I have this data set:

UID HHID PERID ActNo
1   1000 1     1
2   1000 1     2
3   1000 1     3
4   1000 2     1
5   1000 2     2
6   2000 1     1
7   2000 1     2
8   2000 1     3
9   2000 1     4
10  2000 2     1
11  2000 2     2

Then I want to add the new column (MaxAct) as follows:

UID HHID PERID ActNo MaxAct
1   1000 1     1     3
2   1000 1     2     3
3   1000 1     3     3
4   1000 2     1     2
5   1000 2     2     2
6   2000 1     1     4
7   2000 1     2     4
8   2000 1     3     4
9   2000 1     4     4
10  2000 2     1     2
11  2000 2     2     2

4条回答
唯我独甜
2楼-- · 2019-08-20 14:53
dat$MaxAct <- with(dat, ave(ActNo, HHID, PERID, FUN=max) )

For problems involving single vectors and grouping where you want the length of the result to equal the row count, ave is your function of choice. For more complicated problems, the lapply(split(dat, fac), FUN) approach may be needed or use do.call(rbind, by( ...))

If you have missing values:

dat$MaxAct <- with(dat, ave(ActNo, HHID, PERID, FUN=function(x) max(x, na.rm=TRUE) )  )
查看更多
劳资没心,怎么记你
3楼-- · 2019-08-20 14:55
df <- read.table(textConnection("UID HHID PERID ActNo
1   1000 1     1
2   1000 1     2
3   1000 1     3
4   1000 2     1
5   1000 2     2
6   2000 1     1
7   2000 1     2
8   2000 1     3
9   2000 1     4
10  2000 2     1
11  2000 2     2"), header=T)


> ddply(df, .(HHID, PERID), transform, MaxAct = length(unique(ActNo)) )
   UID HHID PERID ActNo MaxAct
1    1 1000     1     1      3
2    2 1000     1     2      3
3    3 1000     1     3      3
4    4 1000     2     1      2
5    5 1000     2     2      2
6    6 2000     1     1      4
7    7 2000     1     2      4
8    8 2000     1     3      4
9    9 2000     1     4      4
10  10 2000     2     1      2
11  11 2000     2     2      2
查看更多
可以哭但决不认输i
4楼-- · 2019-08-20 15:11

This is standard fare for plyr with mutate or transform, base R ave or data.table (which might be considered a sledgehammer for a peanuts here).

The plyr and ave approaches has been addressed so

data.table

library(data.table)
DT <- data.table(DF)
DT[,MaxAct := max(ActNo), by = list(HHID, PERID)]

Given the size of the data the memory efficient and fast nature of data.table is perhaps not required.

having read your previous question How to Create a Column of Ranks While Grouping in R, so we know that max(ActNo) is simply the number of rows in each group then

DT[,MaxAct := .N, by = list(HHID, PERID)]

will work, and be marginally quicker.

查看更多
太酷不给撩
5楼-- · 2019-08-20 15:13

There are several approaches in R to do achieve this task. For me, the easiest way to do this is to use the plyr package

require(plyr)
ddply(dat, .(HHID, PERID), transform, MaxAct = max(ActNo))

   UID HHID PERID ActNo MaxAct
1    1 1000     1     1      3
2    2 1000     1     2      3
3    3 1000     1     3      3
4    4 1000     2     1      2
5    5 1000     2     2      2
6    6 2000     1     1      4
7    7 2000     1     2      4
8    8 2000     1     3      4
9    9 2000     1     4      4
10  10 2000     2     1      2
11  11 2000     2     2      2
查看更多
登录 后发表回答