How to select the row with the maximum value in ea

Posted 2018-12-31 06:08

I have a dataset with multiple observations per subject, and I want to create a subset that keeps only the observation with the maximum value for each subject. For example, given the dataset below:

ID <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)

group <- data.frame(Subject=ID, pt=Value, Event=Event)

Subjects 1, 2 and 3 have maximum pt values of 5, 17 and 5 respectively. How can I first find the largest pt value for each subject and then put that observation into another data frame? The resulting subset should contain only the row with the largest pt value for each subject.

Tags: r
8 answers
怪性笑人.
#2 · 2018-12-31 06:17
Using base R:

do.call(rbind, lapply(split(group, group$Subject), function(x) x[which.max(x$pt), ]))
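The split/lapply one-liner keeps the first row that reaches the maximum in each subject, because which.max returns only the first matching index. As a loop-free alternative, here is a minimal base R sketch using ave(); it is not the original poster's method, and unlike which.max it keeps every tied row. The name max_rows is chosen only for illustration.

```r
# Data from the question
group <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

# ave() returns, for every row, the maximum pt of that row's subject
group_max <- ave(group$pt, group$Subject, FUN = max)

# Keep the rows whose pt equals their subject's maximum (ties are all kept)
max_rows <- group[group$pt == group_max, ]
```

The original row names (3, 7 and 9 here) are preserved, which can help when tracing rows back to the source data.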

春风洒进眼中
#3 · 2018-12-31 06:19

Here's a data.table solution:

require(data.table) ## 1.9.2
group <- as.data.table(group)

If you want to keep all the entries corresponding to max values of pt within each group:

group[group[, .I[pt == max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

If you'd like just the first max value of pt:

group[group[, .I[which.max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
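To see where the two forms differ, here is a small sketch on made-up data in which one subject reaches its maximum twice. The table tied and the result names all_max and first_max are hypothetical and not part of the question.

```r
library(data.table)

# Hypothetical data: subject 1 reaches its maximum pt (5) on two rows
tied <- data.table(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(2, 5, 5, 3, 8),
  Event   = c(1, 1, 2, 1, 2)
)

# Keeps every row equal to the group maximum (both rows for subject 1)
all_max <- tied[tied[, .I[pt == max(pt)], by = Subject]$V1]

# Keeps only the first row that reaches the group maximum
first_max <- tied[tied[, .I[which.max(pt)], by = Subject]$V1]
```

Here all_max has three rows and first_max has two, so the choice matters whenever ties are possible.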

人气声优
#4 · 2018-12-31 06:19

If you only need the biggest pt value for each subject, you can simply use:

   pt_max <- aggregate(pt ~ Subject, data = group, FUN = max)
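Note that aggregate() returns only Subject and the maximum pt, so the Event column is dropped. If the full rows are needed, one possible sketch is to merge the aggregated result back onto the original data; the name max_rows is chosen only for illustration.

```r
# Data from the question
group <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

# One row per subject, holding only Subject and its maximum pt
pt_max <- aggregate(pt ~ Subject, data = group, FUN = max)

# merge() joins on the shared columns (Subject and pt),
# which recovers the complete rows, including Event
max_rows <- merge(group, pt_max)
```

If a subject has several rows tied at the maximum, the merge returns all of them.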
不流泪的眼
#5 · 2018-12-31 06:24

A dplyr solution:

library(dplyr)
ID <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)
group <- data.frame(Subject=ID, pt=Value, Event=Event)

group %>%
    group_by(Subject) %>%
    summarize(max.pt = max(pt))

This yields the following data frame:

  Subject max.pt
1       1      5
2       2     17
3       3      5
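The summarize() result contains only the per-subject maximum, not the rest of the row. If the other columns (such as Event) should be kept, a minimal sketch is to filter within each group instead; max_rows is an illustrative name, and tied rows would all be retained.

```r
library(dplyr)

# Data from the question
group <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

max_rows <- group %>%
  group_by(Subject) %>%
  filter(pt == max(pt)) %>%  # keep rows equal to the subject's maximum
  ungroup()                  # drop the grouping for later steps
```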
高级女魔头
#6 · 2018-12-31 06:27

A shorter solution using data.table:

setDT(group)[, .SD[which.max(pt)], by=Subject]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2
深知你不懂我心
#7 · 2018-12-31 06:29

Arguably the most intuitive method is to use the group_by and top_n functions in dplyr:

    group %>% group_by(Subject) %>% top_n(1, pt)

The result you get is

    Source: local data frame [3 x 3]
    Groups: Subject [3]

      Subject    pt Event
        (dbl) (dbl) (dbl)
    1       1     5     2
    2       2    17     2
    3       3     5     2
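In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). The following sketch assumes such a version is installed; with_ties = FALSE returns exactly one row per subject, while the default with_ties = TRUE keeps tied rows, as top_n() does. The name first_max is chosen only for illustration.

```r
library(dplyr)

# Data from the question
group <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

first_max <- group %>%
  group_by(Subject) %>%
  slice_max(pt, n = 1, with_ties = FALSE) %>%  # one row per subject
  ungroup()
```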