I currently have the following problem. I have a dataset with multiple observations per subject, and I want to make a subset of it that keeps only the record with the maximum value for each subject. For example, for a data set like the one below:
ID <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)
group <- data.frame(Subject=ID, pt=Value, Event=Event)
Subjects 1, 2 and 3 have biggest pt values of 5, 17 and 5 respectively. How could I first find the biggest pt value for each subject, and then put that observation in another data frame? The subset should contain only the observation with the biggest pt value for each subject.
Using base R:
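One way to do this in base R, offered as a minimal sketch rather than the only approach: ave() repeats each subject's maximum pt alongside every row, and the comparison keeps the rows that match it. The name max_rows is just illustrative.

```r
# Example data from the question
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# ave() returns, for every row, the maximum pt within that row's Subject;
# keep the rows whose pt equals that per-subject maximum
max_rows <- group[group$pt == ave(group$pt, group$Subject, FUN = max), ]
max_rows
```

If a subject had several rows tied for the maximum, this would keep all of them.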
Here's a data.table solution. There are two variants: one keeps all the entries corresponding to max values of pt within each group, and the other keeps just the first max value of pt.
In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
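The two data.table variants described above could look something like the following sketch. It assumes the data.table package is installed; dt, all_max and first_max are illustrative names.

```r
library(data.table)

# Example data from the question, converted to a data.table
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))
dt <- as.data.table(group)

# Variant 1: keep every row tied for the maximum pt within each Subject.
# .I holds row numbers, so the inner call collects the matching row indices.
all_max <- dt[dt[, .I[pt == max(pt)], by = Subject]$V1]

# Variant 2: keep only the first row with the maximum pt within each Subject
first_max <- dt[dt[, .I[which.max(pt)], by = Subject]$V1]
```

The only difference is pt == max(pt), which can match several rows, versus which.max(pt), which always returns a single position.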
If you want the biggest pt value for a subject, you could simply use:
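As a sketch of what that might look like, aggregate() with a formula is one base R option (an assumption on my part, tapply() would work too). Note that this returns only the maximum values, not the full rows, so the Event column is dropped. max_pt is an illustrative name.

```r
# Example data from the question
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# One row per Subject, holding only that subject's largest pt
max_pt <- aggregate(pt ~ Subject, data = group, FUN = max)
max_pt
```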
A dplyr solution, along with the data frame it yields:
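A dplyr version could be sketched as below, assuming the dplyr package is installed. It filters each subject's rows down to those whose pt equals the group maximum; result is an illustrative name.

```r
library(dplyr)

# Example data from the question
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

result <- group %>%
  group_by(Subject) %>%          # work within each subject
  filter(pt == max(pt)) %>%      # keep rows at the group maximum
  ungroup()

# Rows kept:
#   Subject 1: pt 5,  Event 2
#   Subject 2: pt 17, Event 2
#   Subject 3: pt 5,  Event 2
result
```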
A shorter solution using data.table:
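A compact data.table form might look like this sketch, which subsets .SD (the rows of the current group) at the position of the largest pt. It assumes the data.table package is installed; shorter is an illustrative name.

```r
library(data.table)

# Example data from the question
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# For each Subject, take the single row of .SD where pt is largest
shorter <- as.data.table(group)[, .SD[which.max(pt)], by = Subject]
shorter
```

Because which.max() returns one position, ties within a subject would yield only the first matching row.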
The most intuitive method is to use the group_by and top_n functions in dplyr:
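A minimal sketch of the group_by plus top_n approach, assuming the dplyr package is installed (top_rows is an illustrative name):

```r
library(dplyr)

# Example data from the question
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

top_rows <- group %>%
  group_by(Subject) %>%   # work within each subject
  top_n(1, pt)            # keep the top 1 row per group, ranked by pt
top_rows
```

In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), so slice_max(pt, n = 1) is the newer spelling of the same idea.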
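The result you get is one row per subject, holding that subject's largest pt: Subject 1 (pt 5, Event 2), Subject 2 (pt 17, Event 2), and Subject 3 (pt 5, Event 2).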