Define:
df1 <-data.frame(
id=c(rep(1,3),rep(2,3)),
v1=as.character(c("a","b","b",rep("c",3)))
)
s.t.
> df1
id v1
1 1 a
2 1 b
3 1 b
4 2 c
5 2 c
6 2 c
I want to create a third variable freq
that contains the most frequent observation in v1
by id
s.t.
> df2
id v1 freq
1 1 a b
2 1 b b
3 1 b b
4 2 c c
5 2 c c
6 2 c c
You can do this using ddply
and a custom function to pick out the most frequent value:
myFun <- function(x){
tbl <- table(x$v1)
x$freq <- rep(names(tbl)[which.max(tbl)],nrow(x))
x
}
ddply(df1,.(id),.fun=myFun)
Note that which.max
will return the first occurrence of the maximum value, in the case of ties. See ??which.is.max in the nnet
package for an option that breaks ties randomly.
mode <- function(x) names(table(x))[ which.max(table(x)) ]
df1$freq <- ave(df1$v1, df1$id, FUN=mode)
> df1
id v1 freq
1 1 a b
2 1 b b
3 1 b b
4 2 c c
5 2 c c
6 2 c c
Another way consists of using tidyverse
functions:
- grouping first, using
group_by()
, and counting the occurrence of the second variable using tally()
- arranging by the number of occurrences with
arrange()
- summarizing and picking out the first row with
summarize()
and first()
Therefore:
df1 %>%
group_by(id, v1) %>%
tally() %>%
arrange(id, desc(n)) %>%
summarize(freq = first(v1))
This will give you just the mapping (which I find cleaner):
# A tibble: 2 x 2
id freq
<dbl> <fctr>
1 1 b
2 2 c
You can then left_join
your original data frame with that table.