I have this dataset -
print(df)
object group
1 apple A
1 banana B
1 pear A
1 robot C
print(df2)
object group
1 apple A
1 apple B
1 apple A
1 robot C
1 robot C
1 robot C
1 banana C
I'd like to count the number of times each value shows up in another data set, which looks exactly the same but draws data from a different time period. The other dataset is called df2
.
I used this code -
x <- df %>%
mutate(reference = length(df2[df2$object == object,]$object))
It gave me this error: longer object length is not a multiple of shorter object length
.
It works when I run this outside of dplyr, like this -
object <- "apple"
length(df2[df2$object == object,]$object)
Can I do a relative reference like that, including object
in the filter?
From my comment: dplyr functions work on the whole column taken as a vector. Try
df %>%
rowwise() %>%
mutate(reference = length(df2[df2$object == object,]$object))%>%
ungroup()
As you said, ungroup
will be needed, unless you plan on doing further row-wise operations.
If you need a tidyverse
option, we can use map_dbl
purrr::map_dbl(df$object, ~ length(df2[df2$object == .,]$object))
#[1] 3 1 0 3
which can be also calculated with sum
purrr::map_dbl(df$object, ~ sum(df2$object == .))
So in mutate
we can add
df %>%
mutate(reference = map_dbl(object, ~ sum(df2$object == .)))
# object group reference
#1 apple A 3
#2 banana B 1
#3 pear A 0
#4 robot C 3
The similar base R option is sapply
sapply(df$object, function(x) sum(df2$object == x))
# apple banana pear robot
# 3 1 0 3
We can do this in data.table
library(data.table)
reference <- setDT(df2)[df, .N, on = .(object), by = .EACHI]$N
df$reference <- reference
df
# object group reference
#1: apple A 3
#2: banana B 1
#3: pear A 0
#4: robot C 3