“longer object length is not a multiple of shorter

2020-04-12 15:42发布

问题:

I have this dataset -

print(df)

  object    group   
1 apple      A    
1 banana     B    
1 pear       A    
1 robot      C

print(df2)

  object    group   
1 apple      A    
1 apple      B    
1 apple      A    
1 robot      C
1 robot      C
1 robot      C
1 banana     C

I'd like to count the number of times each value shows up in another data set, which looks exactly the same but draws data from a different time period. The other dataset is called df2.

I used this code -

x <- df %>%
  mutate(reference = length(df2[df2$object == object,]$object))

It gave me this error: longer object length is not a multiple of shorter object length.

It works when I run this outside of dplyr, like this -

object <- "apple"
length(df2[df2$object == object,]$object)

Can I do a relative reference like that, including object in the filter?

回答1:

From my comment: dplyr functions work on the whole column taken as a vector. Try

df %>%
rowwise() %>% 
mutate(reference = length(df2[df2$object == object,]$object))%>%
ungroup()

As you said, ungroup will be needed, unless you plan on doing further row-wise operations.



回答2:

If you need a tidyverse option, we can use map_dbl

purrr::map_dbl(df$object, ~ length(df2[df2$object == .,]$object))
#[1] 3 1 0 3

which can be also calculated with sum

purrr::map_dbl(df$object, ~ sum(df2$object == .))

So in mutate we can add

df %>%
  mutate(reference = map_dbl(object,  ~ sum(df2$object == .)))

#  object group reference
#1  apple     A         3
#2 banana     B         1
#3   pear     A         0
#4  robot     C         3

The similar base R option is sapply

sapply(df$object, function(x) sum(df2$object == x))

# apple banana   pear  robot 
#     3      1      0      3 


回答3:

We can do this in data.table

library(data.table)
reference <- setDT(df2)[df, .N, on = .(object), by = .EACHI]$N
df$reference <- reference
df
#   object group reference
#1:  apple     A         3
#2: banana     B         1
#3:   pear     A         0
#4:  robot     C         3


标签: r dplyr