Faster %in% operator

2020-05-30 04:11发布

The fastmatch package implements a much faster version of match for repeated matches (e.g. in a loop):

set.seed(1)
library(fastmatch)
table <- 1L:100000L
x <- sample(table, 10000, replace=TRUE)
system.time(for(i in 1:100) a <-  match(x, table))
system.time(for(i in 1:100) b <- fmatch(x, table))
identical(a, b)

Is there a similar implementation for %in% I could use to speed up repeated lookups?

标签： r match

2条回答

小情绪 Triste *

2楼-- · 2020-05-30 04:21

match is almost always better done by putting both vectors in dataframes and merging (see various joins from dplyr)

For example, something like this would give you all the info you need:

library(dplyr)

data = data_frame(data.ID = 1L:100000L,
                  data.extra = 1:2)

sample = 
  data %>% 
  sample_n(10000, replace=TRUE) %>%
  mutate(sample.ID = 1:n(),
         sample.extra = 3:4 )

# join table not strictly necessary in this case
# but necessary in many-to-many matches
data__sample = inner_join(data, sample)

#check whether a data.ID made it into sample
data__sample %>% filter(data.ID == 1)

or left_join, right_join, full_join, semi_join, anti_join, depending on what info is most useful to you

0人赞添加讨论(0) 举报

爷、活的狠高调

3楼-- · 2020-05-30 04:38

Look at the definition of %in%:

R> `%in%`
function (x, table) 
match(x, table, nomatch = 0L) > 0L
<bytecode: 0x1fab7a8>
<environment: namespace:base>

It's easy to write your own %fin% function:

`%fin%` <- function(x, table) {
  stopifnot(require(fastmatch))
  fmatch(x, table, nomatch = 0L) > 0L
}
system.time(for(i in 1:100) a <- x %in% table)
#    user  system elapsed 
#   1.780   0.000   1.782 
system.time(for(i in 1:100) b <- x %fin% table)
#    user  system elapsed 
#   0.052   0.000   0.054
identical(a, b)
# [1] TRUE

0人赞添加讨论(0) 举报

Faster %in% operator

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间