Vector-version / Vectorizing a for which equals loop in R

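I have a vector of values, call it X, and a data frame, call it dat.fram. I want to run something like "grep" or "which" to find all the indices of dat.fram[,3] that match each of the elements of X.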

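This is the very inefficient for loop I have below. Notice that there are many observations in X, and each member of "match.ind" can have zero or more matches. Also, dat.fram has over 1 million observations. Is there any way to use a vector function in R to make this process more efficient?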

Ultimately, I need a list, since I will pass the list to another function that will retrieve the appropriate values from dat.fram.

Code:

match.ind=list()

for(i in 1:150000){
    match.ind[[i]]=which(dat.fram[,3]==X[i])
}

UPDATE:

OK, wow, I just found an awesome way of doing this... it's really slick. Wondering if it's useful in other contexts...?!

### define v as a sample column of data - you should define v to be
### the column in the data frame you mentioned (dat.fram[,3])

v = sample(1:150000, 1500000, replace=TRUE)

### now here's the trick: concatenate the indices for each possible value of v,
### to form mybiglist - the rownames of mybiglist give you the possible values
### of v, and the values in mybiglist give you the index points

mybiglist = tapply(seq_along(v),v,c)

### now you just want the parts of this that intersect with X... again I'll
### generate a random X but use whatever X you need to

X = sample(1:200000, 150000)
mylist = mybiglist[which(names(mybiglist)%in%X)]

That's it! As a check, let's look at the first 3 entries of mylist:

> mylist[1:3]

$`1`
[1]  401143  494448  703954  757808 1364904 1485811

$`2`
[1]  230769  332970  389601  582724  804046  997184 1080412 1169588 1310105

$`4`
[1]  149021  282361  289661  456147  774672  944760  969734 1043875 1226377

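There's a gap at 3, as 3 doesn't appear in X (even though it occurs in v). And the numbers listed against 4 are the index points in v where 4 appears: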

> which(X==3)
integer(0)

> which(v==3)
[1]  102194  424873  468660  593570  713547  769309  786156  828021  870796  883932 1036943 1246745 1381907 1437148

> which(v==4)
[1]  149021  282361  289661  456147  774672  944760  969734 1043875 1226377

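Finally, it's worth noting that values that appear in X but not in v won't have an entry in the list, but this is presumably what you want anyway, as they're NULL!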
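If the list has to line up one-to-one with X (as the loop in the question produces), one possible variation is to group the indices with split() and then look each element of X up by name. This is only a sketch on small made-up data: the seed, the sizes and the names index, loop_result and fast_result are my own assumptions, not anything from the original problem.

```r
set.seed(1)                                 # assumed seed, only for reproducibility
v <- sample(1:20, 200, replace = TRUE)      # small stand-in for dat.fram[,3]
X <- c(5L, 25L, 7L, 5L)                     # 25 never occurs in v; 5 is repeated

# Reference result: the loop from the question, written with lapply()
loop_result <- lapply(X, function(x) which(v == x))

# One pass over v: group the positions of v by value, keeping only values in X.
# Levels that never occur in v come back as integer(0), just like which().
index <- split(seq_along(v), factor(v, levels = unique(X)))

# Look up each element of X by name so the result lines up with X
fast_result <- unname(index[as.character(X)])

identical(loop_result, fast_result)         # should be TRUE if the two approaches agree
```

The factor levels are what keep the empty cases: a value of X with no match still gets an (empty) entry rather than being dropped.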

Extra note: you can use the code below to create an NA entry for each member of X that is not in v...

blanks = sort(setdiff(X,names(mylist)))
mylist_extras = rep(list(NA),length(blanks))
names(mylist_extras) = blanks
mylist_all = c(mylist,mylist_extras)
mylist_all = mylist_all[order(as.numeric(names(mylist_all)))]

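Pretty self-explanatory: mylist_extras is a list with all the additional list entries you need (the names are the values of X that do not feature in names(mylist), and the actual entries in the list are simply NA). The final two lines first merge mylist and mylist_extras, and then reorder the result so that the names in mylist_all are in numeric order. These names should then match exactly the (unique) values in the vector X.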
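For what it's worth, the same NA-filling idea can probably be done in one pass by indexing a plain list by name, since a name that isn't there comes back as NULL. A toy sketch follows, with tiny made-up v and X so you can check it by eye. It uses split() rather than tapply() because split() returns an ordinary list (that swap is my choice, not part of the code above), and the result follows the order of X rather than being sorted.

```r
v <- c(4L, 1L, 4L, 2L, 1L, 4L)          # toy stand-in for dat.fram[,3]
X <- c(1L, 3L, 4L)                      # 3 does not occur in v

index <- split(seq_along(v), v)         # plain named list with names "1", "2", "4"

mylist_all <- index[as.character(X)]    # names missing from index come back as NULL
names(mylist_all) <- X                  # restore the names lost for the missing values
mylist_all[lengths(mylist_all) == 0] <- list(NA)   # swap each NULL entry for NA

mylist_all
```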

Cheers! :)

ORIGINAL POST BELOW... superseded by the above, obviously!

Here's a toy example with tapply that might well run significantly quicker... I made X and d relatively small so you can see what's going on:

X = 3:7
n = 100
d = data.frame(a = sample(1:10,n,replace=TRUE), b = sample(1:10,n,replace=TRUE),
               c = sample(1:10,n,replace=TRUE), stringsAsFactors = FALSE)

tapply(X,X,function(x) {which(d[,3]==x)})
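One caveat with this toy version: tapply(X, X, ...) groups by the distinct values of X, so it relies on X having no repeated values. If X might contain duplicates, a sketch along these lines keeps one entry per element of X. The seed, the duplicated X and the name res are assumptions for illustration only.

```r
set.seed(42)                               # assumed seed, only for reproducibility
n <- 100
d <- data.frame(a = sample(1:10, n, replace = TRUE),
                b = sample(1:10, n, replace = TRUE),
                c = sample(1:10, n, replace = TRUE))

X <- c(3, 7, 3)                            # note the repeated 3

# One which() per element of X, so duplicates in X each get their own entry
res <- lapply(X, function(x) which(d[, 3] == x))
names(res) <- X

res
```

This is still one comparison per element of X, so it is meant to show the shape of the output rather than to be fast.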