I guess this is trivial, and I apologize, but I couldn't find how to do it.
I am trying to avoid a loop, so I am trying to vectorize the process:
I need to do something like grep, but where the pattern is a vector. Another option is something like match, where the value returned is every matching location, not only the first.
Example data (this is not how the real data looks, otherwise I would exploit its structure):
COUNTRIES=c("Austria","Belgium","Denmark","France","Germany",
"Ireland","Italy","Luxembourg","Netherlands",
"Portugal","Sweden","Spain","Finland","United Kingdom")
COUNTRIES_Target=rep(COUNTRIES,times=4066)
COUNTRIES_Origin=rep(COUNTRIES,each=4066)
Currently I have a loop that does this:
var_pointer=list()
for (i in 1:length(COUNTRIES_Origin))
{
var_pointer[[i]]=which(COUNTRIES_Origin[i]==COUNTRIES_Target)
}
The problem with match is that match(x=COUNTRIES_Origin,table=COUNTRIES_Target) returns a vector of the same length as COUNTRIES_Origin whose values are only the first match, while I need all of them.
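To make the difference concrete, here is a small sketch on toy vectors (x_small and tbl_small are made-up names for illustration, not part of the real data). match gives one position per element, while the loop above collects every position:

```r
# Hypothetical toy vectors, standing in for COUNTRIES_Origin / COUNTRIES_Target.
x_small   <- c("b", "a")
tbl_small <- c("a", "b", "a", "b")

# match() reports only the first position of each element of x_small.
first_only <- match(x_small, tbl_small)

# What is actually wanted: every position, one integer vector per element.
all_hits <- lapply(x_small, function(v) which(v == tbl_small))

print(first_only)  # 2 1
print(all_hits)    # list(c(2, 4), c(1, 3))
```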
The issue with grep is that grep(pattern=COUNTRIES_Origin,x=COUNTRIES_Target) gives the following warning:
Warning message:
In grep(pattern = COUNTRIES_Origin, x = COUNTRIES_Target) :
argument 'pattern' has length > 1 and only the first element will be used
Any suggestions?
Trying to vectorize MxN matches is fundamentally not very performant: no matter how you do it, it is still M*N comparisons. Use hashes instead for O(1) lookup.
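A minimal sketch of the hashing idea in base R, assuming exact string matches and no empty-string values. The names origin, target, positions and lookup are placeholders of mine, not from the question. It scans the target vector once to group positions by value, then does one hashed lookup per origin element instead of rescanning the whole target each time:

```r
# Toy data; `origin` and `target` are hypothetical stand-ins for
# COUNTRIES_Origin and COUNTRIES_Target.
origin <- c("Austria", "Belgium", "Austria")
target <- c("Belgium", "Austria", "Belgium", "Austria")

# Build the lookup once: one entry per distinct value in `target`,
# holding every position at which that value occurs.
positions <- split(seq_along(target), target)

# Store the entries in a hashed environment for keyed lookup.
lookup <- list2env(positions, hash = TRUE)

# One lookup per element of `origin`; keys absent from `target`
# give integer(0), the same result which() would return.
var_pointer <- lapply(origin, function(key) {
  hit <- lookup[[key]]
  if (is.null(hit)) integer(0) else hit
})
```

With only 14 distinct countries, indexing the split list directly (positions[origin]) would probably do just as well; the environment mainly matters when there are many distinct keys.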
For recommendations on using the hash package, see "Can I use a list as a hash in R? If so, why is it so slow?"

It seems like you can just lapply over the list rather than writing an explicit loop. Here I use which because grep seems to be better for partial matches and you're looking for exact matches.
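The lapply-plus-which approach could look roughly like this. It is a sketch on toy data, with origin and target as placeholder names standing in for COUNTRIES_Origin and COUNTRIES_Target:

```r
# Toy stand-ins for COUNTRIES_Origin and COUNTRIES_Target.
origin <- c("Austria", "Belgium", "Austria")
target <- c("Belgium", "Austria", "Belgium", "Austria")

# For each origin element, collect every position in `target`
# that is exactly equal to it (no pattern matching involved).
var_pointer <- lapply(origin, function(o) which(o == target))
```

This returns the same list as the original for loop, without having to grow var_pointer by hand. It still compares every origin element against every target element, so it is tidier rather than asymptotically faster.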