I have two vectors of values and one vector of weights, and I need to calculate the cosine similarity. For complicated reasons, I can only calculate the cosine for one pair at a time. But I have to do it many millions of times.
cosine_calc <- function(a, b, wts) {
  # scale both vectors by the weights, then compute the cosine of the scaled vectors
  a = a * wts
  b = b * wts
  (a %*% b) / (sqrt(a %*% a) * sqrt(b %*% b))
}
This works, but I want to eke better performance out of it.
Example data:
a = c(-1.2092420, -0.7053822, 1.4364633, 1.3612304, -0.3029147, 1.0319704, 0.6707610, -2.2128987, -0.9839970, -0.4302205)
b = c(-0.69042619, 0.05811749, -0.17836802, 0.15699691, 0.78575477, 0.27925779, -0.08552864, -1.31031219, -1.92756861, -1.36350112)
w = c(0.26333839, 0.12803180, 0.62396023, 0.37393705, 0.13539926, 0.09199102, 0.37347546, 1.36790007, 0.64978409, 0.46256891)
> cosine_calc(a,b,w)
          [,1]
[1,] 0.8390671
This question points out that there are other predefined cosine functions available in R, but says nothing about their relative efficiency.
All the functions you're using are .Primitive (therefore already call compiled code directly), so it will be hard to find consistent speed gains outside of re-building R with an optimized BLAS. With that said, here is one option that might be faster for larger vectors:

UPDATE:
Profiling reveals that quite a bit of time is spent multiplying each vector by the weight vector.
If you can do the weighting before you have to call the function millions of times, it could save you quite a bit of time.
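As a rough sketch of what pre-weighting could look like: the helper name `cosine_preweighted` is hypothetical, and the approach assumes the weight vector is fixed and each vector is reused across many pairs, so that the multiplication by the weights only has to happen once per vector.

```r
# Hypothetical helper: cosine of two vectors that have ALREADY been weighted.
# Returns a plain number rather than a 1x1 matrix.
cosine_preweighted <- function(aw, bw) {
  sum(aw * bw) / sqrt(sum(aw * aw) * sum(bw * bw))
}

# Small made-up inputs for illustration (not the question's data).
a <- c(1, 2, 3)
b <- c(3, 2, 1)
w <- c(0.5, 1, 2)

# Apply the weights once, outside the code that runs millions of times.
aw <- a * w
bw <- b * w

# Inside the hot loop only the unweighted cosine is left to compute.
cosine_preweighted(aw, bw)
```

If the weights change from pair to pair, this does not help, since the weighting then has to be redone on every call anyway.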
cosine_calc3 is marginally faster than your original function with small vectors. Byte-compiling the function should give you another marginal speedup.
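Byte-compiling can be done with `cmpfun()` from the `compiler` package, which ships with R. The sketch below applies it to a copy of the question's `cosine_calc`; the name `cosine_calc_cmp` and the small input vectors are made up for illustration.

```r
library(compiler)  # part of the standard R distribution

# Copy of the question's function, repeated so this block runs on its own.
cosine_calc <- function(a, b, wts) {
  a <- a * wts
  b <- b * wts
  (a %*% b) / (sqrt(a %*% a) * sqrt(b %*% b))
}

# Byte-compiled version (hypothetical name); same arguments, same result.
cosine_calc_cmp <- cmpfun(cosine_calc)

# Small made-up inputs for illustration (not the question's data).
a <- c(1, 2, 3)
b <- c(3, 2, 1)
w <- c(0.5, 1, 2)

cosine_calc_cmp(a, b, w)
```

On R 3.4.0 and later the JIT compiler byte-compiles functions automatically by default, so an explicit `cmpfun()` call may make little or no difference there. It is worth timing both versions on your own data before relying on it.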