I need to speed up the calculation of the mean estimate of beta in a WLS in R - I was able to speed up the covariance calculation thanks to SO, and now I am wondering if there is another trick to also speed up the mean calculation (or if what I am doing is already efficient enough).
n = 10000
y = rnorm(n, 3, 0.4)
X = matrix(c(rnorm(n,1,2), sample(c(1,-1), n, replace = TRUE), rnorm(n,2,0.5)), nrow = n, ncol = 3)
Q = diag(rnorm(n, 1.5, 0.3))
wls.cov.matrix = crossprod(X / sqrt(diag(Q)))
Q.inv = diag(1/diag(Q))
wls.mean = wls.cov.matrix%*%t(X)%*%Q.inv%*%y
system.time(wls.cov.matrix%*%t(X)%*%Q.inv%*%y)
Is there another similar trick as in wls.cov.matrix crossprod to speed up the whole mean calculation, or no need? Thanks!
The main gain in the performance would be achieved by not having any n-by-n matrices along the way. I mean not having the Q matrix, only working with its diagonal.
Building on the answer by @Roland:
Qdiag = rnorm(n, 1.5, 0.3);
Q = diag(Qdiag);
wls.mean3 <- wls.cov.matrix %*% crossprod(X/Qdiag, y);
all.equal(wls.mean, wls.mean3)
microbenchmark(wls.cov.matrix %*% t(X) %*% Q.inv %*% y,
wls.cov.matrix %*% crossprod(X, Q.inv) %*% y,
wls.cov.matrix %*% crossprod(X/Qdiag, y),
times=5)
# wls.cov.matrix %*% t(X) %*% Q.inv %*% y 358050.195 363713.250 368820.818 372414.747 374824.56 5
# wls.cov.matrix %*% crossprod(X, Q.inv) %*% y 79449.856 81411.195 84616.706 85351.968 88108.62 5
# wls.cov.matrix %*% crossprod(X/Qdiag, y) 279.092 284.867 285.252 291.796 295.26 5
In the answer to your last question you were taught crossprod
. Use that function again:
n = 1e4
set.seed(42)
y = rnorm(n, 3, 0.4)
X = matrix(c(rnorm(n,1,2), sample(c(1,-1), n, replace = TRUE), rnorm(n,2,0.5)), nrow = n, ncol = 3)
Q = diag(rnorm(n, 1.5, 0.3))
wls.cov.matrix = crossprod(X / sqrt(diag(Q)))
Q.inv = diag(1/diag(Q))
wls.mean = wls.cov.matrix%*%t(X)%*%Q.inv%*%y
wls.mean2 <- wls.cov.matrix %*% crossprod(X, Q.inv) %*% y
all.equal(wls.mean, wls.mean2)
#[1] TRUE
library(microbenchmark)
microbenchmark(wls.cov.matrix %*% t(X) %*% Q.inv %*% y,
wls.cov.matrix %*% crossprod(X, Q.inv) %*% y,
times=5)
#Unit: milliseconds
# expr min lq median uq max neval
# wls.cov.matrix %*% t(X) %*% Q.inv %*% y 1019.3955 1022.1679 1022.2766 1024.540 1025.9131 5
#wls.cov.matrix %*% crossprod(X, Q.inv) %*% y 314.0622 315.3588 315.3933 317.024 317.1142 5
More performance improvements might be possible with some matrix algebra tricks, but that is not my forte.