I would like to get the slope of a linear regression fit for 1M separate data sets (1M * 50 rows for data.frame, or 1M * 50 for array). Now I am using the lm() function, which takes a very long time (about 10 min).
Is there any faster function for linear regression?
Yes, there are:
R itself has lm.fit(), which is more bare-bones: no formula notation and a much simpler result set.
Several of our Rcpp-related packages have fastLm() implementations: RcppArmadillo, RcppEigen, RcppGSL.
We have described fastLm() in a number of blog posts and presentations. If you want the fastest approach, do not use the formula interface: parsing the formula and preparing the model matrix takes more time than the actual regression.
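As a rough illustration of skipping the formula interface, the sketch below fits the same simple regression once with lm() and once with lm.fit(), building the design matrix by hand. The data are simulated and the variable names are purely illustrative; only base R is assumed.

```r
# Simulated data for one small regression problem (illustrative only)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# Formula interface: convenient, but parses the formula and builds a model frame
slope_lm <- unname(coef(lm(y ~ x))[2])

# lm.fit(): we supply the design matrix ourselves (intercept column plus x)
X <- cbind(1, x)
slope_fit <- unname(lm.fit(X, y)$coefficients[2])

# Both routes should give the same slope up to numerical tolerance
isTRUE(all.equal(slope_lm, slope_fit))
```

The result of lm.fit() is a plain list, so helpers such as summary() are not available on it; for extracting only the slope that is usually not a problem.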
That said, if you are regressing a single vector on a single vector you can simplify this as no matrix package is needed.
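For the single-predictor case, the slope has the closed form cov(x, y) / var(x), so many data sets can be handled at once with row-wise arithmetic. The sketch below assumes the data are stored as two numeric matrices with one data set per row; the function name row_slopes and the matrices X and Y are hypothetical, not something from the question.

```r
# Slope of y ~ x for every row, assuming X and Y are numeric matrices
# of identical shape with one data set per row (hypothetical layout).
row_slopes <- function(X, Y) {
  xc <- X - rowMeans(X)   # centre each row of X (recycling runs down columns)
  yc <- Y - rowMeans(Y)   # centre each row of Y
  rowSums(xc * yc) / rowSums(xc^2)
}

# Small simulated example: 1000 data sets of 50 observations each
set.seed(2)
n_sets <- 1000
n_obs <- 50
X <- matrix(rnorm(n_sets * n_obs), nrow = n_sets)
Y <- 1 + 2 * X + matrix(rnorm(n_sets * n_obs), nrow = n_sets)

slopes <- row_slopes(X, Y)

# Spot-check the first data set against lm()
check <- unname(coef(lm(Y[1, ] ~ X[1, ]))[2])
isTRUE(all.equal(slopes[1], check))
```

This avoids calling any fitting function per data set, at the cost of returning only the slope and assuming no row of X is constant.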
Since R 3.1.0 there is a .lm.fit() function. It should be faster than both lm() and lm.fit().
It is described, and its performance compared with the other lm functions, here: https://rpubs.com/maechler/fast_lm.
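A minimal sketch of using .lm.fit() in a loop over data sets might look like the following. It assumes the same one-data-set-per-row matrix layout as a convenience, which is an assumption rather than the asker's actual structure, and it makes no claim about the timings you will see.

```r
# Simulated data: 100 data sets of 50 observations (illustrative sizes)
set.seed(3)
n_sets <- 100
n_obs <- 50
X <- matrix(rnorm(n_sets * n_obs), nrow = n_sets)
Y <- 1 + 2 * X + matrix(rnorm(n_sets * n_obs), nrow = n_sets)

# Fit each data set with .lm.fit(); the design matrix needs an explicit
# intercept column, and the coefficients come back as an unnamed vector.
slopes <- vapply(seq_len(n_sets), function(i) {
  .lm.fit(cbind(1, X[i, ]), Y[i, ])$coefficients[2]
}, numeric(1))

# Compare one fit with lm.fit() as a sanity check
ref <- unname(lm.fit(cbind(1, X[1, ]), Y[1, ])$coefficients[2])
isTRUE(all.equal(slopes[1], ref))
```

Because .lm.fit() does almost no argument checking, it is up to the caller to pass a numeric matrix and a numeric response without missing values.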
speedlm from speedglm should do it, as it works on large data sets.
lmfit in the package Rfast is even faster than .lm.fit.
The only drawback is that it does not work when the design matrix does not have full rank.