This question was asked at stackoverflow.com/q/38378118, but it never received a satisfactory answer.

LASSO with λ = 0 is equivalent to ordinary least squares, yet this does not seem to hold for glmnet() and lm() in R. Why?
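As I read glmnet's documentation, for a Gaussian response it minimizes

$$\min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\left(y_i-\beta_0-x_i^\top\beta\right)^2+\lambda\lVert\beta\rVert_1,$$

which at λ = 0 reduces to the ordinary least-squares criterion, so the two fits should coincide up to numerical error.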
library(glmnet)
options(scipen = 999)  # print coefficients in fixed notation
X = model.matrix(mpg ~ 0 + ., data = mtcars)  # design matrix without an intercept column
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))
lm(y ~ X)
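As a sanity check (my own addition; this assumes XᵀX is well enough conditioned for the normal equations, which should be fine for mtcars), the closed-form OLS solution β̂ = (XᵀX)⁻¹Xᵀy should reproduce lm()'s coefficients to near machine precision:

# Normal-equations OLS; prepend an explicit intercept column to X
X1 = cbind(`(Intercept)` = 1, X)
solve(crossprod(X1), crossprod(X1, y))  # same as coef(lm(y ~ X)), up to rounding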
Their regression coefficients agree to at most two significant figures, perhaps because the two optimizers use slightly different termination conditions:
                 glmnet        lm
(Intercept)  12.19850081  12.30337
cyl          -0.09882217  -0.11144
disp          0.01307841   0.01334
hp           -0.02142912  -0.02148
drat          0.79812453   0.78711
wt           -3.68926778  -3.71530
qsec          0.81769993   0.82104
vs            0.32109677   0.31776
am            2.51824708   2.52023
gear          0.66755681   0.65541
carb         -0.21040602  -0.19942
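One way to test the termination-condition hypothesis is to tighten glmnet's convergence threshold (the thresh argument, default 1e-7) and rerun; this is just a sketch, and 1e-14 is an arbitrary much-smaller value:

# Rerun with a much stricter coordinate-descent convergence threshold
fit = glmnet(X, y, lambda = 0, thresh = 1e-14)
cbind(glmnet = as.vector(coef(fit)), lm = coef(lm(y ~ X)))

If the glmnet column now matches lm(), the gap is down to the solver's stopping rule rather than the model itself.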
The discrepancy is much worse when interaction terms are added.
X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)  # all predictors plus their interactions with disp
y = as.matrix(mtcars["mpg"])
coef(glmnet(X, y, lambda = 0))
lm(y ~ X)
Regression coefficients:
                   glmnet           lm
(Intercept)   36.2518682237  139.9814651
cyl          -11.9551206007  -26.0246050
disp          -0.2871942149   -0.9463428
hp            -0.1974440651   -0.2620506
drat          -4.0209186383  -10.2504428
wt             1.3612184380    5.4853015
qsec           2.3549189212    1.7690334
vs           -25.7384282290  -47.5193122
am           -31.2845893123  -47.4801206
gear          21.1818220135   27.3869365
carb           4.3160891408    7.3669904
cyl:disp       0.0980253873    0.1907523
disp:hp        0.0006066105    0.0006556
disp:drat      0.0040336452    0.0321768
disp:wt       -0.0074546428   -0.0228644
disp:qsec     -0.0077317305   -0.0023756
disp:vs        0.2033046078    0.3636240
disp:am        0.2474491353    0.3762699
disp:gear     -0.1361486900   -0.1963693
disp:carb     -0.0156863933   -0.0188304
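If the stopping-rule explanation is correct, the same adjustment should also shrink this much larger gap, perhaps with a bigger iteration budget since the interaction columns are on wildly different scales (again a sketch; the thresh and maxit values are guesses, not verified):

# Interaction model with a tighter threshold and a larger iteration cap
X = model.matrix(mpg ~ 0 + . + . * disp, data = mtcars)
fit = glmnet(X, y, lambda = 0, thresh = 1e-14, maxit = 1e7)
cbind(glmnet = as.vector(coef(fit)), lm = coef(lm(y ~ X)))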