可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

The glmnet package uses a range of LASSO tuning parameters lambda scaled from the maximal lambda_max under which no predictors are selected. I want to find out how glmnet computes this lambda_max value. For example, in a trivial dataset:

set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)
fitGLM <- glmnet(x,y)
max(fitGLM$lambda)
# 0.1975946

The package vignette (http://www.jstatsoft.org/v33/i01/paper) describes in section 2.5 that it computes this value as follows:

sx <- as.matrix(scale(x))
sy <- as.vector(scale(y))
max(abs(colSums(sx*sy)))/100
# 0.1865232

Which clearly is close but not the same value. So, what causes this difference? And in a related question, how could I compute lambda_max for a logistic regression?

回答1:

To get the same result you need to standardize the variables using a standard deviation with n instead of n-1 denominator.

mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y))
sx <- scale(x,scale=apply(x, 2, mysd))
sx <- as.matrix(sx, ncol=20, nrow=100)
sy <- as.vector(scale(y, scale=mysd(y)))
max(abs(colSums(sx*sy)))/100
## [1] 0.1758808
fitGLM <- glmnet(sx,sy)
max(fitGLM$lambda)
## [1] 0.1758808

回答2:

It seems lambda_max for a logistic regression is calculated similarly, with weights based on class proportions:

set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)

mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y))
sx <- scale(x, scale=apply(x, 2, mysd))
sx <- as.matrix(sx, ncol=20, nrow=100)

y_bin <- factor(ifelse(y<0, -1, 1))
prop.table(table(y_bin)) 
# y_bin
#   -1    1 
# 0.62 0.38 
fitGLM_log <- glmnet(sx, y_bin, family = "binomial")
max(fitGLM_log$lambda)
# [1] 0.1214006
max(abs(colSums(sx*ifelse(y<0, -.38, .62))))/100
# [1] 0.1214006

回答3:

For your second question, look to Friedman et al's paper, "Regularization paths for generalized linear models via coordinate descent". In particular, see equation (10), which is equality at equilibrium. Just check under what conditions the numerator $S(\cdot,\cdot)$ is zero for all parameters.

回答4:

According to help("glmnet") the maximal lambda value is "the smallest value for which all coefficients are zero":

sum(fitGLM$beta[, which.max(fitGLM$lambda)])
#[1] 0
sum(glmnet(x,y, lambda=max(fitGLM$lambda)*0.999)$beta)
#[1] -0.0001809804

At a quick glance the value seems to be calculated by the Fortran code called by elnet.

How does glmnet compute the maximal lambda value?

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

How does glmnet compute the maximal lambda value?

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮