Adding labels on curves in glmnet plot in R

Posted 2020-02-01 05:18

I am using the glmnet package to produce the following graph from the mtcars dataset (a regression of mpg on the other variables):

library(glmnet)
fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1])
plot(fit, xvar='lambda')


How can I add the variable names to each curve, either at the beginning of the curve or at its maximal y point (the point farthest from the x-axis)? I can add a legend as usual, but not labels on each curve or at its start. Thanks for your help.

Tags: r plot glmnet
3 answers
Melony?
Reply #2 · 2020-02-01 05:25

An alternative is the plot_glmnet function in the plotmo package. It automatically positions the variable names and has a few other bells and whistles. For example, the following code

library(glmnet)
mod <- glmnet(as.matrix(mtcars[-1]), mtcars[,1])
library(plotmo) # for plot_glmnet
plot_glmnet(mod)

gives a plot in which each curve is labeled with its variable name.

The variable names are spread out to prevent overplotting, but we can still make out which curve is associated with which variable. Further examples may be found in Chapter 6 of the plotres vignette, which is included in the plotmo package.
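When labelling every curve is too crowded, the number of names can be reduced. The following is a minimal sketch, assuming the `label` and `xvar` arguments of plot_glmnet behave as described in the plotmo documentation: `label` caps how many names are printed (plotmo decides which ones), and `xvar = "lambda"` puts log(lambda) on the x-axis as in plot.glmnet. The value 5 is an arbitrary choice for illustration.

```r
library(glmnet)
library(plotmo)  # for plot_glmnet

mod <- glmnet(as.matrix(mtcars[-1]), mtcars[, 1])

# Print at most 5 variable names instead of the default number;
# which curves get a name is decided by plotmo, not by this call.
plot_glmnet(mod, xvar = "lambda", label = 5)
```

Setting `label = TRUE` should print all names, and `label = FALSE` none, if the crowding is acceptable or no labels are wanted.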

劳资没心,怎么记你
Reply #3 · 2020-02-01 05:44

Because the labels are hard-coded in the plot method, it is perhaps easier to write a quick function. This is only a first attempt and can be made more thorough. Note also that a lasso fit usually involves many variables, so the labels will overlap heavily (as already happens in your small example):

lbs_fun <- function(fit, ...) {
        L <- length(fit$lambda)
        x <- log(fit$lambda[L])
        y <- fit$beta[, L]
        labs <- names(y)
        text(x, y, labels=labs, ...)
}

# plot
plot(fit, xvar="lambda")

# label
lbs_fun(fit)
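The question also asks about labelling each curve at its maximal y point. A minimal sketch of that variant follows; `label_at_peak` is a hypothetical helper name, not part of glmnet. It assumes the x-axis is log(lambda), as in the plots above. If your glmnet version draws the axis as -log(lambda), the `x` values would need to be negated.

```r
library(glmnet)

# Hypothetical helper: label each coefficient path at the lambda where
# its coefficient is farthest from zero.
label_at_peak <- function(fit, ...) {
  beta <- as.matrix(fit$beta)              # rows = variables, cols = lambdas
  keep <- rowSums(beta != 0) > 0           # skip variables that never enter
  beta <- beta[keep, , drop = FALSE]
  idx  <- apply(abs(beta), 1, which.max)   # column of max |coef| per variable
  x <- log(fit$lambda[idx])                # assumes a log(lambda) x-axis
  y <- beta[cbind(seq_len(nrow(beta)), idx)]
  text(x, y, labels = rownames(beta), ...)
  invisible(data.frame(variable = rownames(beta), x = x, y = y))
}

fit <- glmnet(as.matrix(mtcars[-1]), mtcars[, 1])
plot(fit, xvar = "lambda")
peaks <- label_at_peak(fit, cex = 0.8, pos = 3)
```

Extra arguments such as `cex` and `pos` are passed on to `text()`, so the label size and offset can be adjusted without changing the function.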


forever°为你锁心
Reply #4 · 2020-02-01 05:47

Here is a modification of the best answer that connects each label to its curve with a line segment instead of placing the text directly on the curves. This is especially useful when there are many variables and you only want to label those with nonzero coefficients at lambda.min:

# Note: the argument 'lra' is a cv.glmnet object.
lbs_fun <- function(lra, ...) {
  fit <- lra$glmnet.fit

  # column of the coefficient matrix that corresponds to lambda.min
  L <- which(fit$lambda == lra$lambda.min)

  # nonzero coefficients at lambda.min, sorted so the segments do not cross
  ystart <- sort(fit$beta[abs(fit$beta[, L]) > 0, L])
  labs <- names(ystart)

  # spread the labels over the coefficient range at the smallest lambda
  # (the last column; the path can stop before reaching 100 lambdas)
  r <- range(fit$beta[, ncol(fit$beta)])
  yfin <- seq(r[1], r[2], length.out = length(ystart))

  xstart <- log(lra$lambda.min)
  xfin <- xstart + 1

  text(xfin + 0.3, yfin, labels = labs, ...)
  segments(xstart, ystart, xfin, yfin)
}

# xlim and lwd are optional
plot(lra$glmnet.fit, label = FALSE, xvar = "lambda", xlim = c(-5.2, 0), lwd = 2)
lbs_fun(lra)
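The answer takes the cv.glmnet object `lra` as given. As a minimal sketch, assuming the same mtcars regression as in the question, such an object could be built as follows. The seed value and the names `x`, `y`, `L`, and `nz` are illustrative choices, not part of the original answer.

```r
library(glmnet)

set.seed(1)                       # CV folds are random; the seed is arbitrary
x <- as.matrix(mtcars[-1])
y <- mtcars[, 1]

lra <- cv.glmnet(x, y)            # cross-validated lasso path
fit <- lra$glmnet.fit

# Column of the path closest to lambda.min (tolerant of rounding differences)
L <- which.min(abs(fit$lambda - lra$lambda.min))

# Coefficients that are nonzero at lambda.min: these are the curves
# that would receive a label
b  <- fit$beta[, L]
nz <- b[b != 0]
print(sort(nz))
```

Checking `nz` before plotting shows how many labels to expect, which helps when choosing `xlim` so that the labels stay inside the plot region.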