I've been trying to use glmmLasso to do variable selection for a mixed-model but I can't seem to get the model to work. I've setup my model similarily to the demo found here. I'm using the simple method of using BIC to choose lambda.
This is the code I've been running.
library(glmmLasso)
lambda <- seq(500,0,by=-5)
family = binomial(link = logit)
library(MASS);library(nlme)
PQL<-glmmPQL(y~1,random = ~1|ID,family=family,data=train)
Delta.start<-c(as.numeric(PQL$coef$fixed),rep(0,64),as.numeric(t(PQL$coef$random$ID)))
Q.start<-as.numeric(VarCorr(PQL)[1,1])
BIC_vec<-rep(Inf,length(lambda))
for(j in 1:length(lambda)){
print(paste("Iteration ", j,sep=""))
glm1 = try(glmmLasso(y ~ variable1 + ... + as.factor(factorVariable1), rnd = list(ID=~1),
family = family, data = train, lambda=lambda[j],switch.NR=T,final.re=TRUE,
control=list(start=Delta.start, q_start=Q.start)),silent = TRUE)
if(class(glm1) != "try-error"){
BIC_vec[j]<-glm1$bic
}
}
One thing that I'm unsure of is the Delta.start. I was following the demo and I'm assuming the number of 0s that are repped is the number of variables, or if its a factor you add in 0s for 1 less than the number of levels in the factor.
Running this code all of the BIC scores were still Inf
. And, looking at glm1 I find this error
[1] "Error in if (group.sum[1] == 0 & sqrt(sum(score.beta[1:block[1]]^2)) > : \n missing value where TRUE/FALSE needed\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in if (group.sum[1] == 0 & sqrt(sum(score.beta[1:block[1]]^2)) > lambda_vec[1]) { grad.lasso[1:block[1]] <- score.beta[1:block[1]] - lambda_vec[1] * (score.beta[1:block[1]]/sqrt(sum(score.beta[1:block[1]]^2)))} else { grad.lasso[1:block[1]] <- 0}: missing value where TRUE/FALSE needed>
Does anyone have any ideas on how to fix this? I'm not sure if this could cause an issue, but in the train data set, ID has around 7,500 levels.
Unfortunately, I can't include any data to make this reproducible. I'm hoping someone else has run across this issue in the past and knows what is going on. I'm trying to generate some data that also has this issue.
EDIT
It now looks like it has something to do with start=Delta.start
in control. When I remove that the model fits. I still am not sure what exactly about the Delta.start is making this break.
The issue with this was the levels of the factor. The length of the levels of
train$ID
was greater than the length of the unique values left intrain$ID
after sampling the data into train/test sets. I figured I would post my results in case anyone else runs into this issue.running
fixes the levels of factors for
ID
. Then, I ended up using the 3rd example in the demo that is linked in the question where it uses previous run results as the initialization of the parameters.