“Error in drop(y %*% rep(1, nc))” error for cv.glmnet

Posted 2019-07-05 08:22

Question:

I have a function that returns the AUC value for a cv.glmnet model. Intermittently (in a minority of runs), the call to cv.glmnet fails with the following error:

Error in drop(y %*% rep(1, nc)) : error in evaluating the argument 'x' in selecting a method for function 'drop': Error in y %*% rep(1, nc) : non-conformable arguments

I've read a little bit about the error and the only suggestion I could find was to use data.matrix() instead of as.matrix(). My function is as follows (where "form" is a formula with my desired variables and "dt" is the data frame):

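library(glmnet)  # cv.glmnet
library(ROCR)    # prediction, performance

auc_cvnet <- function(form, dt, standard = FALSE) {
  vars <- all.vars(form)
  depM <- dt[[vars[1]]]               # response: first variable in the formula
  indM <- data.matrix(dt[vars[-1]])   # predictors as a numeric matrix
  model <- cv.glmnet(indM, depM, family = "binomial", nfolds = 3,
                     type.measure = "auc", standardize = standard)

  # In-sample predicted probabilities (predict() defaults to s = "lambda.1se")
  pred <- predict(model, newx = indM, type = "response")
  tmp <- prediction(pred, depM)
  auc.tmp <- performance(tmp, "auc")
  return(as.numeric(auc.tmp@y.values))
}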

I call this function from another function that iterates through combinations of a few variables to see which combinations work well (it's a pretty brute-force method). I printed out the formula for the iteration in which the error was thrown, then called the function with just that formula, and it worked fine. So unfortunately I can't pinpoint which calls throw the error; otherwise I'd give more information. The data frame has about 30 rows, and there are no errors when I run my code on a larger data set with 110 rows. There are also no NAs in either data set.

Has anyone seen this before or have any thoughts? Thanks!

Answer 1:

Believe it or not, I actually got this same error today. Since I don't know your dataset, I can't say for sure what the cause is, but in my case the data I was passing as my y variable (your depM) was a column of all TRUE values. cv.glmnet would only return a valid model if my y variable contained both TRUE and FALSE values.

I wish I could explain why cv.glmnet requires both TRUE and FALSE, but I don't understand the function itself well enough (I am only adapting code that was given to me). I just thought I'd post this in case it helps you troubleshoot. Good luck!
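If it helps, here is a small guard you could run on depM before calling cv.glmnet. It is only a sketch: has_both_classes is a hypothetical helper name of mine, not part of glmnet, and it only checks the full response vector, not the individual cross-validation folds.

```r
# Hypothetical helper: TRUE only if y contains at least two distinct
# non-missing values, i.e. both outcome classes are present.
has_both_classes <- function(y) {
  length(unique(y[!is.na(y)])) >= 2
}

has_both_classes(c(TRUE, TRUE, TRUE))   # FALSE
has_both_classes(c(TRUE, FALSE, TRUE))  # TRUE
```

Inside auc_cvnet you could, for example, return NA when the check fails instead of letting cv.glmnet throw.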



Answer 2:

I have the same problem when running cv.glmnet on a dataset with 2 positive cases and 850 negative ones. In one of the cross-validation iterations (where the training and testing sets are randomly sampled), both positive cases are sampled out of the training set, so the training data contains only one class. glmnet then calls lognet, which in turn calls drop(y %*% rep(1, nc)), but at that point y is a vector and not a matrix with at least two columns, hence the "non-conformable arguments" error.
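To get a feel for how often this happens, the rough simulation below estimates the share of random fold assignments that leave at least one training set with a single class. It assumes folds are assigned by shuffling balanced fold labels, which is my understanding of what cv.glmnet does when foldid is not supplied, and it assumes nfolds = 10 (the cv.glmnet default). single_class_rate is a hypothetical helper name.

```r
# Hypothetical helper: fraction of random fold assignments in which some
# training set (all rows outside one fold) contains fewer than two classes.
single_class_rate <- function(y, nfolds, reps = 2000) {
  n <- length(y)
  bad <- logical(reps)
  for (r in seq_len(reps)) {
    # Assumed fold assignment: shuffle balanced fold labels.
    foldid <- sample(rep(seq_len(nfolds), length.out = n))
    # Training set for fold k is every row not held out in fold k.
    bad[r] <- any(vapply(seq_len(nfolds), function(k) {
      length(unique(y[foldid != k])) < 2
    }, logical(1)))
  }
  mean(bad)
}

set.seed(1)
y <- c(rep(1, 2), rep(0, 850))  # 2 positives, 850 negatives
rate <- single_class_rate(y, nfolds = 10)
print(rate)
```

With only two positive cases the rate should land close to 1/nfolds, since the second positive has roughly that chance of sharing a fold with the first. That would also explain why the error shows up only on some runs.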

The easiest fix I can think of is to specify the foldid parameter to cv.glmnet and make sure that both classes are present in the training data in every iteration.
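One way to build such a foldid is to assign folds separately within each class. The sketch below uses base R only; stratified_foldid is a hypothetical helper name, the toy y is made up for illustration, and the helper assumes every class has at least two observations (with a single positive case, one training set will always lack it).

```r
# Hypothetical helper: assign fold ids within each class so that every fold
# holds out roughly the same share of each class.
stratified_foldid <- function(y, nfolds) {
  foldid <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle balanced fold labels within this class.
    foldid[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
  }
  foldid
}

set.seed(42)
y <- c(rep(1, 6), rep(0, 24))   # toy outcome with 30 rows
foldid <- stratified_foldid(y, nfolds = 3)
print(table(foldid, y))         # each fold holds 2 positives and 8 negatives

# Every training set (all rows outside fold k) should contain both classes.
ok <- all(vapply(1:3, function(k) {
  length(unique(y[foldid != k])) >= 2
}, logical(1)))
stopifnot(ok)

# With glmnet loaded, the ids can then be passed on, for example:
# model <- cv.glmnet(indM, depM, family = "binomial",
#                    foldid = foldid, type.measure = "auc")
```

When foldid is supplied, it determines the folds, so the nfolds argument can be dropped from the call.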



Tags: r glmnet