adabag boosting function throws error when increasing mfinal

Posted 2019-05-06 19:53

I have a strange issue: whenever I try increasing the mfinal argument of the boosting function in the adabag package beyond 10, I get an error. Even with mfinal=9 I get warnings.

My training data has a 7-class dependent variable and 100 independent variables, with around 22,000 samples (one class was oversampled with SMOTE using DMwR). The dependent variable is the last column of the training dataset.

library(adabag)
gc()
exp_recog_boo <- boosting(V1 ~ .,data=train_dataS,boos=TRUE,mfinal=9)

Error in 1:nrow(object$splits) : argument of length 0
In addition: Warning messages:
1: In acum + acum1 :
longer object length is not a multiple of shorter object length

Thanks in advance.

Tags: r adaboost
6 Answers
淡お忘
#2 · 2019-05-06 19:55

This worked for me:

modelADA <- boosting(lettr ~ ., data = trainAll, boos = TRUE, mfinal = 10, control = rpart.control(minsplit = 0))

Essentially I just told rpart to require a minimum node size of zero before attempting a split, which eliminated the error. I haven't tested this extensively, so I can't guarantee it's a valid solution (what does a leaf with zero observations actually mean?), but it does prevent the error from being thrown.
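The effect of minsplit can be seen with rpart on its own, outside of adabag; a minimal sketch on made-up data (the data frame d and its variable names are mine, not from the question):

```r
library(rpart)

# Toy data: 6 observations, fewer than the default minsplit = 20,
# so rpart never even attempts a split on the root node.
set.seed(1)
d <- data.frame(y = factor(rep(c("a", "b"), each = 3)),
                x = c(rnorm(3, 0), rnorm(3, 5)))

f1 <- rpart(y ~ x, data = d)  # default minsplit = 20: no split attempted
f2 <- rpart(y ~ x, data = d,
            control = rpart.control(minsplit = 0, cp = -1))

nrow(f1$frame)  # 1: a root-only tree
nrow(f2$frame)  # >= 3: root plus at least two leaves
```

With minsplit = 0 (and cp lowered so the improvement threshold cannot block it either), the tiny node is split and the tree is no longer empty.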

三岁会撩人
#3 · 2019-05-06 20:02

I also ran into this same problem recently, and this example R script solves it completely!

The main idea is that you need to set the control for rpart (which adabag uses for building the trees; see rpart.control) appropriately, so that at least one split is attempted in every tree.

I'm not totally sure, but it appears that your "argument of length 0" error is the result of an empty tree. That can happen because the default complexity parameter (cp) tells rpart not to attempt a split when the decrease in homogeneity/lack of fit is below a certain threshold.
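The empty-tree hypothesis is easy to reproduce with rpart alone; a small sketch using the built-in iris data (iris stands in for the asker's dataset, and cp = 1 is deliberately extreme to force the degenerate case):

```r
library(rpart)

# With cp = 1, no candidate split clears the improvement threshold,
# so rpart returns a root-only ("empty") tree.
stump <- rpart(Species ~ ., data = iris, control = rpart.control(cp = 1))

nrow(stump$frame)     # 1: just the root node
is.null(stump$splits) # TRUE: no split records at all, so code doing
                      # 1:nrow(object$splits) fails with "argument of length 0"
```

Since nrow(NULL) is NULL, the expression 1:nrow(object$splits) inside adabag raises exactly the error message quoted in the question.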

做个烂人
#4 · 2019-05-06 20:05

I think I hit the problem.

Ignore this: if you configure your control with cp = 0, this won't happen. I think that if the first split of a tree makes no improvement (or at least none better than cp), the tree stays at zero splits, so you have an empty tree, and that makes the algorithm fail.

EDIT: The problem is that rpart generates trees with only one leaf (node), and the boosting method runs the statement k <- varImp(arboles[[m]], surrogates = FALSE, competes = FALSE); when arboles[[m]] is a tree with only one node, this gives you the error.

To solve it you can modify the boosting method:

Run fix(boosting) and add the lines marked with **:

if (boos == TRUE) {
**  k <- 1
**  while (k == 1) {
      boostrap <- sample(1:n, replace = TRUE, prob = pesos)
      fit <- rpart(formula, data = data[boostrap, -1],
                   control = control)
**    k <- length(fit$frame$var)
**  }
    flearn <- predict(fit, newdata = data[, -1], type = "class")
    ind <- as.numeric(vardep != flearn)
    err <- sum(pesos * ind)
}

This will prevent the algorithm from accepting one-leaf trees, but you have to set cp = 0 in the control parameter to avoid an endless loop.
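If you'd rather not patch adabag, the length(fit$frame$var) test from the loop above can also be used on its own to spot a degenerate stump; a sketch (cp = 1 is only there to force a stump for the demonstration):

```r
library(rpart)

# A root-only tree has exactly one row in $frame, and its var column
# holds the "<leaf>" placeholder instead of a real splitting variable.
stump <- rpart(Species ~ ., data = iris, control = rpart.control(cp = 1))
full  <- rpart(Species ~ ., data = iris)

length(stump$frame$var)           # 1: the tree never split
as.character(stump$frame$var[1])  # "<leaf>"
length(full$frame$var) > 1        # TRUE for a tree with real splits
```

This is the same condition the patched while-loop relies on: resample and refit until the frame has more than one row.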

啃猪蹄的小仙女
#5 · 2019-05-06 20:05

Use str() to see the attributes of your data frame. In my case, I just converted my class variable to a factor, and then everything ran.

SAY GOODBYE
#6 · 2019-05-06 20:10

Just ran into the same problem; setting the complexity parameter to -1 or the minimum split to 0 both work for me with rpart.control, e.g.

library(adabag)

r1 <- boosting(Y ~ ., data = data, boos = TRUE, 
               mfinal = 10,  control = rpart.control(cp = -1))

r2 <- boosting(Y ~ ., data = data, boos = TRUE, 
               mfinal = 10,  control = rpart.control(minsplit = 0))
smile是对你的礼貌
#7 · 2019-05-06 20:12

My mistake was that I hadn't converted the target to a factor first.

Try this:

train$target <- as.factor(train$target)

and check by doing:

str(train$target)