Tuning xgboost with xgb.train providing a validati

Posted 2019-07-01 15:13

Related questions here and here. The usual way to tune xgboost (i.e. to choose nrounds) is xgb.cv, which performs k-fold cross-validation. For example:

require(xgboost)
data(iris)
set.seed(1)
index = sample(1:150)
X = as.matrix(iris[index, 1:4])
y = as.matrix(as.numeric(iris[index, "Species"])) - 1
param = list(eta=0.1, objective="multi:softprob")
xgb.cv(params=param, data=X, nrounds=50, nfold=5, label=y, num_class=3)
> train.merror.mean train.merror.std test.merror.mean test.merror.std
> 1:          0.021667         0.009501         0.040000        0.043461
> 2:          0.018333         0.006972         0.033333        0.047141
> 3:          0.018333         0.006972         0.033333        0.047141
> 4:          0.018333         0.006972         0.033333        0.047141

However, I want to tune xgboost against a validation set that I provide myself. This is not possible with xgb.cv. It seems that it can be achieved with xgb.train:

require(xgboost)
data(iris)
set.seed(1)
index = sample(1:150)
indexTrain = index[1:100]
indexValid = index[101:150]
Xtrain = as.matrix(iris[indexTrain, 1:4])
Xvalid = as.matrix(iris[indexValid, 1:4])
yTrain = as.numeric(iris[indexTrain, "Species"]) - 1
yValid = as.numeric(iris[indexValid, "Species"]) - 1
train = xgb.DMatrix(Xtrain, label=yTrain)
valid = xgb.DMatrix(Xvalid, label=yValid)
param = list(eta=0.1, objective="multi:softprob")
watchlist = list(eval=valid, train=train)
model = xgb.train(params=param, data=train, nrounds=40, watchlist=watchlist,
                  num_class=3)
>[0]    eval-merror:0.060000    train-merror:0.020000
>[1]    eval-merror:0.060000    train-merror:0.030000
>[2]    eval-merror:0.060000    train-merror:0.020000
>[3]    eval-merror:0.060000    train-merror:0.020000

Indeed, while training with xgb.train, the evaluation error is printed to the console. However, this information seems to be lost, since the only attributes of model are handle and raw.

QUESTION 1: How can I retrieve the vector of validation errors printed in the console?

QUESTION 2: How can I retrieve the vector of standard errors of the individual validation errors, like the ones produced by xgb.cv?

EDIT1: In lines 58 and 59 here, the author seems to be able to extract the validation error. However, I have not been able to adapt that approach to the iris dataset.

EDIT2: Another (unanswered), closely related question here.
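
For QUESTION 1, here is a minimal sketch of one possible approach. It is not taken from the post and rests on a version assumption: in xgboost R package releases from roughly 0.6 through 1.7, the object returned by xgb.train carries an evaluation_log table when a watchlist is supplied. Older releases (like the one that exposes only handle and raw) and newer ones may behave differently. The column name eval_merror is assumed to follow from the watchlist name "eval" and the metric "merror", which is set explicitly here because the default metric differs across versions.

```r
library(xgboost)

data(iris)
set.seed(1)
index <- sample(1:150)
train_idx <- index[1:100]
valid_idx <- index[101:150]

train <- xgb.DMatrix(as.matrix(iris[train_idx, 1:4]),
                     label = as.numeric(iris[train_idx, "Species"]) - 1)
valid <- xgb.DMatrix(as.matrix(iris[valid_idx, 1:4]),
                     label = as.numeric(iris[valid_idx, "Species"]) - 1)

# num_class and eval_metric are passed inside params rather than via "..."
param <- list(eta = 0.1, objective = "multi:softprob", num_class = 3,
              eval_metric = "merror")

model <- xgb.train(params = param, data = train, nrounds = 40,
                   watchlist = list(eval = valid, train = train),
                   verbose = 0)

# Assumption: evaluation_log exists on the returned booster (xgboost 0.6 to 1.7)
eval_log <- model$evaluation_log

# One validation error per boosting round
valid_error <- eval_log$eval_merror

# Round with the lowest validation error (first one in case of ties)
best_nrounds <- which.min(valid_error)
```

Regarding QUESTION 2, a single validation set yields only one error value per round, so there is no fold-to-fold spread to report. In the same version range, the object returned by xgb.cv should expose a comparable evaluation_log whose test_merror_std column holds the per-round standard deviation across folds.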
