-->

Bayesian Optimisation applied in CatBoost

Posted 2019-05-29 01:25

Question:

This is my attempt at applying BayesSearch in CatBoost:

from catboost import CatBoostClassifier
from skopt import BayesSearchCV
from sklearn.model_selection import StratifiedKFold


# Classifier
bayes_cv_tuner = BayesSearchCV(
    estimator=CatBoostClassifier(
        silent=True
    ),
    search_spaces={
        'depth': (2, 16),
        'l2_leaf_reg': (1, 500),
        'bagging_temperature': (1e-9, 1000, 'log-uniform'),
        'border_count': (1, 255),
        'rsm': (0.01, 1.0, 'uniform'),
        'random_strength': (1e-9, 10, 'log-uniform'),
        'scale_pos_weight': (0.01, 1.0, 'uniform'),
    },
    scoring='roc_auc',
    cv=StratifiedKFold(
        n_splits=2,
        shuffle=True,
        random_state=72
    ),
    n_jobs=1,
    n_iter=100,
    verbose=1,
    refit=True,
    random_state=72
)

Keep track of results:

import numpy as np
import pandas as pd


def status_print(optim_result):
    """Status callback during Bayesian hyperparameter search"""

    # Get all the models tested so far in DataFrame format
    all_models = pd.DataFrame(bayes_cv_tuner.cv_results_)

    # Get current parameters and the best parameters
    best_params = pd.Series(bayes_cv_tuner.best_params_)
    print('Model #{}\nBest ROC-AUC: {}\nBest params: {}\n'.format(
        len(all_models),
        np.round(bayes_cv_tuner.best_score_, 4),
        bayes_cv_tuner.best_params_
    ))

Fit BayesCV

resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=status_print)

Results

The first 3 iterations work fine, but then I get a nonstop string of:

Iteration with suspicious time 7.55 sec ignored in overall statistics.

Iteration with suspicious time 739 sec ignored in overall statistics.

(...)

Any ideas where I went wrong, or how I can improve this?

Salut,

Answer 1:

One of the iterations in the set of experiments that skopt is running is taking too long to complete, judging by the timings CatBoost has recorded so far.

If you raise the verbosity of the classifier and use a callback to inspect which combination of parameters skopt is trying, you will probably find that the culprit is the depth parameter: the search slows down whenever CatBoost has to fit deeper trees.

You can try to debug too using this custom callback:

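import joblib

counter = 0
def onstep(res):
    global counter
    args = res.x               # best parameters found so far
    x0 = res.x_iters           # every parameter set evaluated so far
    y0 = res.func_vals         # objective value for each evaluation
    print('Last eval: ', x0[-1], 
          ' - Score ', y0[-1])
    print('Current iter: ', counter, 
          ' - Score ', res.fun, 
          ' - Args: ', args)
    # Save progress so an interrupted search can be inspected later
    joblib.dump((x0, y0), 'checkpoint.pkl')
    counter = counter+1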

You can call it by:

resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=[onstep, status_print])
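To see which parameter combinations are the slow ones, a timing callback along the same lines might help. This is only a sketch: `make_timing_callback` is a hypothetical helper name, it assumes one point is evaluated per step, and it relies only on the `x_iters` attribute already used above.

```python
import time


def make_timing_callback(log):
    """Return a callback that appends (seconds, last_params) to `log`."""
    last = [time.perf_counter()]  # time of the previous callback call

    def on_step(res):
        now = time.perf_counter()
        # Wall time since the previous step, paired with the point just tried
        log.append((now - last[0], list(res.x_iters[-1])))
        last[0] = now
        # Returning None lets the search continue

    return on_step


timings = []
timing_callback = make_timing_callback(timings)
```

Passing `timing_callback` in the callback list alongside `onstep` and then sorting `timings` by the first element should show whether the slowest steps are the ones with the largest depth.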

I've noticed the same problem in my own experiments: the complexity rises non-linearly as the depth increases, so CatBoost takes longer to complete its iterations. A simple solution is to search a smaller space:

'depth':(2, 8)

Usually a depth of 8 is enough. In any case, you can first run skopt with a maximum depth of 8 and then repeat the search with a larger maximum.
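The cost argument can be made concrete under one assumption: with CatBoost's default symmetric trees, a tree of depth d has 2^d leaves. The sketch below prints that count and builds progressively wider search spaces for the staged approach; `leaves_per_tree` and `staged_spaces` are hypothetical helper names, not part of CatBoost or skopt, and the depth caps are arbitrary.

```python
def leaves_per_tree(depth):
    """Leaf count of one symmetric tree of the given depth (assumed 2**depth)."""
    return 2 ** depth


def staged_spaces(base_space, depth_caps=(8, 10, 12)):
    """Yield copies of base_space whose upper depth bound is capped in turn."""
    for cap in depth_caps:
        space = dict(base_space)      # shallow copy; the original is untouched
        low = space['depth'][0]
        space['depth'] = (low, cap)
        yield space


base = {'depth': (2, 16), 'l2_leaf_reg': (1, 500)}
for space in staged_spaces(base):
    cap = space['depth'][1]
    print('max depth', cap, '-> leaves per tree', leaves_per_tree(cap))
```

Each yielded dictionary could be passed as `search_spaces` to a fresh `BayesSearchCV`, stopping once a wider depth range no longer improves the score.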