It has to do with the parallelism implementation of XGBoost.
I am trying to speed up XGBoost by passing it the parameter nthread = 16, on a system with 24 cores. But when I train my model, CPU utilization never seems to exceed roughly 20% at any point during training. The code snippet is as follows:
param_30 <- list("objective" = "reg:linear",           # linear
                 "subsample" = subsample_30,
                 "colsample_bytree" = colsample_bytree_30,
                 "max_depth" = max_depth_30,            # maximum depth of tree
                 "min_child_weight" = min_child_weight_30,
                 "max_delta_step" = max_delta_step_30,
                 "eta" = eta_30,                        # step size shrinkage
                 "gamma" = gamma_30,                    # minimum loss reduction
                 "nthread" = nthreads_30,               # number of threads to be used
                 "scale_pos_weight" = 1.0)

model <- xgboost(data = training.matrix[, -5],
                 label = training.matrix[, 5],
                 verbose = 1, nrounds = nrounds_30, params = param_30,
                 maximize = FALSE,
                 early_stopping_rounds = searchGrid$early_stopping_rounds_30[x])
Please explain (if possible) how I can increase CPU utilization and speed up model training. Example code in R would be helpful for further understanding.
Assumption: this is about execution via the R package of XGBoost.
This is a guess... but I have had this happen to me ...
You are spending too much time communicating during the parallelism and never becoming CPU-bound: https://en.wikipedia.org/wiki/CPU-bound
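As a quick check, here is a minimal sketch (assuming the R xgboost package; the synthetic data, sizes, and thread grid are illustrative, not from the original post) that times the same training run at several nthread values. If elapsed time stops improving after a few threads, overhead is eating the parallelism:

```r
library(xgboost)

# Illustrative synthetic data -- substitute your own training.matrix.
set.seed(42)
n <- 1e5
X <- matrix(rnorm(n * 20), nrow = n)
dtrain <- xgb.DMatrix(data = X, label = rnorm(n))

# Time identical runs with increasing thread counts.
for (nt in c(1L, 2L, 4L, 8L, 16L)) {
  elapsed <- system.time(
    xgb.train(params = list(objective = "reg:linear",  # renamed reg:squarederror in newer releases
                            max_depth = 6,
                            nthread = nt),
              data = dtrain, nrounds = 50, verbose = 0)
  )["elapsed"]
  cat(sprintf("nthread = %2d: %5.1f s elapsed\n", nt, elapsed))
}
```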
The bottom line is that your data isn't large enough (rows and columns), and/or your trees aren't deep enough (max_depth), to warrant that many cores: too much overhead. xgboost parallelizes split evaluations, so deep trees on big data can keep the CPU humming at max. I have trained many models where single-threaded outperforms 8/16 cores; too much time switching between threads and not enough work per thread.
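To see the effect concretely, here is a sketch (again with made-up data sizes; `time_fit` is a hypothetical helper written for this example, not part of xgboost) comparing thread scaling on a small/shallow problem versus a larger/deeper one:

```r
library(xgboost)

# Hypothetical helper: train once and return the elapsed seconds.
time_fit <- function(n, depth, nt) {
  set.seed(1)
  X <- matrix(rnorm(n * 20), nrow = n)
  d <- xgb.DMatrix(data = X, label = rnorm(n))
  unname(system.time(
    xgb.train(params = list(objective = "reg:linear",
                            max_depth = depth, nthread = nt),
              data = d, nrounds = 30, verbose = 0)
  )["elapsed"])
}

# Shallow trees on little data: extra threads mostly add scheduling overhead.
cat("n = 1e4, depth 3:  1 thread:", time_fit(1e4, 3, 1),
    "s | 16 threads:", time_fit(1e4, 3, 16), "s\n")

# Deeper trees on more data: split evaluation dominates, so threads pay off.
cat("n = 2e5, depth 10: 1 thread:", time_fit(2e5, 10, 1),
    "s | 16 threads:", time_fit(2e5, 10, 16), "s\n")
```

On the small configuration you will often see 16 threads run no faster (or even slower) than one; on the larger, deeper one, utilization and speedup tend to climb.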
**MORE DATA, DEEPER TREES, OR FEWER CORES :)**