Question about the parallelism implementation of XGBoost
I am trying to speed up XGBoost by setting the parameter nthread = 16 on a system with 24 cores. However, CPU utilization never exceeds roughly 20% at any point during model training. The code snippet is as follows:
param_30 <- list("objective" = "reg:linear",               # linear
                 "subsample" = subsample_30,
                 "colsample_bytree" = colsample_bytree_30,
                 "max_depth" = max_depth_30,               # maximum depth of tree
                 "min_child_weight" = min_child_weight_30,
                 "max_delta_step" = max_delta_step_30,
                 "eta" = eta_30,                           # step size shrinkage
                 "gamma" = gamma_30,                       # minimum loss reduction
                 "nthread" = nthreads_30,                  # number of threads to be used
                 "scale_pos_weight" = 1.0
)
model <- xgboost(data = training.matrix[, -5],
                 label = training.matrix[, 5],
                 verbose = 1, nrounds = nrounds_30, params = param_30,
                 maximize = FALSE,
                 early_stopping_rounds = searchGrid$early_stopping_rounds_30[x])
Please explain, if possible, how I can increase CPU utilization and speed up model training. Code in R would be helpful for further understanding.
Assumption: this question is about the R package of XGBoost.
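To narrow this down, one thing I could do is time the same training call at several thread counts and see whether nthread changes anything at all. Below is a minimal sketch of that check, not a definitive benchmark. The helper time_by_threads is a hypothetical name of my own, not part of xgboost, and the commented-out wiring assumes the param_30 and training.matrix objects from the snippet above.

```r
library(parallel)  # ships with base R; provides detectCores()

# Hypothetical helper (not part of xgboost): run a training function once per
# thread count and record the wall-clock time of each run.
# `fit_fun` must accept a single argument, the number of threads.
time_by_threads <- function(fit_fun, thread_counts) {
  elapsed <- vapply(thread_counts, function(n) {
    system.time(fit_fun(n))[["elapsed"]]
  }, numeric(1))
  data.frame(nthread = thread_counts, elapsed_sec = elapsed)
}

# Logical cores visible to R; this can differ from the physical core count.
n_cores <- detectCores()

# Example wiring, assuming the objects from the snippet above exist:
# fit <- function(n) {
#   p <- param_30
#   p$nthread <- n
#   xgboost(data = training.matrix[, -5], label = training.matrix[, 5],
#           nrounds = 50, params = p, verbose = 0)
# }
# print(time_by_threads(fit, c(1, 4, 8, 16)))
```

If the elapsed time barely changes between 1 and 16 threads, that would suggest either that the installed package was built without OpenMP support or that the dataset is too small for the threads to be kept busy. Both are guesses on my part that I have not yet verified.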