It has to do with the parallelism implementation of XGBoost.
I am trying to speed up XGBoost by passing it the parameter nthread = 16, on a system with 24 cores. But when I train my model, CPU utilization never seems to exceed roughly 20% at any point during training. The code snippet is as follows:
param_30 <- list("objective" = "reg:linear",           # linear
                 "subsample" = subsample_30,
                 "colsample_bytree" = colsample_bytree_30,
                 "max_depth" = max_depth_30,            # maximum depth of tree
                 "min_child_weight" = min_child_weight_30,
                 "max_delta_step" = max_delta_step_30,
                 "eta" = eta_30,                        # step size shrinkage
                 "gamma" = gamma_30,                    # minimum loss reduction
                 "nthread" = nthreads_30,               # number of threads to be used
                 "scale_pos_weight" = 1.0)

model <- xgboost(data = training.matrix[, -5],
                 label = training.matrix[, 5],
                 verbose = 1, nrounds = nrounds_30, params = param_30,
                 maximize = FALSE,
                 early_stopping_rounds = searchGrid$early_stopping_rounds_30[x])
Please explain (if possible) how I can increase CPU utilization and speed up model training. Example code in R would be helpful for further understanding.
Assumption: this is about execution via the R package of XGBoost.
This is a guess... but I have had this happen to me ...
You are spending too much time communicating during the parallelism and never becoming CPU-bound: https://en.wikipedia.org/wiki/CPU-bound
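As a quick check, here is a minimal sketch (assuming the R xgboost package; the synthetic data, sizes, and thread grid are illustrative, not from the original post) that times the same training run at several nthread values. If elapsed time stops improving after a few threads, overhead is eating the parallelism:

```r
library(xgboost)

# Illustrative synthetic data -- substitute your own training.matrix.
set.seed(42)
n <- 1e5
X <- matrix(rnorm(n * 20), nrow = n)
dtrain <- xgb.DMatrix(data = X, label = rnorm(n))

# Time identical runs with increasing thread counts.
for (nt in c(1L, 2L, 4L, 8L, 16L)) {
  elapsed <- system.time(
    xgb.train(params = list(objective = "reg:linear",  # renamed reg:squarederror in newer releases
                            max_depth = 6,
                            nthread = nt),
              data = dtrain, nrounds = 50, verbose = 0)
  )["elapsed"]
  cat(sprintf("nthread = %2d: %5.1f s elapsed\n", nt, elapsed))
}
```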
The bottom line is that your data isn't large enough (rows and columns), and/or your trees aren't deep enough (max_depth), to warrant that many cores: too much overhead. xgboost parallelizes split evaluations, so deep trees on big data can keep the CPU humming at max. I have trained many models where single-threaded outperforms 8/16 cores; too much time switching between threads and not enough work per thread.
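To see the effect concretely, here is a sketch (again with made-up data sizes; `time_fit` is a hypothetical helper written for this example, not part of xgboost) comparing thread scaling on a small/shallow problem versus a larger/deeper one:

```r
library(xgboost)

# Hypothetical helper: train once and return the elapsed seconds.
time_fit <- function(n, depth, nt) {
  set.seed(1)
  X <- matrix(rnorm(n * 20), nrow = n)
  d <- xgb.DMatrix(data = X, label = rnorm(n))
  unname(system.time(
    xgb.train(params = list(objective = "reg:linear",
                            max_depth = depth, nthread = nt),
              data = d, nrounds = 30, verbose = 0)
  )["elapsed"])
}

# Shallow trees on little data: extra threads mostly add scheduling overhead.
cat("n = 1e4, depth 3:  1 thread:", time_fit(1e4, 3, 1),
    "s | 16 threads:", time_fit(1e4, 3, 16), "s\n")

# Deeper trees on more data: split evaluation dominates, so threads pay off.
cat("n = 2e5, depth 10: 1 thread:", time_fit(2e5, 10, 1),
    "s | 16 threads:", time_fit(2e5, 10, 16), "s\n")
```

On the small configuration you will often see 16 threads run no faster (or even slower) than one; on the larger, deeper one, utilization and speedup tend to climb.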
**MORE DATA, DEEPER TREES, OR FEWER CORES :)**