Train multiple models in parallel with sklearn?

Posted 2019-03-21 05:05

I want to train multiple LinearSVC models with different random states, and I would prefer to do it in parallel. Is there a mechanism supporting this in sklearn? I know GridSearchCV and some ensemble methods do it implicitly, but what is the thing under the hood?

Tags: machine-learning scikit-learn python-multiprocessing

1 answer

三岁会撩人

Reply #2 · 2019-03-21 05:40

The "thing" under the hood is the joblib library, which powers, for example, the multiprocessing in GridSearchCV and some ensemble methods. Its Parallel helper class is a very handy Swiss Army knife for embarrassingly parallel for loops.
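As a minimal sketch of that pattern, independent of sklearn: delayed wraps a function call so that it can be dispatched to a worker, and Parallel collects the return values into a list in input order. The variable names here are only illustrative.

```python
from math import sqrt

from joblib import Parallel, delayed

# Inputs for an embarrassingly parallel loop: each item is independent.
squares = [i ** 2 for i in range(10)]

# delayed(sqrt)(s) records the call instead of running it immediately;
# Parallel then runs the recorded calls on 2 workers.
roots = Parallel(n_jobs=2)(delayed(sqrt)(s) for s in squares)

# roots is a plain list, ordered like the inputs.
print(roots)
```

This is equivalent to the list comprehension [sqrt(s) for s in squares], only with the calls spread over worker processes.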

Here is an example that trains multiple LinearSVC models with different random states in parallel, using 4 processes with joblib:

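import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import LinearSVC

def train_model(X, y, seed):
    # Each call builds its own model, so the workers share no state
    model = LinearSVC(random_state=seed)
    return model.fit(X, y)  # fit() returns the fitted estimator

# Tiny toy dataset: two samples, one per class
X = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([0, 1])

# n_jobs=4 runs up to 4 training calls at a time, one per seed
result = Parallel(n_jobs=4)(delayed(train_model)(X, y, seed) for seed in range(10))
# result is a list of 10 models trained using different seeds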
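Because Parallel returns its results in the same order as the input iterable, each fitted model can be matched back to the seed it was trained with. The sketch below illustrates this under some assumptions that are not part of the answer above: the data comes from make_classification, and the sizes, the number of seeds and n_jobs=2 are arbitrary choices.

```python
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC


def train_model(X, y, seed):
    # Same helper as in the answer: one independent model per seed.
    model = LinearSVC(random_state=seed)
    return model.fit(X, y)


# Synthetic data, purely illustrative (an assumption, not from the answer).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

seeds = list(range(4))
models = Parallel(n_jobs=2)(delayed(train_model)(X, y, s) for s in seeds)

# models[i] was trained with seeds[i], since output order follows input order.
trained_seeds = [m.random_state for m in models]
# Training-set accuracy of each model, as a quick sanity check.
scores = [m.score(X, y) for m in models]

for seed, score in zip(trained_seeds, scores):
    print(f"seed={seed} train accuracy={score:.3f}")
```

Passing n_jobs=-1 instead would ask joblib to use all available cores.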
