I'm running a grid search on a random forest and trying to use n_jobs other than 1, but the kernel freezes and there is no CPU usage. With n_jobs=1 it works fine. I can't even stop the command with Ctrl-C and have to restart the kernel. I'm running on Windows 7. I saw that there is a similar problem on OS X, but that solution is not relevant for Windows 7.
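from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# tfidf, stop and tokenizer are defined earlier (not shown)
rf_tfdidf = Pipeline([('vect', tfidf),
                      ('clf', RandomForestClassifier(n_estimators=50,
                                                     class_weight='balanced_subsample'))])
param_grid = [{'vect__ngram_range': [(1, 1)],
               'vect__stop_words': [stop],
               'vect__tokenizer': [tokenizer]
               }]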
if __name__ == '__main__':
gs_rf_tfidf = GridSearchCV(rf_tfdidf, param_grid, scoring='accuracy', cv=5,
verbose=10,
n_jobs=2)
gs_rf_tfidf.fit(X_train_part, y_train_part)
Thanks.
The indentation after
if __name__ == '__main__':
is not correct. If that is only a copy-paste mistake and not the cause, you can try moving the whole script under the guard, so that the first line of your script is
if __name__ == '__main__':
and the rest of the code follows with the appropriate indentation.
New Code
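As a rough sketch of that layout (not the exact code from the question): on Windows, worker processes are started by launching a fresh interpreter that imports the main script, so anything that starts the parallel search should sit under the guard. The names `tokenizer`, `build_search` and `toy_data` below are hypothetical stand-ins, `TfidfVectorizer` is assumed for the asker's `tfidf` object, and `None` replaces the asker's `stop` list.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def tokenizer(text):
    # Hypothetical stand-in for the asker's tokenizer. Defining it at module
    # level (not inside the guard) keeps it reachable from worker processes.
    return text.split()


def build_search(n_jobs=2):
    """Build a grid search like the one in the question (assumed TF-IDF step)."""
    pipeline = Pipeline([
        ("vect", TfidfVectorizer()),
        ("clf", RandomForestClassifier(n_estimators=50,
                                       class_weight="balanced_subsample",
                                       random_state=0)),
    ])
    param_grid = [{
        "vect__ngram_range": [(1, 1)],
        "vect__stop_words": [None],      # placeholder for the asker's `stop`
        "vect__tokenizer": [tokenizer],
    }]
    return GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=5,
                        verbose=10, n_jobs=n_jobs)


def toy_data():
    # Made-up documents and labels, only so the sketch runs end to end.
    texts = ["good movie", "great film", "nice plot", "fine acting", "loved it",
             "bad movie", "awful film", "dull plot", "poor acting", "hated it"]
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    return texts, labels


if __name__ == "__main__":
    # Only the code that launches worker processes sits under the guard, so a
    # child process that re-imports this file does not start a new search.
    X_train_part, y_train_part = toy_data()
    gs_rf_tfidf = build_search(n_jobs=2)
    gs_rf_tfidf.fit(X_train_part, y_train_part)
    print(gs_rf_tfidf.best_params_)
```

Saving this as a script and running it from a terminal or an IDE run configuration exercises the guard. Whether it also resolves the freeze inside an interactive kernel is not something this sketch can confirm.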
This works fine for me (Windows 8.1).
EDIT
The following works fine using PyCharm. I have not used Spyder, but it should also work there:
Code