I am running into the problem that the hyperparameter ranges for my svm.SVC() are so wide that GridSearchCV() never completes! One idea is to use RandomizedSearchCV() instead. But my dataset is relatively big, so even 500 iterations take about 1 hour!
My question is: what is a good set-up (in terms of the range of values for each hyperparameter) for GridSearchCV (or RandomizedSearchCV) in order to stop wasting resources?
In other words, how do I decide whether, e.g., C values above 100 make sense, and/or whether a step of 1 is neither too big nor too small? Any help is very much appreciated. This is the set-up I am currently using:
import numpy as np
from sklearn import svm
from sklearn.model_selection import RandomizedSearchCV

# Candidate values for each hyperparameter
parameters = {
    'C': np.arange( 1, 100+1, 1 ).tolist(),
    'kernel': ['linear', 'rbf'], # precomputed,'poly', 'sigmoid'
    'degree': np.arange( 0, 100+0, 1 ).tolist(),
    'gamma': np.arange( 0.0, 10.0+0.0, 0.1 ).tolist(),
    'coef0': np.arange( 0.0, 10.0+0.0, 0.1 ).tolist(),
    'shrinking': [True],
    'probability': [False],
    'tol': np.arange( 0.001, 0.01+0.001, 0.001 ).tolist(),
    'cache_size': [2000],
    'class_weight': [None],
    'verbose': [False],
    'max_iter': [-1],
    'random_state': [None],
}

# Sample 500 random combinations, each scored with 5-fold cross-validation
model = RandomizedSearchCV( n_iter = 500,
                            estimator = svm.SVC(),
                            param_distributions = parameters,
                            n_jobs = 4,
                            refit = True,
                            cv = 5,
                            verbose = 1,
                            pre_dispatch = '2*n_jobs'
                            ) # scoring = 'accuracy'
model.fit( train_X, train_Y )

# Report the best model found, its mean cross-validated score, and its parameters
print( model.best_estimator_ )
print( model.best_score_ )
print( model.best_params_ )
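
To put a number on how wide this search space is, here is a rough back-of-the-envelope sketch that counts the combinations in the parameters dict above. The per-parameter counts are read off the np.arange calls and are approximate (the count for tol in particular can be off by one because of floating-point rounding), and the time estimate assumes the rate of roughly 500 candidates per hour seen with the randomized search would stay constant, which is only an assumption.

```python
from math import prod

# Approximate number of candidate values per hyperparameter in `parameters`.
# Entries with a single fixed value are omitted since they multiply by 1.
# The count for `tol` is approximate: floating-point rounding in np.arange
# can add or drop one value.
values_per_param = {
    "C": 100,
    "kernel": 2,
    "degree": 100,
    "gamma": 100,
    "coef0": 100,
    "tol": 10,
}

# An exhaustive grid search evaluates every combination of values.
n_candidates = prod(values_per_param.values())
n_fits = n_candidates * 5  # cv = 5 means five model fits per candidate

# Assumed rate, taken from the randomized search: about 500 candidates per hour.
candidates_per_hour = 500
hours = n_candidates / candidates_per_hour

print(f"candidates: {n_candidates:,}")
print(f"model fits: {n_fits:,}")
print(f"estimated hours at {candidates_per_hour} candidates/h: {hours:,.0f}")
```

Under these assumptions the full grid has on the order of billions of candidates, so the 500 randomized iterations cover only a tiny fraction of it.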