I am trying to use ShuffleSplit() on the California housing dataset (Source: https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html) to fit an SGD regression.
However, an error about 'n_splits' occurs when I call it.
The code is as follows:
from sklearn import cross_validation, grid_search, linear_model, metrics
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale
from sklearn.cross_validation import ShuffleSplit
housing_data = pd.read_csv('cal_housing.csv', header = 0, sep = ',')
housing_data.fillna(housing_data.mean(), inplace=True)
df=pd.get_dummies(housing_data)
y_target = housing_data['median_house_value'].values
x_features = housing_data.drop(['median_house_value'], axis = 1)
from sklearn.cross_validation import train_test_split
from sklearn import model_selection
train_x, test_x, train_y, test_y = model_selection.train_test_split(x_features, y_target, test_size=0.2, random_state=4)
reg = linear_model.SGDRegressor(random_state=0)
cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
The error is below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-8f8760b04f8c> in <module>()
----> 1 cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
TypeError: __init__() got an unexpected keyword argument 'n_splits'
I updated scikit-learn to version 0.18.
Anaconda version: 4.5.8
Could you please advise on this issue?
You are mixing up two different modules.
Before 0.18, ShuffleSplit was imported from cross_validation. In that module it had no n_splits parameter: n was the total number of samples and n_iter set the number of splits.
But since you have updated to 0.18, cross_validation and grid_search have been deprecated in favor of model_selection.
This is mentioned in the docs, and these modules will be removed in version 0.20.
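So instead of this:
from sklearn.cross_validation import ShuffleSplit
from sklearn.cross_validation import train_test_split
Do this:
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import train_test_split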
Then you can use n_splits.
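For illustration, here is a minimal sketch of the model_selection version of ShuffleSplit used end to end. It assumes scikit-learn 0.18 or later. The synthetic X and y are stand-ins for the housing features and target, and the scaling pipeline and the scoring choice are my own assumptions rather than part of the question.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption: 200 rows, 5 features).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# The model_selection splitter takes n_splits; the data is passed later via split().
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# SGD is sensitive to feature scale, so standardize inside the pipeline.
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))

# One score per shuffle-split iteration.
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")

n_folds = cv.get_n_splits()
first_train, first_test = next(cv.split(X))
print(n_folds, len(first_train), len(first_test))
```

Unlike the old cross_validation class, this splitter is built without knowing the dataset size, which is why the same cv object can be handed directly to cross_val_score.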