I'm experimenting with a one-vs-all logistic regression classifier in scikit-learn (sklearn). My dataset is large enough that training on all of it at once is too slow, and I would also like to study the learning curve as training proceeds.
I would like to use mini-batch gradient descent to train the classifier on batches of, say, 500 samples at a time. Is there some way of doing this with sklearn, or should I abandon sklearn and "roll my own"?
This is what I have so far:
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
# xs are subsets of my training data, ys are ground truth for same; I have more
# data available for further training and cross-validation:
xs.shape, ys.shape
# => ((500, 784), (500,))
lr = OneVsRestClassifier(LogisticRegression())
lr.fit(xs, ys)
lr.predict(xs[0:1, :])  # predict expects a 2-D array, so slice rather than index
# => [ 1.]
ys[0]
# => 1.0
I.e., it correctly classifies a training sample (yes, I realize it would be better to evaluate on data it hasn't seen -- this is just a quick smoke test).
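As a slightly less circular check, I can score a held-out chunk (val_xs and val_ys below are hypothetical arrays I have set aside, with the same shapes as xs and ys):

val_accuracy = lr.score(val_xs, val_ys)  # mean accuracy on unseen samples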
Re: mini-batch gradient descent: I haven't gotten as far as creating learning curves, but can one simply call fit repeatedly on successive subsets of the training data, or is there some other function for training in batches? The documentation and Google are fairly silent on the matter. Thanks!
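To make the question concrete, here is a rough sketch of the loop I'm imagining. I gather that SGDClassifier with a logistic loss supports incremental updates via partial_fit (whereas I suspect that calling fit repeatedly would simply refit from scratch each time); the batch generator and held-out arrays below are hypothetical stand-ins for my real data:

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# partial_fit needs the full set of class labels up front, since any
# individual batch might be missing some classes; ys_all is a hypothetical
# array holding every label I have:
all_classes = np.unique(ys_all)

# SGDClassifier with loss='log_loss' ('log' in older sklearn releases) is
# logistic regression fit by stochastic gradient descent; the wrapper keeps
# the one-vs-all structure from the snippet above:
clf = OneVsRestClassifier(SGDClassifier(loss='log_loss'))

scores = []  # held-out accuracy after each batch, for the learning curve
for batch_xs, batch_ys in batches_of_500():  # hypothetical batch generator
    clf.partial_fit(batch_xs, batch_ys, classes=all_classes)
    scores.append(clf.score(val_xs, val_ys))  # val_* = held-out split from above

Is partial_fit on the OneVsRestClassifier wrapper (or something like it) the intended way to do this?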