I believe SGDClassifier() with loss='log' supports multilabel classification, so I do not have to use OneVsRestClassifier. Check this
Now, my dataset is quite big, so I am using HashingVectorizer and passing the result as input to SGDClassifier. My target y has 42048 columns (one per label).
When I run this, as follows:
clf.partial_fit(X_train_batch, y)
I get: ValueError: bad input shape (300000, 42048).
I have also tried passing the classes parameter as follows, but I get the same error.
clf.partial_fit(X_train_batch, y, classes=np.arange(42048))
The documentation of SGDClassifier says y : numpy array of shape [n_samples], which is a 1-D target rather than the 2-D one I am passing.
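For reference, here is a minimal sketch of one possible workaround, not a confirmed answer: since partial_fit takes a 1-D y, one binary SGDClassifier can be trained per label column (a hand-rolled one-vs-rest). The documents, the 3-label indicator matrix Y, and the names docs, clfs and pred are made up for illustration. The loss is left at its default because the name of the logistic loss differs between scikit-learn versions ('log' in older releases, 'log_loss' in newer ones).

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy batch: 4 documents and a 3-label indicator matrix.
docs = ["red apple", "green apple", "red car", "green car"]
Y = np.array([[1, 1, 0],
              [0, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])  # columns: red, apple, car

# Stateless hashing, so no vocabulary has to be kept in memory.
X = HashingVectorizer(n_features=2**10).transform(docs)

# One binary classifier per label; each one receives a 1-D target column.
clfs = [SGDClassifier(random_state=0) for _ in range(Y.shape[1])]

for _ in range(20):  # several passes over the same toy batch
    for j, clf in enumerate(clfs):
        # classes lists the values a single column can take, not the labels.
        clf.partial_fit(X, Y[:, j], classes=np.array([0, 1]))

# Stack the per-label predictions back into an indicator matrix.
pred = np.column_stack([clf.predict(X) for clf in clfs])
print(pred.shape)
```

With 42048 labels this means 42048 separate models, so memory and training time per batch may become the real constraint. The sketch only shows the shape that partial_fit accepts.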