I am doing multi-label classification, where I am trying to predict the correct tags for questions:
(X = questions, y = list of tags for each question from X).
I am wondering which decision_function_shape for sklearn.svm.SVC should be used with OneVsRestClassifier?
From the docs we can read that decision_function_shape can have two values, 'ovo' and 'ovr':
decision_function_shape : ‘ovo’, ‘ovr’ or None, default=None
Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). The default of None will currently behave as ‘ovo’ for backward compatibility and raise a deprecation warning, but will change to ‘ovr’ in 0.19.
But I still don't understand what is the difference between:
# First decision_function_shape set to 'ovo'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovo'))
# Second decision_function_shape set to 'ovr'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovr'))
Which decision_function_shape should be used for a multi-label classification problem?
EDIT: Question asking a similar thing with no answer.
The shapes of the decision functions are different because ovo trains a classifier for each two-class combination, whereas ovr trains one classifier for each class fitted against all the other classes. The best example I could find is here on http://scikit-learn.org:
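As a quick illustration of the shape difference (my own sketch, using the digits dataset because its 10 classes make the two shapes visibly different):

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # 1797 samples, 10 classes

# Identical models; only the reported decision-function shape differs.
ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
ovr = SVC(kernel='linear', decision_function_shape='ovr').fit(X, y)

print(ovo.decision_function(X).shape)  # (1797, 45): 10 * 9 / 2 pairwise columns
print(ovr.decision_function(X).shape)  # (1797, 10): one column per class
```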
What does this mean in simple terms?
To understand what n_class * (n_class - 1) / 2 means, generate the two-class combinations using itertools.combinations.
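For example (a small check of my own), the pair count for four classes:

```python
from itertools import combinations

n_class = 4
# Each unordered pair of classes gets its own 'ovo' binary classifier.
pairs = list(combinations(range(n_class), 2))
print(pairs)       # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(pairs))  # 6 == n_class * (n_class - 1) // 2
```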
Which estimator should be used for multi-label classification?
In your situation, you have a question with multiple tags (like here on StackOverflow). If you know your tags (classes) in advance, I might suggest OneVsRestClassifier(LinearSVC()), but you could also try DecisionTreeClassifier or RandomForestClassifier (I think).
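A minimal sketch of that setup; the questions and tags below are made up to stand in for your X and y:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy stand-ins for your questions (X) and tag lists (y).
X = ["how to merge two dicts in python",
     "segfault when freeing a pointer in c",
     "python list comprehension with a condition",
     "undefined behaviour of pointer arithmetic in c"]
y = [["python"], ["c", "pointers"], ["python"], ["c", "pointers"]]

mlb = MultiLabelBinarizer()              # tag lists -> binary indicator matrix
Y = mlb.fit_transform(y)

vec = TfidfVectorizer()
Xt = vec.fit_transform(X)

clf = OneVsRestClassifier(LinearSVC())   # one binary LinearSVC per tag
clf.fit(Xt, Y)

pred = clf.predict(vec.transform(["pointer bug in my c code"]))
print(mlb.inverse_transform(pred))       # predicted tag tuples
```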
I think the question of which should be used is situational; it could easily be part of your GridSearch. But intuitively, as far as the differences go, you are going to be doing the same thing either way. Here is my reasoning:
OneVsRestClassifier is designed to model each class against all of the other classes independently, creating a classifier for each situation. The way I understand this process is that OneVsRestClassifier grabs a class and creates a binary label for whether a point is or isn't that class. Then this labelling gets fed into whatever estimator you have chosen to use. I believe the confusion comes in because SVC also allows you to make this same choice, but in effect with this implementation the choice will not matter, because you will always only be feeding two classes into the SVC.

And here is an example:
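A sketch of that comparison (using the iris dataset, with a linear kernel so that coef_ is exposed):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 150 samples, 3 classes

# Same wrapper; only the inner SVC's decision_function_shape differs.
ovo = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovo')).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovr')).fit(X, y)

# One binary SVC per class in each wrapper; compare their fitted coefficients.
for a, b in zip(ovo.estimators_, ovr.estimators_):
    print(np.allclose(a.coef_, b.coef_))   # True for every class
```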
So you can see the coefficients are equal for all three estimators built by the two models. Granted, this dataset only has 150 samples and 3 classes, so these results could differ on a more complex dataset, but it's a simple proof of concept.