Negative decision_function values

Posted 2019-04-02 04:30

I am using the support vector classifier from sklearn on the Iris dataset. When I call decision_function it returns negative values, yet every sample in the test set is assigned the correct class after classification. I thought decision_function should return a positive value when the sample is an inlier and a negative value when the sample is an outlier. Where am I wrong?

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:,:]
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, 
random_state=0)

clf = SVC(probability=True)
print(clf.fit(X_train,y_train).decision_function(X_test))
print(clf.predict(X_test))
print(y_test)

Here is the output:

[[-0.76231668 -1.03439531 -1.40331645]
 [-1.18273287 -0.64851109  1.50296097]
 [ 1.10803774  1.05572833  0.12956269]
 [-0.47070432 -1.08920859 -1.4647051 ]
 [ 1.18767563  1.12670665  0.21993744]
 [-0.48277866 -0.98796232 -1.83186272]
 [ 1.25020033  1.13721691  0.15514536]
 [-1.07351583 -0.84997114  0.82303659]
 [-1.04709616 -0.85739411  0.64601611]
 [-1.23148923 -0.69072989  1.67459938]
 [-0.77524787 -1.00939817 -1.08441968]
 [-1.12212245 -0.82394879  1.11615504]
 [-1.14646662 -0.91238712  0.80454974]
 [-1.13632316 -0.8812114   0.80171542]
 [-1.14881866 -0.95169643  0.61906248]
 [ 1.15821271  1.10902205  0.22195304]
 [-1.19311709 -0.93149873  0.78649126]
 [-1.21653084 -0.90953622  0.78904491]
 [ 1.16829526  1.12102515  0.20604678]
 [ 1.18446364  1.1080255   0.15199149]
 [-0.93911991 -1.08150089 -0.8026332 ]
 [-1.15462733 -0.95603159  0.5713605 ]
 [ 0.93278883  0.99763184  0.34033663]
 [ 1.10999556  1.04596018  0.14791409]
 [-1.07285663 -1.01864255 -0.10701465]
 [ 1.21200422  1.01284263  0.0416991 ]
 [ 0.9462457   1.01076579  0.36620915]
 [-1.2108146  -0.79124775  1.43264808]
 [-1.02747495 -0.25741977  1.13056021]
...
 [ 1.16066886  1.11212424  0.22506538]]
 [2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2 1 1 2 0 2 0 0]

 [2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 1 1 1 2 0 2 0 0]

2 Answers

Luminary・发光体 · 2019-04-02 05:05

Christopher is correct, but his answer assumes OvR here.

You are actually doing the OvO scheme without noticing it!
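The confusing part: with three classes, OvO trains 3*(3-1)/2 = 3 pairwise classifiers, so the 'ovo' and 'ovr' decision functions happen to have the same number of columns on Iris. A quick sketch of mine (using the digits dataset, 10 classes) shows the shapes diverge as soon as there are more classes:

from sklearn import datasets
from sklearn.svm import SVC

digits = datasets.load_digits()  # 10 classes
X, y = digits.data, digits.target

clf_ovr = SVC(decision_function_shape='ovr').fit(X, y)
clf_ovo = SVC(decision_function_shape='ovo').fit(X, y)
print(clf_ovr.decision_function(X[:1]).shape)  # (1, 10): one column per class
print(clf_ovo.decision_function(X[:1]).shape)  # (1, 45): one column per pair, 10*9/2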

Here is an example which:

  • explains how to predict using OvO + decision_function

But first, the theory behind OvO prediction, taken from:

[Image: the OvO prediction rule from the lecture notes linked in the code below]

Code:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import numpy as np

iris = datasets.load_iris()
X = iris.data[:,:]
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3,
random_state=0)

clf = SVC(decision_function_shape='ovo')  # EXPLICIT OVO-usage!
clf.fit(X, y)  # note: fit on the full dataset here, so the values below differ slightly from the question's output

def predict(dec):
    # OVO prediction-scheme
    # hardcoded for 3 classes!
    # OVO order assumption: 0 vs 1; 0 vs 2; 1 vs 2 (lexicographic!)
    # theory: http://www.stat.ucdavis.edu/~chohsieh/teaching/ECS289G_Fall2015/lecture9.pdf page 18
    #    and: http://www.mit.edu/~9.520/spring09/Classes/multiclass.pdf page 8
    class0 = dec[0] + dec[1]
    class1 = -dec[0] + dec[2]
    class2 = -dec[1] - dec[2]
    return np.argmax([class0, class1, class2])

dec_vals = clf.decision_function(X_test)
pred_vals = clf.predict(X_test)
pred_vals_own = np.array([predict(x) for x in dec_vals])

for i in range(len(X_test)):
    print('decision_function vals  : ', dec_vals[i])
    print('sklearns prediction     : ', pred_vals[i])
    print('own prediction using dec: ', pred_vals_own[i])

Output:

decision_function vals  :  [-0.76867027 -1.04536032 -1.60216452]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [-1.19939987 -0.64932285  1.6951256 ]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [ 1.11946664  1.05573131  0.06261988]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-0.46107656 -1.09842529 -1.50671611]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [ 1.2094164   1.12827802  0.1415261 ]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-0.47736819 -0.99988924 -2.15027278]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [ 1.25467104  1.13814461  0.07643985]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-1.07557745 -0.87436887  0.93179222]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.05047139 -0.88027404  0.80181305]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.24310627 -0.70058067  1.906847  ]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-0.78440125 -1.00630434 -0.99963088]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [-1.12586024 -0.84193093  1.25542752]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.15639222 -0.91555677  1.07438865]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.14345638 -0.90050709  0.95795276]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.15790163 -0.95844647  0.83046875]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [ 1.17805731  1.11063472  0.1333462 ]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-1.20283096 -0.93961585  0.98410451]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.22782802 -0.90725712  1.05316513]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [ 1.16903803  1.12221984  0.11367107]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [ 1.17145967  1.10832227  0.08212776]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-0.9506135  -1.08467062 -0.79851794]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [-1.16266048 -0.9573001   0.79179457]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [ 0.99991983  0.99976567  0.27258784]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [ 1.14009372  1.04646327  0.05173163]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-1.08080806 -1.03404209 -0.06411027]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [ 1.23515997  1.01235174 -0.03884014]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [ 0.99958361  1.0123953   0.31647776]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-1.21958703 -0.8018796   1.67844367]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.03327108 -0.25946619  1.1567434 ]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [ 1.12368215  1.11169071  0.20956223]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-0.82416303 -1.07792277 -1.1580516 ]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [-1.13071754 -0.96096255  0.65828256]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [ 1.194643    1.12966124  0.15746621]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-1.04070512 -1.04532308 -0.20319486]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [-0.70170723 -1.09340841 -1.9323473 ]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [-1.24655214 -0.74489305  1.15450078]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [ 0.99984598  1.03781258  0.2790073 ]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-0.99993896 -1.06846079 -0.44496083]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [-1.22495071 -0.83041964  1.41965874]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-1.286798   -0.72689128  1.72244026]
sklearns prediction     :  1
own prediction using dec:  1
decision_function vals  :  [-0.75503345 -1.09561165 -1.44344022]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [ 1.24778268  1.11179415  0.05277115]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [-0.79577073 -1.00004599 -0.99974376]
sklearns prediction     :  2
own prediction using dec:  2
decision_function vals  :  [ 1.07018075  1.0831253   0.22181655]
sklearns prediction     :  0
own prediction using dec:  0
decision_function vals  :  [ 1.16705531  1.11326796  0.15604895]
sklearns prediction     :  0
own prediction using dec:  0
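The predict function above is hardcoded for three classes. Below is a generalization to K classes (my own sketch, not part of the original answer); it assumes sklearn's lexicographic OvO column order (0 vs 1, 0 vs 2, ..., 1 vs 2, ...) and mirrors the sum-of-confidences scheme above, so it may disagree with clf.predict on near-ties:

from itertools import combinations
import numpy as np

def ovo_predict_general(dec_row, n_classes):
    # accumulate each pairwise value: + for the first class of the pair,
    # - for the second (a positive value favors the first class)
    scores = np.zeros(n_classes)
    for (i, j), v in zip(combinations(range(n_classes), 2), dec_row):
        scores[i] += v
        scores[j] -= v
    return int(np.argmax(scores))

For the Iris case, ovo_predict_general(dec_vals[0], 3) gives the same result as the hardcoded predict(dec_vals[0]).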
疯言疯语 · 2019-04-02 05:11

You need to consider the decision_function and the prediction separately. The decision value is the signed distance from the hyperplane to your sample: by looking at the sign you can tell on which side of the hyperplane the sample lies. Negative values are therefore perfectly fine and simply indicate the negative class ("the other side of the hyperplane").
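To see this in the binary case, here is a minimal sketch of mine (reducing Iris to classes 0 and 1): the sign of decision_function directly encodes the predicted class.

import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
mask = iris.target < 2                 # keep only classes 0 and 1
X, y = iris.data[mask], iris.target[mask]

clf = SVC().fit(X, y)
dec = clf.decision_function(X)         # shape (n_samples,) in the binary case
# negative distance -> class 0, positive distance -> class 1
print(np.array_equal((dec > 0).astype(int), clf.predict(X)))  # True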

With the Iris dataset you have a multi-class problem. Since an SVM is a binary classifier, there is no inherent multi-class classification. "One-vs-rest" (OvR) and "one-vs-one" (OvO) are two approaches that construct a multi-class classifier from these binary units.

One-vs-one

Now that you already know OvR, OvO is not much harder to grasp. You construct a classifier for every pair of classes (A, B). In your case: 0 vs 1, 0 vs 2, 1 vs 2.

Note: the values for (A, B) and (B, A) can be obtained from a single binary classifier. You only change which class is considered the positive one, so you just have to invert the sign.

Doing this gives you a matrix:

+-------+------+-------+-------+
| A / B |  #0  |   #1  |   #2  |
+-------+------+-------+-------+
|       |      |       |       |
| #0    |  --  | -1.18 | -0.64 |
|       |      |       |       |
| #1    | 1.18 |  --   |  1.50 |
|       |      |       |       |
| #2    | 0.64 | -1.50 |  --   |
+-------+------+-------+-------+

Read this as follows: the decision function value when class A (row) competes against class B (column).

In order to extract a result, a vote is performed. In the basic form you can imagine this as a single yes/no vote from each classifier. That could end in draws, so the full decision function values are summed per class instead:

+-------+------+-------+-------+-------+
| A / B |  #0  |   #1  |   #2  |  SUM  |
+-------+------+-------+-------+-------+
|       |      |       |       |       |
| #0    | -    | -1.18 | -0.64 | -1.82 |
|       |      |       |       |       |
| #1    | 1.18 | -     | 1.50  | 2.68  |
|       |      |       |       |       |
| #2    | 0.64 | -1.50 | -     | -0.86 |
+-------+------+-------+-------+-------+

The SUM column gives you the score vector [-1.82, 2.68, -0.86]. Now apply argmax and it matches your prediction (class 1).
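As a quick check (my own sketch), the SUM column can be reproduced directly from the OvO row [-1.18273287, -0.64851109, 1.50296097] shown above:

import numpy as np

# pairwise OvO values, order: 0 vs 1, 0 vs 2, 1 vs 2
dec = np.array([-1.18273287, -0.64851109, 1.50296097])

scores = np.array([
     dec[0] + dec[1],   # class 0: (0 vs 1) + (0 vs 2)
    -dec[0] + dec[2],   # class 1: (1 vs 0) + (1 vs 2)
    -dec[1] - dec[2],   # class 2: (2 vs 0) + (2 vs 1)
])
print(scores)             # ~ [-1.83,  2.69, -0.85]
print(np.argmax(scores))  # 1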

One-vs-rest

I keep this section to avoid further confusion. The scikit-learn SVC classifier (libsvm) has a decision_function_shape parameter, which deceived me into thinking it was OvR (I am using liblinear most of the time).

For a real OvR response you get one value from the decision function per classifier, e.g.

 [-1.18273287 -0.64851109  1.50296097]

To obtain a prediction from this you can simply apply argmax, which returns the last index here, with a value of 1.50296097. From this point on the decision function's value is no longer needed (for this single prediction). That is why your predictions turned out fine.
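You can verify this on the question's own setup; the sketch below (mine) uses decision_function_shape='ovr', where argmax over the per-class scores reproduces predict (up to rare tie-breaking differences):

import numpy as np
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=.3, random_state=0)

clf = SVC(decision_function_shape='ovr').fit(X_train, y_train)
dec = clf.decision_function(X_test)    # shape (n_samples, n_classes)
print(np.array_equal(np.argmax(dec, axis=1), clf.predict(X_test)))  # True here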

However, you also specified probability=True, which takes the decision_function value and passes it through a sigmoid. Same principle as above, but now you also have confidence values (I prefer this term over probabilities, since it only describes the distance to the hyperplane) between 0 and 1.
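For completeness, a small sketch of mine showing what probability=True adds: Platt-scaled confidence values via predict_proba (sklearn documents that these are fitted with internal cross-validation and may occasionally be inconsistent with predict):

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=.3, random_state=0)

clf = SVC(probability=True).fit(X_train, y_train)
print(clf.predict_proba(X_test)[:3])   # one row per sample; each row sums to 1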

Edit: Oops, sascha is right. libsvm uses one-vs-one internally (despite the shape of the decision function).
