SKLearn: Getting distance of each point from decision boundary?

Posted 2019-03-05 23:01

I am using SKLearn to run SVC on my data.

from sklearn import svm

svc = svm.SVC(kernel='linear', C=C).fit(X, y)

I want to know how I can get the distance of each data point in X from the decision boundary?

2 answers
甜甜的少女心
Reply #2 · 2019-03-05 23:17

I happen to be doing homework 1 of a course named Machine Learning Techniques, and it includes a problem about a point's distance to the hyperplane, even for the RBF kernel.

First, we know that an SVM finds an "optimal" w for a hyperplane wx + b = 0.

And the fact is that

w = \sum_{i} \alpha_i \phi(x_i)

where the x_i are the so-called support vectors and the alpha_i are their coefficients. Note that there is a phi() around each x; it is the transform function that maps x to some high-dimensional space (for RBF, it is infinite-dimensional). And we know that

\phi(x_1) \cdot \phi(x_2) = K(x_1, x_2)

so we can compute

\|w\|^2 = \sum_{i} \sum_{j} \alpha_i \alpha_j K(x_i, x_j)

then we can get the norm of w. So, the distance you want should be

svc.decision_function(x) / w_norm

where w_norm is the norm calculated above.
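As a rough sketch of the computation described above, the norm can be obtained from the fitted model's support vectors via ||w||^2 = sum_i sum_j alpha_i alpha_j K(x_i, x_j). The toy data from make_blobs, the value gamma = 0.5, and C = 1.0 are assumptions made only for illustration. Note that scikit-learn's dual_coef_ stores the signed coefficients (label times alpha), which is what the sum needs. This assumes a binary problem, where dual_coef_ has a single row.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

# Toy two-class data (an assumption; substitute your own X, y).
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

# gamma is fixed explicitly so the same value can be reused for the kernel matrix.
gamma = 0.5
svc = svm.SVC(kernel='rbf', C=1.0, gamma=gamma).fit(X, y)

# Signed dual coefficients (y_i * alpha_i) of the support vectors, shape (1, n_SV).
alpha = svc.dual_coef_

# Kernel matrix between support vectors: K[i, j] = K(x_i, x_j).
K = rbf_kernel(svc.support_vectors_, gamma=gamma)

# ||w||^2 = sum_i sum_j alpha_i alpha_j K(x_i, x_j), evaluated as alpha K alpha^T.
w_norm = np.sqrt((alpha @ K @ alpha.T).item())

# Signed distance of every point to the hyperplane in the feature space.
dist = svc.decision_function(X) / w_norm
```

The result is a distance in the transformed feature space, not in the original input space, so it is mostly useful for comparing points under the same fitted model.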

(StackOverflow doesn't allow me to post more than 2 links, so render the LaTeX yourself, bah.)

你好瞎i
Reply #3 · 2019-03-05 23:23

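For a linear kernel, the decision function is y = w * x + b and the decision boundary is where y = 0; the signed distance from a point x to the decision boundary is y/||w|| (take the absolute value for the unsigned distance).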

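import numpy as np

y = svc.decision_function(x)  # value of w * x + b for each row of x
w_norm = np.linalg.norm(svc.coef_)  # ||w||; coef_ is only available for kernel='linear'
dist = y / w_norm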
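A self-contained sketch of the linear case might look like the following. The six toy points and C = 1.0 are hypothetical, chosen only so the example runs on its own. It also recomputes the distance by hand from coef_ and intercept_ to show that decision_function returns w * x + b.

```python
import numpy as np
from sklearn import svm

# Toy, linearly separable data (hypothetical; substitute your own X, y).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

svc = svm.SVC(kernel='linear', C=1.0).fit(X, y)

w = svc.coef_.ravel()      # weight vector of the separating hyperplane
b = svc.intercept_[0]      # bias term

# Signed distance: positive on the side of classes_[1], negative on the other side.
signed_dist = svc.decision_function(X) / np.linalg.norm(w)

# The same quantity computed directly from w and b.
manual = (X @ w + b) / np.linalg.norm(w)

# Unsigned geometric distance to the decision boundary.
abs_dist = np.abs(signed_dist)
```

Here signed_dist and manual should agree up to floating-point error, and abs_dist is the plain Euclidean distance of each point to the boundary.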

For non-linear kernels, coef_ is not available and there is no simple way to get the absolute distance in the input space. But you can still use the result of decision_function as a relative distance.
