I'm attempting to calculate the decision_function of an SVC classifier MANUALLY (as opposed to using the inbuilt method) using the Python library scikit-learn.
I've tried several methods; however, I can only ever get the manual calculation to match when I don't scale my data.
z is a test datum (that's been scaled), and I think the other variables speak for themselves (also, I'm using an RBF kernel, if that's not obvious from the code).
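For reference, this is the binary RBF decision function that both manual methods are trying to reproduce, in standard SVM notation. The mapping onto scikit-learn is my reading of the SVC documentation, not something stated in this post: the x_i are support_vectors_, the products alpha_i * y_i are stored in dual_coef_, gamma is the kernel parameter, and b is intercept_ (whose sign convention has varied between releases).

```latex
f(z) = \sum_{i=1}^{n_{\mathrm{SV}}} \alpha_i y_i \exp\left(-\gamma \lVert x_i - z \rVert^2\right) + b
```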
Here are the methods that I've tried:
1. Looping method:
dec_func = 0
for j in range(np.shape(sup_vecs)[0]):
    norm2 = np.linalg.norm(sup_vecs[j, :] - z)**2
    dec_func = dec_func + dual_coefs[0, j] * np.exp(-gamma*norm2)
dec_func += intercept
2. Vectorized method:
diff = sup_vecs - z
# squared Euclidean distance from z to each support vector
norm2 = np.sum(diff * diff, axis=1)
dec_func = dual_coefs.dot(np.exp(-gamma * norm2)) + intercept
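Separately from the scaling issue, note that np.sqrt(diff*diff) is simply np.abs(diff), so np.sum(np.sqrt(diff*diff), 1)**2 is the squared L1 (Manhattan) distance rather than the squared Euclidean distance the RBF kernel uses. A minimal numpy-only sketch, with randomly generated stand-ins for sup_vecs, dual_coefs, intercept, gamma and z (hypothetical values, not a fitted model), compares the looping method with a corrected and the original vectorized distance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a fitted binary SVC (not real model values).
sup_vecs = rng.normal(size=(5, 3))    # 5 support vectors, 3 features
dual_coefs = rng.normal(size=(1, 5))  # shape (1, n_SV), like dual_coef_
intercept = np.array([0.25])          # shape (1,), like intercept_
gamma = 0.5
z = rng.normal(size=3)                # one (already scaled) test point


def dec_func_loop(z):
    """Looping method: one kernel evaluation per support vector."""
    total = 0.0
    for j in range(sup_vecs.shape[0]):
        norm2 = np.linalg.norm(sup_vecs[j, :] - z) ** 2
        total += dual_coefs[0, j] * np.exp(-gamma * norm2)
    return total + intercept[0]


def dec_func_vec(z):
    """Vectorized method using the squared Euclidean distance."""
    diff = sup_vecs - z
    norm2 = np.sum(diff * diff, axis=1)  # squared L2 distance per row
    return (dual_coefs @ np.exp(-gamma * norm2) + intercept)[0]


def dec_func_vec_l1(z):
    """Original vectorized line: squared L1 distance instead of L2."""
    diff = sup_vecs - z
    norm2 = np.sum(np.sqrt(diff * diff), 1) ** 2
    return (dual_coefs @ np.exp(-gamma * norm2) + intercept)[0]


print(dec_func_loop(z), dec_func_vec(z), dec_func_vec_l1(z))
```

The first two values agree; the third generally does not, because the L1 distance is never smaller than the L2 distance.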
However, neither of these ever returns the same value as decision_function. I think it may have something to do with rescaling my values, or, more likely, it's something silly that I've been overlooking!
Any help would be appreciated.
So after a bit more digging and head-scratching, I've figured it out.
As I mentioned above, z is a test datum that's been scaled. To scale it, I had to extract the .mean_ and .std_ attributes from the preprocessing.StandardScaler() object (after calling .fit() on my training data, of course).
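The manual scaling step can be checked against the scaler itself. A minimal sketch, assuming a recent scikit-learn in which the per-feature standard deviation is exposed as scale_ (older releases called it std_), with randomly generated placeholder training data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 3))  # placeholder data
z = rng.normal(loc=5.0, scale=2.0, size=3)               # one raw test point

scaler = StandardScaler().fit(X_train)

# Manual standardization with the fitted attributes.
z_scaled_manual = (z - scaler.mean_) / scaler.scale_

# The same transformation via the scaler (transform expects a 2-D array).
z_scaled_builtin = scaler.transform(z.reshape(1, -1))[0]

print(z_scaled_manual, z_scaled_builtin)
```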
I was then using this scaled z as an input to both my manual calculations and the inbuilt function. However, the inbuilt function was part of a pipeline that already had StandardScaler as its first step, so z was getting scaled twice!
Hence, when I removed scaling from my pipeline, the manual answers "matched" the inbuilt function's answer.
By the way, I put "matched" in quotes because I found I always had to flip the sign of my manual calculations to match the inbuilt version. Currently I have no idea why this is the case.
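A likely explanation for the sign flip, as far as I can tell from the scikit-learn source: libsvm's internal binary decision value has the opposite orientation to scikit-learn's convention that positive values favour classes_[1], so scikit-learn negates it. Recent releases also negate dual_coef_ and intercept_ at fit time for binary problems, so that dual_coef_ @ K + intercept_ matches decision_function with no manual flip; a release that exposed the un-negated attributes would show exactly this symptom. A minimal check, assuming a recent release, with make_classification data and gamma=0.5 as placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=1)
gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
z = X[0]

# Manual RBF decision value from the public attributes, with no sign flip.
diff = clf.support_vectors_ - z
kernel = np.exp(-gamma * np.sum(diff * diff, axis=1))
manual = clf.dual_coef_[0] @ kernel + clf.intercept_[0]

builtin = clf.decision_function(z.reshape(1, -1))[0]
print(manual, builtin)
```

If the two printed values differ only in sign, the installed release is using the older attribute convention.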
To conclude, I misunderstood how pipelines worked.
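To make the pipeline behaviour concrete: a Pipeline runs every preprocessing step inside its own decision_function, so it expects raw features, and a point that has already been scaled should go only to the final SVC step. A sketch that reproduces the double scaling, using synthetic placeholder data from make_classification (shifted and stretched so that scaling has a visible effect) and an explicit gamma:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data, shifted and stretched so scaling actually matters.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X = 10.0 * X + 3.0
z_raw = X[:1]  # one raw test point, kept 2-D

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=0.5))
pipe.fit(X, y)

# make_pipeline names each step after its lowercased class name.
scaler = pipe.named_steps["standardscaler"]
svc = pipe.named_steps["svc"]
z_scaled = scaler.transform(z_raw)

# Correct: raw input to the pipeline, or scaled input to the SVC step alone.
via_pipe = pipe.decision_function(z_raw)
via_svc = svc.decision_function(z_scaled)

# The mistake: scaled input to the pipeline, which scales it a second time.
double_scaled = pipe.decision_function(z_scaled)

print(via_pipe, via_svc, double_scaled)
```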
For those who are interested, here are the final versions of my manual methods:
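import numpy as np

# Assumed shapes: sup_vecs (n_SV, n_features), z_scaled (n_features,),
# dual_coefs 1-D of length n_SV (e.g. dual_coef_[0]), intercept a scalar.
diff = sup_vecs - z_scaled
# Looping Method
dec_func_loop = 0
for j in range(np.shape(sup_vecs)[0]):
    norm2 = np.linalg.norm(diff[j, :])
    dec_func_loop = dec_func_loop + dual_coefs[j] * np.exp(-gamma * (norm2 ** 2))
# Sign flip I needed to match the inbuilt decision_function
dec_func_loop = -1 * (dec_func_loop - intercept)
# Vectorized method: Euclidean distance to every support vector in one call
norm2 = np.linalg.norm(diff, axis=1)
dec_func_vec = -1 * (dual_coefs.dot(np.exp(-gamma * (norm2 ** 2))) - intercept)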
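For many test points at once, the per-point distance computation can be replaced by rbf_kernel from sklearn.metrics.pairwise, which evaluates exp(-gamma * ||x - y||^2) for every pair of rows. A sketch on synthetic placeholder data with an explicit gamma, assuming a recent scikit-learn (so no sign flip), that scales the raw test points exactly once before applying the fitted SVC's attributes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=2)
X_test = X[:10]  # raw (unscaled) test points
gamma = 0.2

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma)).fit(X, y)
scaler, svc = pipe.named_steps["standardscaler"], pipe.named_steps["svc"]

# Scale once, then build the (n_test, n_SV) kernel matrix in a single call.
K = rbf_kernel(scaler.transform(X_test), svc.support_vectors_, gamma=gamma)
manual = K @ svc.dual_coef_[0] + svc.intercept_[0]

print(manual)
print(pipe.decision_function(X_test))
```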
Addendum
For those who are interested in implementing a manual method for a multiclass SVC, the following link is helpful: https://stackoverflow.com/a/27752709/1182556