I'm reading Andrew Ng's Machine Learning notes, but the definition of the functional margin confuses me.
I understand that the geometric margin is the distance from x to the hyperplane, but how should I understand the functional margin? And why is its formula defined that way?
Think of it like this: w^T x_i + b is the model's prediction for the i-th data point, and y_i is its label. If the prediction and the ground truth have the same sign, then gamma_i will be positive. The further "inside" the class boundary this instance is, the bigger gamma_i will be. This is better because, summed over all i, you will have greater separation between your classes. If the prediction and the label don't agree in sign, then this quantity will be negative (an incorrect decision by the predictor), which will reduce your margin, and the reduction grows the more incorrect the prediction is (analogous to slack variables).
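To make the sign behaviour concrete, here is a minimal sketch in plain Python. The classifier (w, b), the three points, and the helper name functional_margin are all made up for illustration; they are not taken from the notes.

```python
def functional_margin(w, b, x, y):
    """Return y * (w . x + b) for one labelled point."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return y * score

# Hypothetical 2-D classifier and data, chosen only for illustration.
w, b = [1.0, 1.0], -1.0

points = [
    ([2.0, 2.0], +1),  # correct side, far from the boundary
    ([1.0, 0.5], +1),  # correct side, close to the boundary
    ([2.0, 2.0], -1),  # label disagrees with the side the point is on
]

margins = [functional_margin(w, b, x, y) for x, y in points]
print(margins)  # [3.0, 0.5, -3.0]
```

The correctly classified point deep inside its class gets the largest value, the one near the boundary gets a small positive value, and the misclassified point gets a negative value.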
The functional margin changes with the scale of w and b.
Geometric margin = functional margin / norm(w).
Equivalently, when norm(w) = 1, the functional margin is the geometric margin.
You can convert the functional margin to the geometric margin under the following two assumptions:
||w|| == 1, therefore (w^T)x+b == ((w^T)x+b)/||w||, which is the signed geometric distance from the point x to the line (w^T)x+b = 0.
There are only two categories for targets, so y_i can only be +1 or -1. Therefore, if the sign of y_i matches the side of the line on which the point x lies (y_i > 0 when (w^T)x+b > 0, y_i < 0 when (w^T)x+b < 0), multiplying by y_i is simply equivalent to taking the absolute value of the distance (w^T)x+b.
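The two points above can be checked numerically. The sketch below assumes a hypothetical unit-norm w and two made-up labelled points. It compares y * ((w^T)x+b) with the Euclidean distance from x to its orthogonal projection onto the line.

```python
import math

# Hypothetical unit-norm weight vector: ||w|| = sqrt(0.6^2 + 0.8^2) = 1.
w, b = [0.6, 0.8], -1.0

def score(x):
    """Signed value (w^T)x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def distance_to_line(x):
    """Euclidean distance from x to its projection onto (w^T)x + b = 0."""
    s = score(x)
    # Step back along w by the score; this lands on the line because ||w|| = 1.
    foot = [xj - s * wj for xj, wj in zip(x, w)]
    return math.sqrt(sum((xj - fj) ** 2 for xj, fj in zip(x, foot)))

# Made-up points whose labels match the side of the line they lie on.
labelled = [([3.0, 4.0], +1), ([0.0, 0.0], -1)]
for x, y in labelled:
    print(y * score(x), distance_to_line(x))
```

For each point the two printed numbers agree up to floating-point rounding, which is the claim: with ||w|| = 1 and a correct label, the functional margin is the plain distance to the line.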
Explanation: The functional margin doesn't tell us the exact distance from a point to the separating plane/line.
For instance, two equations that describe the same line can still give different functional margins (a limitation of the functional margin).
The functional margin just gives an idea of the confidence of our classification, nothing concrete.
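As an illustration of that limitation, here is a small sketch with a made-up pair of equations, 2*x1 + 3*x2 - 1 = 0 and 4*x1 + 6*x2 - 2 = 0, which describe the same line. The point, its label, and the helper names are hypothetical.

```python
import math

def functional_margin(w, b, x, y):
    """y * (w . x + b): changes when (w, b) is rescaled."""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||: unchanged by rescaling."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return functional_margin(w, b, x, y) / norm

x, y = [2.0, 1.0], +1  # hypothetical labelled point

# Same line written twice; the second (w, b) is the first multiplied by 2.
for w, b in [([2.0, 3.0], -1.0), ([4.0, 6.0], -2.0)]:
    print(functional_margin(w, b, x, y), geometric_margin(w, b, x, y))
```

The functional margin doubles from 6 to 12 even though nothing about the line or the point has changed, while the geometric margin stays at 6/sqrt(13), about 1.66, in both cases.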
Please also read the reference below for more details.
Functional Margin:
Its sign gives the position of the point with respect to the plane, and that sign does not depend on the magnitude of w.
Geometric Margin:
This gives the distance between the given training example and the given plane.