I have a set of images. I would like to train a one-class SVM (OC-SVM) to model the distribution of a particular (positive) class, as I don't have enough examples to represent the other (negative) classes. What I understood about OC-SVM is that it tries to separate the data from the origin, or in other words, it tries to learn a hypersphere that fits the one-class data.
My questions are:
If I want to use the output of the OC-SVM as a probability estimate, how can I do it?
What is the difference between the OC-SVM and any clustering algorithm (e.g. k-means)?
If you want a probability estimate, don't use a one-class SVM; that is not what it was designed for. You want something like kernel density estimation, which provides a non-parametric density estimate given some positive examples.
The difference between a one-class SVM and clustering is that in clustering you are given points from several classes without knowing which point belongs to which class; inferring those assignments is the goal (and you may also end up with density estimates for each class and for the marginal density over all of feature space). A one-class SVM is given points from only one class and is expected to learn a boundary between members of that class and everything else.
EDIT: Clustering is not the same as density estimation. Clustering is concerned with determining which instances belong to which classes (clusters), when the assignments are not given, and does not necessarily result in a similarity score between the supplied examples and any point in input space.
If the goal is to answer the question "how similar is this new instance to the positive training examples I've seen?", then fit a probability distribution to your training examples and evaluate its density function at the new point. If the density falls below a chosen threshold, you declare the new point to be outside the class defined by the supplied examples.
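As a minimal sketch of this fit-then-threshold procedure, here is one way it could look. It assumes, purely for illustration, an independent normal distribution per feature and a threshold set at the 5% quantile of the training log-densities. The function names, the synthetic 2-D data, and the quantile are all hypothetical choices, not something the question specifies.

```python
import math
import random

def fit_diag_gaussian(points):
    """Estimate a per-feature mean and variance from the positive examples."""
    n = len(points)
    dim = len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    variances = [
        sum((p[j] - means[j]) ** 2 for p in points) / n + 1e-9  # small floor avoids a zero variance
        for j in range(dim)
    ]
    return means, variances

def log_density(x, means, variances):
    """Log of a product of independent 1-D normal densities."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        for xj, m, v in zip(x, means, variances)
    )

def choose_threshold(train, means, variances, quantile=0.05):
    """Log-density value below which roughly `quantile` of the training points fall."""
    scores = sorted(log_density(p, means, variances) for p in train)
    return scores[int(quantile * len(scores))]

# Synthetic stand-in for feature vectors extracted from the positive images (assumption).
rng = random.Random(0)
positives = [(rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)) for _ in range(500)]

means, variances = fit_diag_gaussian(positives)
threshold = choose_threshold(positives, means, variances)

def in_class(x):
    """True if the fitted density at x is at or above the threshold."""
    return log_density(x, means, variances) >= threshold

print(in_class((0.0, 5.0)), in_class((10.0, -20.0)))
```

Working with log-densities rather than raw densities avoids underflow for points far from the data. The threshold quantile directly trades off how many genuine positives get rejected against how permissive the class boundary is.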
You can build a parametric model of the class if you like, but this is usually tricky unless you either know something about the problem or are willing to assume a standard distribution (multivariate normal or naive Bayes being the two obvious ones). The alternative is to use a non-parametric density estimate. This is the kernel density estimation I mentioned.
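A kernel density estimate can be sketched in a few lines of plain Python. The version below is an illustration under stated assumptions rather than a recommended implementation: it uses an isotropic Gaussian kernel with a hand-picked bandwidth of 0.5, synthetic two-cluster data, and a 5% quantile threshold, and every name in it is hypothetical. In practice a library implementation (for example scikit-learn's `KernelDensity`) and a data-driven bandwidth would normally be preferable.

```python
import math
import random

def kde_log_density(x, samples, bandwidth):
    """Log of the average of isotropic Gaussian kernels centred on each sample."""
    dim = len(x)
    log_norm = -dim * math.log(bandwidth) - 0.5 * dim * math.log(2 * math.pi)
    exponents = [
        -sum((xj - sj) ** 2 for xj, sj in zip(x, s)) / (2 * bandwidth ** 2)
        for s in samples
    ]
    # log-sum-exp keeps the result finite for points far from all samples
    top = max(exponents)
    log_sum = top + math.log(sum(math.exp(e - top) for e in exponents))
    return log_norm + log_sum - math.log(len(samples))

# Synthetic positive class with two modes, which a single normal would fit poorly (assumption).
rng = random.Random(1)
kde_samples = (
    [(rng.gauss(-3.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(200)]
    + [(rng.gauss(3.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(200)]
)
BANDWIDTH = 0.5  # hand-picked for this toy data

# Threshold at the 5% quantile of the training scores. Each training point's own
# kernel is included in its score here; a leave-one-out score would be less optimistic.
train_scores = sorted(kde_log_density(p, kde_samples, BANDWIDTH) for p in kde_samples)
kde_threshold = train_scores[int(0.05 * len(train_scores))]

def in_class_kde(x):
    """True if the estimated density at x is at or above the threshold."""
    return kde_log_density(x, kde_samples, BANDWIDTH) >= kde_threshold

for point in [(3.0, 0.0), (-3.0, 0.0), (0.0, 0.0)]:
    print(point, in_class_kde(point))
```

The point midway between the two clusters lies close to the overall mean of the data, yet the estimate assigns it low density because no training examples fall nearby. That is the kind of structure a single standard distribution would tend to miss.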