Outlier detection based on a Gaussian mixture model

Posted 2019-09-09 13:31

Question:

I have a set of data, and I want to learn a one-class distribution from it. Using the learned distribution, I want to get a probability value for each data instance. Then, by thresholding these probability values, I want to build a classifier that decides whether a particular data instance comes from that distribution or not.

For example, let's say my data is 50x100000, where 50 is the dimension of each data instance and 100000 is the number of instances. I am learning a Gaussian mixture model on this data.

When I try to get the probability values for the instances, I get very low values. In this case, how can I build a classifier?
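The pipeline described in the question can be sketched with scikit-learn's `GaussianMixture` (an assumption: the question does not say which library is used). Its `score_samples` method returns the log of the fitted density for each sample. In 50 dimensions those densities are expected to be tiny: for a standard normal, the normalizing constant alone is (2π)^(-25) ≈ e^(-46), so the absolute values carry little meaning and only their relative ordering is useful. The sketch therefore works in log space and sets the threshold at a percentile of the training log-densities. The helper names `fit_gmm_detector` and `is_inlier`, the synthetic data, `n_components=3`, and the 1% `contamination` rate are hypothetical, illustrative choices, not values from the question.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_gmm_detector(X_train, n_components=5, contamination=0.01, seed=0):
    """Fit a GMM and choose a log-density threshold (hypothetical helper).

    `contamination` is the fraction of training points we accept to flag as
    outliers; it is an illustrative choice, not something the data dictates.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(X_train)
    # score_samples gives log p(x) per row. Raw densities p(x) in 50-D are
    # extremely small, so compare in log space instead.
    train_log_dens = gmm.score_samples(X_train)
    threshold = np.percentile(train_log_dens, 100 * contamination)
    return gmm, threshold


def is_inlier(gmm, threshold, X):
    """True where the fitted log-density is at or above the threshold."""
    return gmm.score_samples(X) >= threshold


# Synthetic stand-in for the question's data (rows = instances, 50 features).
# scikit-learn expects shape (n_samples, n_features), i.e. 100000x50, not 50x100000.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((5000, 50))

gmm, threshold = fit_gmm_detector(X_train, n_components=3, contamination=0.01)
train_log_dens = gmm.score_samples(X_train)
print("largest raw density:", np.exp(train_log_dens).max())  # tiny, as in the question
print("log-density threshold:", threshold)

# Points shifted far from the training cloud should fall below the threshold.
X_new = rng.standard_normal((5, 50)) + 5.0
print("inlier flags for shifted points:", is_inlier(gmm, threshold, X_new))
```

The percentile rule fixes the false-rejection rate on the training data rather than picking an absolute probability cutoff, which sidesteps the problem that the raw values are all very small.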

Answer 1:

I don't think this makes sense. For example, suppose your data is one-dimensional and, in truth, it was sampled from a bimodal distribution. Suppose also that you haven't worked out that it comes from a bimodal distribution, so you fit a normal distribution. You'd still get the best possible fit, but it would be the best possible fit to the wrong distribution, and the truth is that none of the points come from that distribution or from any distribution that looks like it.
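The one-dimensional example in this answer can be checked numerically. This NumPy-only sketch assumes a concrete bimodal truth, an equal mixture of N(-3, 1) and N(3, 1) (arbitrary illustrative values, not from the answer), fits a single normal by maximum likelihood, and then thresholds the fitted log-density at its 5th percentile, as the question proposes to do.

```python
import numpy as np


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


rng = np.random.default_rng(0)
# Assumed true distribution: bimodal, equal mixture of N(-3, 1) and N(+3, 1).
x = np.concatenate([rng.normal(-3, 1, 5000), rng.normal(3, 1, 5000)])

# Misspecified model: a single normal, fitted by maximum likelihood.
mu_hat, sigma_hat = x.mean(), x.std()

# The fitted normal scores points near 0 as the most "typical"...
grid = np.linspace(-6, 6, 1201)
most_typical = grid[np.argmax(normal_logpdf(grid, mu_hat, sigma_hat))]
# ...yet almost no data actually lies near 0.
frac_near_zero = np.mean(np.abs(x) < 0.5)

# Thresholding rule: reject the lowest 5% of training log-densities.
thr = np.percentile(normal_logpdf(x, mu_hat, sigma_hat), 5)
zero_is_inlier = normal_logpdf(0.0, mu_hat, sigma_hat) >= thr

print(f"fitted mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print(f"most typical point under the fit: {most_typical:.2f}")
print(f"fraction of data with |x| < 0.5: {frac_near_zero:.4f}")
print(f"x = 0 accepted as an inlier: {zero_is_inlier}")
```

With these settings the fitted mean lands near 0 and the fitted standard deviation near √10 ≈ 3.16, so x = 0 receives the highest score under the fit even though well under 1% of the samples lie within 0.5 of it: the "best possible fit" confidently accepts exactly the region the true distribution almost never produces.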