How does the stats.gaussian_kde method calculate the

Posted 2019-08-15 02:43

I am using scipy.stats.gaussian_kde to generate random samples from my data.

It works fine. I have now found that it also has built-in methods to calculate the probability density function at a given set of points (my data).

I would like to know how it calculates the pdf given a set of points.

Here is a small example:

import numpy as np
from scipy import stats

def getDistribution1(data):
    kernel = stats.gaussian_kde(data, bw_method=0.06)

    class rv(stats.rv_continuous):
        # Newer SciPy versions pass size and random_state to _rvs as keywords
        def _rvs(self, *args, size=None, random_state=None):
            shape = () if size is None else size
            n = int(np.prod(shape))
            # Draw n random variates from the KDE
            return kernel.resample(n, seed=random_state).reshape(shape)
        def _cdf(self, x):
            # Integrate the estimated pdf from -inf to each x
            return np.array([kernel.integrate_box_1d(-np.inf, xi) for xi in np.atleast_1d(x)])
        def _pdf(self, x):
            # Evaluate the estimated pdf on the provided set of points
            return kernel.evaluate(x)
    return rv(name='kdedist')

test_data = np.random.random(100)  # random test data
distribution_data = getDistribution1(test_data)
pdf_data = distribution_data.pdf(test_data)  # the pdf evaluated at the data points

The code above defines three methods:

  1. rvs, which generates random samples based on the data
  2. cdf, which is the integral of the pdf from -inf to x
  3. pdf, which is the estimated pdf evaluated at the data

I need this pdf because I am trying to calculate weights for my data based on probability: I want to assign each data point a probability that I can then use as its weight.

I would also like to know how I should proceed from here to calculate my weights.

P.S. Forgive me for asking the same question on Cross Validated; there has been no response there.

1 answer
放我归山
#2 · 2019-08-15 03:24

The online docs have a link to the source code, which for gaussian_kde is here: https://github.com/scipy/scipy/blob/v0.15.1/scipy/stats/kde.py#L193
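
For one-dimensional, unweighted data, the evaluation in that source amounts to averaging Gaussian bumps centred on each data point. The sketch below reproduces it by hand and compares the result with kernel.evaluate. The helper name manual_kde_pdf is made up for illustration, and it assumes a scalar bw_method, in which case the kernel width is that factor times the sample standard deviation (ddof=1).

```python
import numpy as np
from scipy import stats

def manual_kde_pdf(data, points, factor):
    """Hand-rolled 1-D Gaussian KDE (hypothetical helper, unweighted case only)."""
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    # With a scalar bw_method, the kernel standard deviation is
    # factor * sample standard deviation (ddof=1).
    h = factor * data.std(ddof=1)
    # Standardised distance from every evaluation point to every data point.
    z = (points[:, None] - data[None, :]) / h
    # One Gaussian density per (point, datum) pair, then average over the data.
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.random(100)

kde = stats.gaussian_kde(sample, bw_method=0.06)
manual = manual_kde_pdf(sample, sample, factor=0.06)

# Largest discrepancy between the manual sum and SciPy's result.
max_abs_diff = np.max(np.abs(manual - kde.evaluate(sample)))
print(max_abs_diff)
```

So the value returned for each point is a density (it can exceed 1), not a probability. The multivariate case follows the same idea with a full covariance matrix in place of h.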
