Curve fiting of normal distribution in Python

2019-07-15 02:07发布

I want to calculate the percentiles of normal distribution data, so I first fit the data to the normal distribution, here is the example:

from scipy.stats import norm
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

x = np.array([ 0.47712125,  0.5445641 ,  0.61193563,  0.67924615,  0.74671202,
    0.81404772,  0.88144172,  0.94885291,  1.01623919,  1.08361011,
    1.15100191,  1.21837793,  1.28578227,  1.3531658 ,  1.42054981,
    1.48794397,  1.55532424,  1.62272161,  1.69010744,  1.75749472,
    1.82488047,  1.89226717,  1.9596566 ,  2.02704774,  2.09443269,
    2.16182302,  2.2292107 ,  2.29659719,  2.36398595,  2.43137342,
    2.49876254,  2.56614983,  2.63353814,  2.700926  ,  2.76831392,
    2.83570198,  2.90308999,  2.97008999,  3.03708997,  3.10408999,
    3.17108999,  3.23808998,  3.30508998,  3.37208999,  3.43908999,
    3.50608998,  3.57308998,  3.64008999,  3.70708999,  3.77408999,
    3.84108999,  3.90808999])
y = array([  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
     0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
     0.00000000e+00,   5.50000000e+01,   1.33500000e+02,
     2.49000000e+02,   4.40000000e+02,   7.27000000e+02,
     1.09000000e+03,   1.53000000e+03,   2.21500000e+03,
     3.13500000e+03,   4.44000000e+03,   5.57000000e+03,
     6.77000000e+03,   8.04500000e+03,   9.15500000e+03,
     1.00000000e+04,   1.06000000e+04,   1.06500000e+04,
     1.02000000e+04,   9.29000000e+03,   8.01500000e+03,
     6.50000000e+03,   5.24000000e+03,   4.11000000e+03,
     2.97000000e+03,   1.86000000e+03,   1.02000000e+03,
     5.26500000e+02,   2.49000000e+02,   1.11000000e+02,
     5.27000000e+01,   6.90825000e+00,   4.54329000e+00,
     3.63846500e+00,   3.58135000e+00,   2.37404000e+00,
     1.81840000e+00,   1.20159500e+00,   6.02470000e-01,
     3.43295000e-01,   1.62295000e-01,   7.99350000e-02,
     3.60750000e-02,   1.50000000e-02,   3.61500000e-03,
     8.00000000e-05])

def datafit(x,N,u,sig):
    y = N/(np.sqrt(2*np.pi)*sig)*np.exp(-(x-u)**2/2*sig**2)
    return y
popt,popc = curve_fit(datafit,x,y,p0=[np.max(y),2,2])
Normal_distribution = norm(loc = popt[-2],scale = popt[-1])

Then I checked if the plot of (x,y) and (x,popt[0]*Normal_distribution.pdf(x))are same, but the result shows they are totally different.... Blue line is plot of (x,y)

The blue line is plot of (x,y), and the orange line is the plot of (x,popt[0]*Normal_distribution.pdf(x).

Why this happen? Is there anything wrong in my code?

1条回答
干净又极端
2楼-- · 2019-07-15 02:32

depends on what you plotted, these look OK to me:

plt.plot(x,y)
Out[3]: [<matplotlib.lines.Line2D at 0xb9cef98>]

popt,popc
Out[4]: 
(array([  8.41765250e+04,   1.98651581e+00,   3.15537860e+00]),
 array([[  5.64670700e+05,   1.12782889e-05,   1.15455042e+01],
        [  1.12782889e-05,   2.91058556e-06,   2.73909077e-10],
        [  1.15455042e+01,   2.73909077e-10,   2.88523818e-04]]))

plt.plot(x,datafit(x,*popt))
Out[5]: [<matplotlib.lines.Line2D at 0xb990080>]

enter image description here

my guess is that you've got an error in sig, scale and *,/ in your datafit def vs norm()

I rewrote datafit to match the scipy norm.pdf

and still have a factor of ~pi issue which may be just definitional: https://en.wikipedia.org/wiki/Normal_distribution

oops, looks like the "factor of pi" was just coincidence of your particular data
rereading the norm.pdf def suggest the whole is rescaled by the 'scale" factor so now I think it should be:

'''
norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi)
norm.pdf(x, loc, scale) == norm.pdf(y) / scale with y = (x - loc) / scale
'''
def datafit(x,N,u,sig):
#    y = N/(np.sqrt(2*np.pi)*sig)*np.exp(-(x-u)**2/2*sig**2)
    y = N*np.exp(-((x-u)/sig)**2/2)/(np.sqrt(2*np.pi))
    return y
popt,popc = curve_fit(datafit,x,y,p0=[np.max(y),2,2])

# scipy norm.pdf with scaling factors to match datafit()
Normal_distribution = popt[0]*popt[2]*norm.pdf(x, popt[1], popt[2])

plt.plot(x,y, 'b')
plt.plot(x, datafit(x+.1, *popt), 'g')
plt.plot(x, Normal_distribution, 'r')

enter image description here

查看更多
登录 后发表回答