How can I use scipy optimization to find the minim

2019-07-14 20:01发布

问题:

I have a histogram of sorted random numbers and a Gaussian overlay. The histogram represents observed values per bin (applying this base case to a much larger dataset) and the Gaussian is an attempt to fit the data. Clearly, this Gaussian does not represent the best fit to the histogram. The code below is the formula for a Gaussian.

normc, mu, sigma = 30.845, 50.5, 7 # normalization constant, avg, stdev
gauss = lambda x: normc * exp( (-1) * (x - mu)**2 / ( 2 * (sigma **2) ) )

I calculated the expectation values per bin (area under the curve) and calculated the number of observed values per bin. There are several methods to find the 'best' fit. I am concerned with the best fit possible by minimizing Chi-Squared. In this formula for Chi-Squared, the expectation value is the area under the curve per bin and the observed value is the number of occurrences of sorted data values per bin. So I want to fluctuate normc, mu, and sigma near their given values to find the right combination of normc, mu, and sigma that produce the smallest Chi-Square, as these will be the parameters I can plug into the code above to overlay the best fit Gaussian on my histogram. I am trying to use the scipy module to minimize my Chi-Square as done in this example. Since I need to fluctuate parameters, I will use the function gauss (defined above) to plot the Gaussian overlay, and will define a new function to find the minimum Chi-Squared.

def gaussmin(var,data):
    # var[0] = normc
    # var[1] = mu
    # var[2] = sigma
    # data is the sorted random numbers, represents unbinned observed values
    for index in range(len(data)):
        return var[0] * exp( (-1) * (data[index] - var[1])**2 / ( 2 * (var[2] **2) ) ) 
    # I realize this will return a new value for each index of data, any guidelines to fix?

After this, I am stuck. How can I fluctuate the parameters to find the normc, mu, sigma that produced the best fit? My last attempt at a solution is below:

var = [normc, mu, sigma]
result = opt.minimize(chi2, [normc,mu,sigma])
# chi2 is the chisquare value obtained via scipy
# chisquare input (a,b) 
# where a is number of occurences per bin, b is expected value per bin
# b is dependent upon normc, mu, sigma
print(result)
# data is a list, can I keep it as a constant and only fluctuate parameters in var?

There are plenty of examples online for scalar functions but I cannot find any for variable functions.

PS - I can post my full code so far but it's bit lengthy. If you would like to see it, just ask and I can post it here or provide a googledrive link.

回答1:

A Gaussian distribution is completely characterized by its mean and variance (or std deviation). Under the hypothesis that your data are normally distributed, the best fit will be obtained by using x-bar as the mean and s-squared as the variance. But before doing so, I'd check whether normality is plausible using, e.g., a q-q plot.