I have a range of particle size distribution data arranged by percentage volume fraction, like so:;
size %
6.68 0.05
9.92 1.15
etc.
I need to fit this data to a lognormal distribution, which I planned to do using python's stats.lognorm.fit
function, but this seems to expect the input as an array of variates rather than binned data, judging by what I've read.
I was planning to use a for loop to iterate through the data and .extend
each size entry to a placeholder array the required number of times to create an array with a list of variates that corresponds to the binned data.
This seems really ugly and inefficient though, and the kind of thing that there's probably an easy way to do. Is there a way to input binned data into the stats.lognorm.fit
function?
I guess one possible workaround is to manually fit a pdf to your bin data, assuming x values are the midpoint of each interval, and y values are the corresponding bin frequency. And then fit a curve based on x and y values using scipy.optimize.curve_fit
. I think accuracy of the results will depend the number of bins you have. An example is shown below:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
def pdf(x, mu, sigma):
"""pdf of lognormal distribution"""
return (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
mu, sigma = 3., 1. # actual parameter value
data = np.random.lognormal(mu, sigma, size=1000) # data generation
h = plt.hist(data, bins=30, normed = True)
y = h[0] # frequencies for each bin, this is y value to fit
xs = h[1] # boundaries for each bin
delta = xs[1] - xs[0] # width of bins
x = xs[:-1] + delta / # midpoints of bins, this is x value to fit
popt, pcov = curve_fit(pdf, x, y, p0=[1, 1]) # data fitting, popt contains the fitted parameters
print(popt)
# [ 3.13048122 1.01360758] fitting results
fig, ax = plt.subplots()
ax.hist(data, bins=30, normed=True, align='mid', label='Histogram')
xr = np.linspace(min(xs), max(xs), 10000)
yr = pdf(xr, mu, sigma)
yf = pdf(xr, *popt)
ax.plot(xr, yr, label="Actual")
ax.plot(xr, yf, linestyle = 'dashed', label="Fitted")
ax.legend()