I have a nonlinear model fit that looks like this:
The dark solid line is the model fit, and the grey part is the raw data.
Short version of the question: how do I get the likelihood of this model fit, so I can perform log-likelihood ratio test? Assume that the residual is normally distributed.
I am relatively new to statistics, and my current thoughts are:
Get the residual from the curve fit, and calculate the variance of residual;
Use this equation And plug in the variance of residual into sigma-squared, x_i as experiment and mu as model fit;
Calculate the log-likelihood ratio.
Could anyone help me, with these two full-version questions?
Is my method correct? (I think so, but it would be really great to make sure!)
Are there any ready-made functions in python/scipy/statsmodels to do this for me?
Your likelihood function
which is simply the sum of log of probability density function of Gaussian distribution.
is the likelihood of fitting a mu and a sigma for your residue, not the likelihood of your model given your data. In one word, your approach is wrong.
Sine you are doing non-linear least square, following what @usethedeathstar already mentioned, you should go straight for
F-test
. . Consider the following example, modified from http://www.walkingrandomly.com/?p=5254, and we conductF-test
usingR
. And we will discuss how to translate it intopython
in the end.here we have two model,
fit1
has 2 parameters, therefore the residue has 8 degrees-of-freedom;fit2
has one additional parameter and the residue has 7 degrees of freedom. Is model 2 significantly better? No, the F value is 0.9194, on(1,7)
degrees of freedom and it is not significant.To get the ANOVA table: Residue DF is easy. Residue Sum of squares:
0.08202*0.08202*8=0.05381
and0.08243*0.08243*7=0.04756293
(notice: 'Residual standard error: 0.08243 on 7 degrees of freedom', etc). Inpython
, you can get it by(y_observed-y_fitted)**2
, sincescipy.optimize.curve_fit()
doesn't return the residues.The
F-ratio
is0.0062473/0.047565*7
and to get P-value:1-scipy.stats.f.cdf(0.9194, 1, 7)
.Put them together we have
python
equivalent:As @usethedeathstar pointed out: when you the residue is normally distributed, nonlinear least square IS the maximum likelihood. Therefore F-test and likelihood ratio test is equivalent. Because, F-ratio is a monotone transformation of the likelihood ratio λ.
Or in a descriptive way, see: http://www.stata.com/support/faqs/statistics/chi-squared-and-f-distributions/
Your formula looks correct to me. It should give you the same results as
scipy.stats.norm.logpdf(x, loc=mu, scale=sigma)
Since you already have your estimates of mu and sigma, I don't think there is a function for the likelihood ratio test where you can plug your results in.
If you have the estimates of two models, where one is nested in the other, then you can easily calculate it yourself.
http://en.wikipedia.org/wiki/Likelihood-ratio_test
Here is the part of a method in statsmodels that calculates the LR-test for comparing two nested linear models https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/linear_model.py#L1531