For this dataset:
dat = structure(list(x = c(5L, 5L, 5L, 5L, 10L, 10L, 10L, 10L, 15L,
15L, 15L, 15L, 17L, 17L, 17L, 17L, 20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 22L, 22L, 22L, 22L, 24L, 24L, 24L, 24L, 25L, 25L, 25L,
25L, 27L, 27L, 27L, 27L, 30L, 30L, 30L, 30L, 35L, 35L, 35L, 35L),
y = c(2.2, 2.2, 1.95, 1.9, 4.1, 3.95, 3.75, 3.4, 5.15, 4.6,
4.75, 5.15, 3.7, 4.1, 3.9, 3.5, 7, 6.7, 6.7, 6.95, 4.95, 6, 6.45,
6.4, 7, 4.45, 6.15, 6.4, 7, 6.6, 6.7, 7, 4.5, 4.7, 5.75, 4.35,
5.4, 5.15, 5.7, 5.7, 0, 0, 0.5, 0, 0, 0, 0, 0)), .Names = c("x", "y"),
row.names = c(6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 34L, 35L, 36L, 37L, 18L, 19L, 20L, 21L, 38L, 39L,
40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 22L, 23L, 24L,
25L, 50L, 51L, 52L, 53L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L),
class = "data.frame")
Where "x" is the temperature and "y" is the response variable of a biological process
I'm trying to fit this function
beta.reg<-function(x, Yopt,Tmin,Topt,Tmax, b1) {
Yopt*((x-Tmin)/(Topt-Tmin))^(b1*(Topt-Tmin)/(Tmax-Topt))*((Tmax-x) / (Tmax-Topt)) ^ b1
}
mod <- nls(y ~ beta.reg(x, Yopt,Tmin,Topt,Tmax, b1), data=dat,
start=c(Yopt=6, Tmin=0.1, Topt=24, Tmax=30, b1=1),
control=nls.control(maxiter=800))
But, I'm having this message error:
Error en numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model
I've tried the same function with another similar dataset and fits correctly...
rnorm<-(10)
y <- c(20,60,70,49,10)
rnorm<-(10)
y <- c(20,60,70,49,10)
dat<-data.frame(x = rep(c(15,20,25,30,35), times=5),
rep = as.factor(rep(1:5, each=5)),
y = c(y+rnorm(5), y+rnorm(5),y+rnorm(5),y+rnorm(5),y+rnorm(5)))
Could someone help me with this?
Session Info:
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] nlme_3.1-118 latticeExtra_0.6-26 RColorBrewer_1.0-5 lattice_0.20-29
loaded via a namespace (and not attached):
[1] grid_3.1.1 tools_3.1.1
Adding another possible solution to the @jlhoward 's one...
I found this
nls2
package:Exludying
x~17,35
from the original dataset:Applying the function to the reduced dataset:
Creating a set of starters:
Fitting the model:
Extracting coefficients:
Diagnostics:
And finally, the adjusted function and the QQ-normal plot:
There are so many problems here that I doubt it can be covered adequately in an SO post, but this should get you started.
First, it looks like you want
Tmax < max(dat$x)
, e.g., < 35. This causes a problem because thenTmax - x < 0
for some values ofx
and when you try to raise a negative number to a power (in the second term of your formula), you getNA
's. This is the cause of the error message.Second, convergence of a non-linear model depends on the model formula and also the data, so the fact that the process converges with one set of data but not another is completely irrelevant.
Third, non-linear modelling iteratively minimizes the residual sum of squares as a function of the parameters. If the RSS surface has local minima, and your
start
is close to one, the algorithms will find it. But only the global minimum is the true solution. Your problem has many, many local minima.Fourth,
nls(...)
uses the Gauss Newton method by default. Gauss Newton is notoriously unstable with shifting parameters (parameters which are added to or subtracted from the predictor, soTmin
andTmax
in your case). Fortunately, theminpak.lm
package implements the Levenberg Marquardt method, which is much more stable under these conditions. ThenlsLM(...)
function in that package uses the same calling sequence asnls(...)
and returns and object of typenls
, so all the methods for that class of object work as well. Use that.Fifth, a fundamental assumption in non-linear regression (in fact all least-squares regression) is that the residuals are normally distributed. So you have to validate any solution using a Q-Q plot.
Sixth, your model has a perverse set of characteristics. When
Tmin -> -Inf
the first term in the model approaches1
. It turns out that this yields a lower RSS than any other value ofTmin
less thanmin(dat$x)
, so the algorithms all tend to driveTmin
to large negative values. You can see this easily as follows:This looks like a pretty decent fit but it isn't: the Q-Q plot shows that the residuals are not remotely normal. The fact that both
Tmin
andb1
are very poorly estimated, and the value forTmin
is not physically meaningful, are problems with the data, not the fit.Seventh, it turns out that the fit above is actually a local minimum. We can see this by doing a grid search on
Tmin
,Tmax
, andb1
(leaving outYopt
andTopt
to save time, and because these parameters are well estimated regardless of the starting point).Mathematically, this is a distinctly superior fit: RSS is lower and the residuals are much more nearly normally distributed. Again, the fact that the parameters are not well estimated, and are not physically meaningful, is a problem with the data (and perhaps the model formula), not the fitting process.
All of the foregoing suggests that there is something wrong with your model. One problem with it, mathematically, is that the function is undefined for
x
outside of(Tmin,Tmax)
. Since you have data out tox=35
, the fitting algorithm will never yieldTmax < 35
(if it converges). An approach to deal with this changes your model function slightly to clip to 0 outside that range. (I have no idea if this is legitimate based on the physics of your problem, though...).Running the code above with this function yields:
In fact the grid search yields exactly the same result independent of starting point. Note that RSS is lower than any of the results with the earlier model, and that
b1
is much better estimated (and very different from the estimate with the earlier model function). The residuals are still not normal, but in this case I would want to inspect the data for outliers.