Using nls in R to re-create research

2019-04-01 17:16发布

I am learning how to use the nls function in R and am having some issues. I am simply attempting to recreate the curve found in a research paper for now. The model fits a curve to the the stock market movements prior to the 1987 crash.

I have defined a function, func, as follows:

func <- function(a,b,tc,t){
 a+b*log(tc-t)
}

I have called nls like this:

nls1 <- nls(Y ~ func(a,b,tc,t), data2, start=list(a=0, b=1, tc=1466, t=1))

data2 is a data frame that consists of two columns, one is the date, the other is a value. There are 1466 rows.

head(data2)
 Date      Y
1  1/4/82 882.52
2  1/5/82 865.30
3  1/6/82 861.02
4  1/7/82 861.78
5  1/8/82 866.53
6 1/11/82 850.46

I get the following messages when I run nls,

Error in qr(.swts * attr(rhs, "gradient")) : 
  dims [product 4] do not match the length of object [1466]

In addition: Warning message:

In .swts * attr(rhs, "gradient") :
  longer object length is not a multiple of shorter object length

From what I can gather, this is a problem with the way the data frame is set up but I cannot find a solution.

Any idea how I can move this father along?

Thank you very much for your help.

标签: r nls
2条回答
forever°为你锁心
2楼-- · 2019-04-01 17:43

The basic problem is that you have not specified an independent variable. By specifying start(...) for a, b, tc, and t, you are telling nls(...) that these are all parameters of the model.

It looks like you are using a simplified version of the LPPL model, in which a, b, and tc are parameters, and t is the independent variable. It looks like data2$Date contains the time variable. You need to make sure the data2$Date is of class POSIXct. So you could write:

df$Date <- as.POSIXct(df$Date, format="%m/%d/%y")
nls1 <- nls(Y~a+b*log(tc-Date), data=data2, start=list(a=0, b=1, tc=1466))

EDIT: In response to OP's comments

This is a great question, because it illustrates several issues in using nls(...). The problem you are having (now that the model is specified correctly) is that nls(...) is not converging - a distressingly common occurrence. Basically, unless your starting parameter estimates are relatively close to the final, fitted values (or unless the model is extremely "well behaved"), nls will fail. [Note also that the paper you cite mentions that b is restricted to b < 0, whereas you are starting with b=1.] So what to do?

The minpack.lm(...) function in package minpack uses the exceptionally robust Levenberg-Marquardt algorithm for non-linear least squares estimation. In fact, the paper you cite mentions L-M specifically. The problem with minpack.lm(...) is that it is much more difficult to use (you have to define a function which returns the residuals at a given step, rather than just defining the function to fit). Plus, minpack.lm(...) does not calculate statistics of the fit.

So the solution is to use them both! Use minpack.lm(...) to estimate the parameters, then use those as "starting values" in nls(...). The code below does that. Having a model fitted using nls(...) will make it much easier to generate statistics of the fit, predicted values, residuals, and also to apply the model to new datasets.

# this section just grabs the DJIA for 1982 - 1987; you already have this
library(tseries)
library(zoo)
ts <- get.hist.quote(instrument="DJIA", 
                     start="1982-01-01", end="1987-08-01", 
                     quote="Close", provider="yahoo", origin="1970-01-01",
                     compression="d", retclass="zoo")
df <- data.frame(ts)
df <- data.frame(Date=as.Date(rownames(df)),Y=df$Close)
df <- df[!is.na(df$Y),]
# end of setup...
library(minpack.lm) # for nls.lm(...)
library(ggplot2)    # for ggplot
df$days <- as.numeric(df$Date - df[1,]$Date)
# model based on a list of parameters
f <- function(pars, xx) {pars$a + pars$b*log(pars$tc - xx)} 
# residual function
resids <- function(p, observed, xx) {df$Y - f(p,xx)}
# fit using Levenberg-Marquardt algorithm
nls.out <- nls.lm(par=list(a=1,b=-1,tc=5000), fn = resids, 
                  observed = df$Y, xx = df$days)
# use output of L-M algorithm as starting estimates in nls(...)
par <- nls.out$par
nls.final <- nls(Y~a+b*log(tc-days),data=df, 
                 start=c(a=par$a, b=par$b, tc=par$tc))
summary(nls.final)      # display statistics of the fit 
# append fitted values to df
df$pred <- predict(nls.final)
# plot the results
ggplot(df)+
  geom_line(aes(x=Date,y=Y),color="black")+
  geom_line(aes(x=Date,y=pred),color="blue",linetype=2)+
  labs(title="LPPL Model Applied to DJIA (1982 - 1987)",
       x="", y="DJIA (daily close)")+
  theme(plot.title=element_text(face="bold"))

查看更多
Evening l夕情丶
3楼-- · 2019-04-01 17:58

Normally, when one performs a least squares regression, the assumption is that there is a so-called "dependent" or "response" variable (Y, in your case), which is a function of one or more "independent" or "predictor" variables (Date), and typically the detailed specification of the predictor function itself is usually defined by a fairly small number of static parameters (a and b, and possibly also t and/or tc as well, depending on what exactly it is that you are trying to achieve). The job of the nls() function is to find the optimal values for those static parameters which would lead to the most accurate possible prediction.

The input to your predictor function func appears to be missing the required independent variable. So, I think you probably need to do one of two things. Either you modify func so that it accepts Date as an input, or else you change the Date column label in your data frame so that the name matches one of the func inputs (most likely I suspect that you would want to rename the Date column so that it corresponds to tc). In either case, if you want to perform a calculation in which you subtract a date value in the data frame from a fixed offset date (e.g., (tc - t) as it appears to be written right now), you will need to check that R is actually recognizing your dates as Date objects and not as strings, so that it knows how to meaningfully subtract one from the other. The as.Date() function may be helpful to you for this purpose.

As an additional alternative, rather than trying to rewrite func so that it accepts R Date objects as inputs, you may find it simpler just to reassign the Date column in the data frame to an elapsed integer number of days with reference to some offset; e.g., do something like:

data2$tc <- as.numeric(as.Date(data2$Date) - as.Date("1982-1-4"))

or similar.

查看更多
登录 后发表回答