Consider a nonlinear least squares model in R, for example of the following form:
y ~ theta / ( 1 + exp( -( alpha + beta * x) ) )
(my real problem has several variables and the outer function is not logistic but a bit more involved; this one is simpler but I think if I can do this my case should follow almost immediately)
I'd like to replace the term "alpha + beta * x" with (say) a natural cubic spline.
Here's some code to create example data with a nonlinear function inside the logistic:
set.seed(438572L)
x <- seq(1,10,by=.25)
y <- 8.6/(1+exp( -(-3+x/4.4+sqrt(x*1.1)*(1.-sin(1.+x/2.9))) )) + rnorm(x, s=0.2 )
Without the need for a logistic around it, if I was in lm, I could replace a linear term with a spline term easily; so a linear model something like this:
lm( y ~ x )
then becomes
library("splines")
lm( y ~ ns( x, df = 5 ) )
Generating fitted values is simple, and getting predicted values with the aid of (for example) the rms package seems simple enough.
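For reference, a minimal sketch of the lm-based spline fit and prediction, using only base R and splines rather than rms. The names fit_lm, newx and pred are just illustrative; the data set-up is the one given above, with the rnorm arguments spelled out.

```r
library(splines)

# Example data as in the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x / 4.4 + sqrt(x * 1.1) * (1 - sin(1 + x / 2.9))))) +
  rnorm(length(x), sd = 0.2)

# Natural cubic spline in place of the linear term
fit_lm <- lm(y ~ ns(x, df = 5))

# Fitted values at the observed x
fv <- fitted(fit_lm)

# Predictions on a finer grid; predict() re-evaluates the ns() term
# at the new x values using the knots chosen during fitting
newx <- data.frame(x = seq(1, 10, by = 0.1))
pred <- predict(fit_lm, newdata = newx)
```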
Indeed, fitting the original data with that lm-based spline fit isn't too bad, but there's a reason I need it inside the logistic function (or rather, the equivalent in my problem).
The problem with nls is that I need to provide names for all the parameters. I'm quite happy with calling them, say, b1, ..., b5 for one spline fit and c1, ..., c6 for another variable; I'll need to be able to make several of them.
Is there a reasonably neat way to generate the corresponding formula for nls so that I can replace the linear term inside the nonlinear function with a spline?
The only ways I can figure that there could be to do it are a bit awkward and clunky and don't nicely generalize without writing a whole bunch of code.
(edit for clarification) For this small problem, I can do it by hand of course - write out an expression for inner product of every variable in the matrix generated by ns, times the vector of parameters. But then I have to write the whole thing out term-by-term again for each spline in every other variable, and again every time I change the df in any of the splines, and again if I want to use cs instead of ns. And then when I want to try to do some prediction(/interpolation), we get a whole new slew of issues to be dealt with. I need to keep doing it, over and over, and potentially for a substantially larger number of knots, and over several variables, for analysis after analysis - and I wondered if there was a more neat, simple way than writing out each individual term, without having to write a great deal of code. I can see a fairly bull-at-a-gate way to do it that would involve a fair bit of code to get right, but being R, I suspect there's a much neater way (or more likely 3 or 4 neater ways) that's simply eluding me. Hence the question.
I thought I had seen someone do something like this in the past in a fairly nice way, but for the life of me I can't find it now; I've tried a bunch of times to locate it.
[More particularly, I'd generally like to be able to try fitting any of several different splines in each variable, to try a couple of possibilities, in order to see if I can find a simple model that still fits adequately for the purpose (noise is really quite low; some bias in the fit is okay to achieve a nice smooth result, but only up to a point). It's more 'find a nice, interpretable, but adequate fitting function' than anything approaching inference, and data mining isn't really an issue for this problem.]
Alternatively, if this would be much easier in say gnm or ASSIST or one of the other packages, that would be useful knowledge, but then some pointers on how to proceed on the toy problem above with them would help.
A realization I came to while clarifying my own question made me see that there's a less clunky way than I had seen before.
Even with a bit of obvious streamlining that can go in, this is still a bit inelegant to my eye, but at least bearable enough to use on a repeated basis, so I regard it as an adequate answer. I'm still interested in a neater way than this one below.
Hong Ooi's trick of using data.frame on the matrix generated by ns to auto-name the columns is kind of cute and I have used it below. I'll likely use paste to build them in general, because I have several variables to play with.
Assuming the data set-up given in the question -
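A minimal sketch of this approach, not necessarily the code as originally posted: data.frame on the ns matrix names the columns, and paste builds the spline term. The names B, basis, spline_term and fml, and the way the starting values are obtained (a rough lm fit on the logit scale), are assumptions made for illustration.

```r
library(splines)

# Data set-up from the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x / 4.4 + sqrt(x * 1.1) * (1 - sin(1 + x / 2.9))))) +
  rnorm(length(x), sd = 0.2)

# Spline basis; data.frame() splits the matrix into named columns
B <- ns(x, df = 5)
basis <- data.frame(B)
names(basis) <- paste0("X", seq_along(basis))  # make the names explicit
dat <- cbind(y = y, basis)

# Build "b0 + b1 * X1 + ... + b5 * X5" with paste
coefs <- paste0("b", seq_along(basis))
spline_term <- paste(c("b0", paste(coefs, "*", names(basis))), collapse = " + ")
fml <- as.formula(paste0("y ~ theta / (1 + exp(-(", spline_term, ")))"))

# Rough starting values (an assumption, not part of the original answer):
# guess theta a little above max(y), then regress the logit of y/theta
# on the basis columns
theta0 <- 1.1 * max(y)
p <- pmin(pmax(y / theta0, 0.01), 0.99)  # keep the logit finite
b0s <- coef(lm(qlogis(p) ~ ., data = basis))
start <- c(list(theta = theta0), setNames(as.list(b0s), c("b0", coefs)))

# warnOnly = TRUE returns the fit with a warning instead of stopping
# if the iterations terminate before convergence
fit <- nls(fml, data = dat, start = start,
           control = nls.control(warnOnly = TRUE))
```

For prediction at new x values, the basis would need to be re-evaluated with the same knots, for example with predict(B, newx), before plugging the columns into the fitted formula.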
My actual formula will have several terms like nspb. Substantive improvements appreciated; I'd prefer not to choose my own answer, but I guess I will pick it if there's nothing further in a day or two.
edit: Hong Ooi's addition (which was posted as I was typing mine in and uses similar ideas, but adds a couple of nice extras) pretty much does it; it's an acceptable answer, so I have checked it.
ns actually generates a matrix of predictors. What you can do is split that matrix out into individual variables, and feed them to nls.
ETA: here's a go at automating this for different values of df. This constructs the formula using text munging, and then uses do.call to call nls. Caveat: untested.
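A sketch of what such automation could look like, under stated assumptions: the helper name fit_logistic_ns, the parameter names theta and b0, b1, ..., and the logit-scale lm used for starting values are all made up for illustration and are not taken from the original answer.

```r
library(splines)

# Hypothetical helper: fit y ~ theta / (1 + exp(-spline(x))) for a given df
fit_logistic_ns <- function(x, y, df) {
  # Split the ns matrix into individual named variables
  basis <- data.frame(ns(x, df = df))
  names(basis) <- paste0("X", seq_along(basis))
  coefs <- paste0("b", seq_along(basis))

  # Text munging: "b0 + b1 * X1 + ... + bk * Xk"
  eta <- paste(c("b0", paste(coefs, "*", names(basis))), collapse = " + ")
  fml <- as.formula(paste0("y ~ theta / (1 + exp(-(", eta, ")))"))

  # Rough starting values from a linear fit on the logit scale (an assumption)
  theta0 <- 1.1 * max(y)
  p <- pmin(pmax(y / theta0, 0.01), 0.99)
  b_start <- coef(lm(qlogis(p) ~ ., data = basis))
  start <- c(list(theta = theta0), setNames(as.list(b_start), c("b0", coefs)))

  # do.call passes the constructed formula, data and start list to nls;
  # warnOnly = TRUE gives a warning rather than an error on non-convergence
  do.call(nls, list(formula = fml, data = cbind(y = y, basis), start = start,
                    control = nls.control(warnOnly = TRUE)))
}

# Example data from the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x / 4.4 + sqrt(x * 1.1) * (1 - sin(1 + x / 2.9))))) +
  rnorm(length(x), sd = 0.2)

# Try a couple of df values and compare residual sums of squares
fits <- lapply(c(3, 5), function(d) fit_logistic_ns(x, y, d))
rss <- sapply(fits, deviance)
```

Extending this to several variables would mean building one such term per variable, with a different coefficient prefix for each (b, c, ...) and pasting the terms together before wrapping them in the outer function.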