I am trying to extract the knot locations from a GAM in order to divide my predictor variable into categories for another model. My data contain a binary response variable (Used) and a continuous predictor (Open).
data <- data.frame(Used = rep(c(1,0,0,0),1250),
Open = round(runif(5000,0,50), 0))
I fit the GAM as such:
library(mgcv)
mod <- gam(Used ~ s(Open), family = binomial, data = data)
I can get the predicted values, the model matrix, etc. with type = "response" or type = "lpmatrix" in the predict.gam function, but I am struggling with how to extract the knot locations at which the coefficients change. Any suggestion is really appreciated!
newdat <- data.frame(Open = 0:50)
out <- as.data.frame(predict.gam(mod, newdata = newdat, type = "response"))
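For reference, the "lpmatrix" type returns the matrix that maps the model coefficients to the linear predictor, so multiplying it by coef() reproduces the link-scale predictions. A minimal sketch, assuming the simulated data from the question (with a seed added so it is reproducible) and a hypothetical prediction grid newdat covering the observed Open values:

```r
library(mgcv)

set.seed(1)  # added so the simulated data are reproducible
data <- data.frame(Used = rep(c(1, 0, 0, 0), 1250),
                   Open = round(runif(5000, 0, 50), 0))
mod <- gam(Used ~ s(Open), family = binomial, data = data)

# Hypothetical prediction grid over the observed range of Open
newdat <- data.frame(Open = 0:50)

# Each column of Xp is one (constrained) basis function evaluated at newdat
Xp <- predict.gam(mod, newdata = newdat, type = "lpmatrix")

# Linear predictor (logit scale) = lpmatrix %*% coefficients
eta <- drop(Xp %*% coef(mod))

# Probability of Used = 1 via the inverse link of the binomial family
p <- mod$family$linkinv(eta)
```

The same matrix is the building block for derivatives and custom intervals, because any linear function of the fitted smooth can be written as some matrix times coef(mod).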
I would also be interested, if possible, in doing something like:
http://www.fromthebottomoftheheap.net/2014/05/15/identifying-periods-of-change-with-gams/
in which statistically significant increases and decreases in the spline are identified. However, I am not using a GAMM at this point, so I am having trouble finding the components of a GAM that correspond to the ones extracted from the GAMM in that post. This second item is more out of curiosity than anything.
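The idea in that post does not really depend on the model being a GAMM: it differentiates the fitted smooth by finite differences of the "lpmatrix" and builds a confidence band from the coefficient covariance matrix, and a plain gam object supplies both through predict.gam() and vcov(). A minimal sketch of that idea for the model above, on the link (logit) scale; the step eps, the 200-point grid and the approximate 2-standard-error band are assumptions, and this is not the code from the post:

```r
library(mgcv)

set.seed(1)
data <- data.frame(Used = rep(c(1, 0, 0, 0), 1250),
                   Open = round(runif(5000, 0, 50), 0))
mod <- gam(Used ~ s(Open), family = binomial, data = data)

eps <- 1e-4  # finite-difference step (assumed)
# Grid stops eps short of the maximum so Open + eps stays in the data range
grid <- data.frame(Open = seq(min(data$Open), max(data$Open) - eps,
                              length.out = 200))

# lpmatrix at x and at x + eps; their scaled difference maps the
# coefficients to the first derivative of the linear predictor
X0 <- predict.gam(mod, newdata = grid, type = "lpmatrix")
X1 <- predict.gam(mod, newdata = transform(grid, Open = Open + eps),
                  type = "lpmatrix")
Xd <- (X1 - X0) / eps

deriv <- drop(Xd %*% coef(mod))  # estimated slope of the smooth
# Pointwise variance diag(Xd V Xd'); pmax guards against tiny rounding negatives
se <- sqrt(pmax(rowSums((Xd %*% vcov(mod)) * Xd), 0))
lower <- deriv - 2 * se
upper <- deriv + 2 * se

# Flag regions where the approximate interval for the slope excludes zero
grid$change <- ifelse(lower > 0, "increase",
                      ifelse(upper < 0, "decrease", "none"))
```

With the simulated data, Used does not depend on Open, so the band will most likely include zero everywhere; with real data, runs of "increase" or "decrease" mark the periods of change.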
Comments:
- You should have tagged your question with R and mgcv when asking.
- At first I wanted to flag your question as a duplicate of "mgcv: how to extract knots, basis, coefficients and predictions for P-splines in adaptive smooth?", asked yesterday, and my answer there should be quite useful. But then I realized that there is a real difference, so I will give a brief explanation here.
Answer:
In your gam call:
mod <- gam(Used ~ s(Open), binomial, data = data)
you did not specify the bs argument in s(), so the default basis, bs = 'tp', is used.
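A quick way to confirm which basis was used is to refit with bs = "tp" spelled out and compare the coefficients. A minimal check, assuming the simulated data from the question with a seed added:

```r
library(mgcv)

set.seed(1)
data <- data.frame(Used = rep(c(1, 0, 0, 0), 1250),
                   Open = round(runif(5000, 0, 50), 0))

mod    <- gam(Used ~ s(Open), family = binomial, data = data)             # default basis
mod_tp <- gam(Used ~ s(Open, bs = "tp"), family = binomial, data = data)  # explicit 'tp'

# The two fits should be identical if the default basis is 'tp'
same_fit <- isTRUE(all.equal(coef(mod), coef(mod_tp)))
same_fit
```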
'tp', short for thin-plate regression spline, is not a smooth class with conventional knots. A full thin-plate spline does have knots: it places them exactly at the data points. For example, if you have n unique Open values, it has n knots. In the univariate case, this is just a smoothing spline.
However, a thin-plate regression spline is a low-rank approximation to the full thin-plate spline, based on a truncated eigen-decomposition. The idea is similar to principal component analysis (PCA): instead of using all n thin-plate spline basis functions, it uses only the first k "principal components". This reduces the computational complexity from O(n^3) to O(nk^2), while giving the optimal rank-k approximation.
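To make the PCA analogy concrete, here is a toy sketch of the truncation step only. It builds the full one-dimensional thin-plate radial-basis matrix E with entries |x_i - x_j|^3 / 12 (the d = 1, second-order case) for hypothetical covariate values, then keeps the k eigenvectors with the largest absolute eigenvalues, which gives the best rank-k approximation of E. This is an illustration under those assumptions, not how mgcv builds the basis: mgcv also handles the polynomial null space and does not compute a full eigen-decomposition as done here.

```r
set.seed(1)
x <- sort(unique(round(runif(300, 0, 50), 1)))  # hypothetical covariate values
n <- length(x)

# Full thin-plate radial basis matrix for d = 1, m = 2: eta(r) = r^3 / 12
E <- abs(outer(x, x, "-"))^3 / 12

# Full eigen-decomposition: O(n^3), used here for illustration only
eg <- eigen(E, symmetric = TRUE)

k <- 10
keep <- order(abs(eg$values), decreasing = TRUE)[seq_len(k)]
Uk <- eg$vectors[, keep]  # the first k "principal components"
Dk <- eg$values[keep]

# Rank-k reconstruction and its relative Frobenius-norm error
Ek <- Uk %*% diag(Dk) %*% t(Uk)
rel_err <- norm(E - Ek, "F") / norm(E, "F")
rel_err
```

The k retained eigenvectors are linear combinations of all n radial basis functions, so no single data point plays the role of a knot in the truncated basis.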
As a result, there are no knots to extract from a fitted thin-plate regression spline.
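One way to see this in mgcv is to compare what the fitted smooth objects store. As far as I know, a 'cr' smooth keeps its knot locations in an element named xp, while a 'tp' smooth has no such element; treat that element name as an assumption and check it with str() on your own fit. A minimal sketch, again assuming the simulated data with a seed added:

```r
library(mgcv)

set.seed(1)
data <- data.frame(Used = rep(c(1, 0, 0, 0), 1250),
                   Open = round(runif(5000, 0, 50), 0))

mod_tp <- gam(Used ~ s(Open, bs = "tp"), family = binomial, data = data)
mod_cr <- gam(Used ~ s(Open, bs = "cr"), family = binomial, data = data)

sm_tp <- mod_tp$smooth[[1]]
sm_cr <- mod_cr$smooth[[1]]

# [[ ]] avoids the partial name matching that $ does on lists
is.null(sm_tp[["xp"]])  # thin-plate smooth: no knot vector
sm_cr[["xp"]]           # cubic regression spline: knot locations
```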
Since you are working with a univariate spline, there is no real need for 'tp'. Just use bs = 'cr', the cubic regression spline. This used to be the default in mgcv before 2003, when 'tp' became available. 'cr' has knots, and you can extract them as I showed in my answer to that question. Don't be confused by the bs = 'ad' in that question: P-splines, B-splines and natural cubic splines are all knot-based splines.