Apologies if the answer is obvious, but I've spent quite some time trying to use a custom link function in {mgcv} gam.
In short,
- I want to use a modified probit link from package psyphy (probit.2asym, which I call custom_link below).
I can create a {stats} family object with this link and use it in the 'family' argument of glm:
m <- glm(y ~ x, family = binomial(link = custom_link), ... )
It does not work when used as an argument for {mgcv} gam:
m <- gam(y ~ s(x), family = binomial(link = custom_link), ... )
I get the error: Error in fix.family.link.family(family) : link not recognised
I do not understand the reason for this error; both glm and gam work if I specify the standard link = "probit".
So my question can be summarized as:
what is missing in this custom link that works for glm but not for gam?
Thanks in advance if you can give me a hint on what I should do.
Link function
probit.2asym <- function(g, lam) {
    if ((g < 0) || (g > 1))
        stop("g must be in (0, 1)")
    if ((lam < 0) || (lam > 1))
        stop("lam outside (0, 1)")
    linkfun <- function(mu) {
        mu <- pmin(mu, 1 - (lam + .Machine$double.eps))
        mu <- pmax(mu, g + .Machine$double.eps)
        qnorm((mu - g) / (1 - g - lam))
    }
    linkinv <- function(eta) {
        g + (1 - g - lam) * pnorm(eta)
    }
    mu.eta <- function(eta) {
        (1 - g - lam) * dnorm(eta)
    }
    valideta <- function(eta) TRUE
    link <- paste("probit.2asym(", g, ", ", lam, ")", sep = "")
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, name = link),
              class = "link-glm")
}
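For clarity, custom_link above is simply an instance of this function, e.g. (the g and lam values here are placeholders, not necessarily the ones I actually use):
custom_link <- probit.2asym(0.1, 0.1)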
As you may know, glm fits by iteratively reweighted least squares (IRLS). Early versions of gam extended this to penalized iteratively reweighted least squares, implemented in the gam.fit function; this is known as performance iteration in some contexts.
Since 2008 (or maybe slightly earlier), gam.fit3, based on what is called outer iteration, has replaced gam.fit as the gam default. This change requires some extra information from the family, namely higher-order derivatives of the link function; see ?fix.family.link.
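To make that requirement concrete, here is a hedged sketch of how one might supply those derivatives for probit.2asym(0.1, 0.1). The expressions are worked out by hand from linkfun (they reduce to mgcv's built-in probit formulas when g = lam = 0), and I am assuming that fix.family.link leaves a family alone once d2link, d3link and d4link are already present; none of this code is taken from psyphy or mgcv.
## hedged sketch: add 2nd-4th derivatives of the link w.r.t. mu to the family,
## with eta = qnorm((mu - g)/(1 - g - lam)) and cc = 1 - g - lam
probit2_fam <- binomial(link = probit.2asym(0.1, 0.1))
cc <- 1 - 0.1 - 0.1    ## 1 - g - lam
probit2_fam$d2link <- function(mu) {
    eta <- probit2_fam$linkfun(mu)
    eta / (cc * dnorm(eta))^2
}
probit2_fam$d3link <- function(mu) {
    eta <- probit2_fam$linkfun(mu)
    (1 + 2 * eta^2) / (cc * dnorm(eta))^3
}
probit2_fam$d4link <- function(mu) {
    eta <- probit2_fam$linkfun(mu)
    (7 * eta + 6 * eta^3) / (cc * dnorm(eta))^4
}
## if the assumption above holds, this family could then be passed to gam:
## m <- gam(y ~ s(x), family = probit2_fam)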
The major difference between the two schemes is whether the iteration for the coefficients beta and the iteration for the smoothing parameters lambda are nested.
- Performance iteration nests them: at each update of beta, a single update of lambda is performed;
- Outer iteration completely separates the two iterations: for each trial value of lambda, the iteration for beta is carried through to convergence.
Obviously outer iteration is more stable and less likely to suffer from failure of convergence.
gam has an argument optimizer. By default it takes optimizer = c("outer", "newton"), i.e., Newton's method for outer iteration; but if you set optimizer = "perf", it will use performance iteration.
So, after the above overview, we have two options:
- still use outer iteration, but expand your customized link function with the extra derivatives (as sketched above);
- use performance iteration to stay in line with glm.
I am being lazy, so I will demonstrate the second one (actually, I don't feel confident enough to take the first approach).
Reproducible Example
You did not provide a reproducible example, so I have prepared one below.
set.seed(0)
x <- sort(runif(500, 0, 1)) ## covariates (sorted to make plotting easier)
eta <- -4 + 3 * x * exp(x) - 2 * log(x) * sqrt(x) ## true linear predictor
p <- binomial(link = "logit")$linkinv(eta) ## true probability (response)
y <- rbinom(500, 1, p) ## binary observations
table(y) ## a quick check that data are not skewed
# 0 1
#271 229
I will take g = 0.1 and lam = 0.1 for the function probit.2asym you intend to use:
probit2 <- probit.2asym(0.1, 0.1)
par(mfrow = c(1,3))
## fit a glm with logit link
glm_logit <- glm(y ~ x, family = binomial(link = "logit"))
plot(x, eta, type = "l", main = "glm with logit link")
lines(x, glm_logit$linear.predictors, col = 2)
## glm with probit.2asym
glm_probit2 <- glm(y ~ x, family = binomial(link = probit2))
plot(x, eta, type = "l", main = "glm with probit2")
lines(x, glm_probit2$linear.predictors, col = 2)
## gam with probit.2asym
library(mgcv)
gam_probit2 <- gam(y ~ s(x, bs = 'cr', k = 3), family = binomial(link = probit2),
optimizer = "perf")
plot(x, eta, type = "l", main = "gam with probit2")
lines(x, gam_probit2$linear.predictors, col = 2)
I have used the cubic regression spline basis cr for s(x), as for a univariate smooth the default thin plate regression spline is unnecessary. I have also set a small basis dimension k = 3 (the minimum for a cubic regression spline), since my toy data are nearly linear and a large basis dimension is not needed. More importantly, this seems to prevent convergence failure of the performance iteration for my toy dataset.
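If you also want to compare the fits on the response (probability) scale rather than the linear predictor scale, a quick follow-up using the objects created above could look like this (just a plotting sketch; it only assumes fitted() returns fitted probabilities for these binomial fits):
## compare fitted probabilities with the true p
par(mfrow = c(1, 1))
plot(x, p, type = "l", ylim = c(0, 1), main = "fitted probabilities")
lines(x, fitted(glm_logit), col = 2)
lines(x, fitted(glm_probit2), col = 3)
lines(x, fitted(gam_probit2), col = 4)
legend("topleft", legend = c("true", "glm logit", "glm probit2", "gam probit2"),
       col = 1:4, lty = 1)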