I have a problem that I am trying to resolve with no success. More than two days searching and I didn’t get a single clue. Sorry if the answer is out there and I didn’t find it.
Suppose that you have a logistic equation regression (binary model) from an old model that you estimated some years ago. Therefore you know the parameters βk (k = 1, 2, ..., p) because they were estimated in the past. But you don’t have the data that were used to fit the model.
My question is: can I introduce this old estimated logistic model in R as an object (corresponding to a logistic regression model)?
I would like to use the “predict” function to prove this logistic regression with a new set of data (present data) and then check the validity of this old model standing the test of time. And to use this function you need the object of the logistic regression model.
Thank you very much in advance.
Per my comment, I think you could start by just calculating predictions directly from the coefficients. Here's an example that compares the output from predict.glm
to predicted probabilities calculated directly on the data:
# construct some data and model it
# y ~ x1 + x2
set.seed(1)
x1 <- runif(100)
x2 <- runif(100)
y <- rbinom(100,1,(x1+x2)/2)
data1 <- data.frame(x1=x1,x2=x2,y=y)
x3 <- runif(100)
x4 <- runif(100)
y2 <- rbinom(100,1,(x3+x4)/2)
data2 <- data.frame(x1=x3,x2=x4,y=y2)
glm1 <- glm(y~x1+x2,data=data1,family=binomial)
# extract coefs
#summary(glm1)
coef1 <- coef(glm1)
# calculate predicted probabilities for current data
tmp1 <- coef1[1] + (data1$x1*coef1[2]) + (data1$x2*coef1[3])
pr1 <- 1/(1+(1/exp(tmp1)))
# these match those from `predict`:
all.equal(pr1,predict(glm1,data1,type='response'))
# now apply to new data:
tmp2 <- coef1[1] + (data2$x1*coef1[2]) + (data2$x2*coef1[3])
pr2 <- 1/(1+(1/exp(tmp2)))
pr2
This is obviously not a general solution, nor does it properly handle uncertainty, but I think it's a better approach than hacking predict
.
You can create a glm fit with only an offset created from the coefficients that you have, then use the regular predict function with that. For example using the iris data (first fitting a model on the real data, then fitting a new model using dummy data and the coefficients from the first fit):
fit1 <- glm( I(Species=='versicolor') ~ Petal.Length + Petal.Width,
data=iris, family=binomial )
coef(fit1)
dummydata <- data.frame( Petal.Length = rnorm(10), Petal.Width=rnorm(10),
Species = rep(c('versicolor','other'), each=5) )
fit2 <- glm( I(Species=='versicolor') ~ 0 +
offset(-2.863708 + 1.563076*Petal.Length - 3.153165*Petal.Width),
data=dummydata, family=binomial )
pred1 <- predict(fit1, newdata=iris)
pred2 <- predict(fit2, newdata=iris)
plot(pred1,pred2)
abline(0,1, col='green')