This is probably a very simple question, but I just can't seem to figure it out.
I'm running a logit using the glm function, but I keep getting warning messages relating to the independent variables. They're stored as factors, and I've tried converting them to numeric and also recoding them to 0/1, but neither worked.
Please help!
> mod2 <- glm(winorlose1 ~ bid1, family="binomial")
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
I also tried it in Zelig, but got similar warnings:
> mod2 = zelig(factor(winorlose1) ~ bid1, data=dat, model="logit")
How to cite this model in Zelig:
Kosuke Imai, Gary King, and Olivia Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
EDIT:
> str(dat)
'data.frame': 3493 obs. of 3 variables:
$ winorlose1: int 2 2 2 2 2 2 2 2 2 2 ...
$ bid1 : int 700 300 700 300 500 300 300 700 300 300 ...
$ home : int 1 0 1 0 0 0 0 1 0 0 ...
- attr(*, "na.action")=Class 'omit' Named int [1:63021] 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 ...
.. ..- attr(*, "names")= chr [1:63021] "3494" "3495" "3496" "3497" ...
This is probably due to complete separation, i.e. one group of observations, defined by some predictor value, consisting entirely of 0s or entirely of 1s.
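A quick way to check (a minimal sketch using the variable names from your str() output) is to cross-tabulate the outcome against the predictor:

with(dat, table(winorlose1, bid1))

If some bid1 value occurs with only one of the two outcomes, that level separates the data perfectly.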
There are several options to deal with this:
(a) Use Firth's penalized likelihood method, as implemented in the logistf or brglm packages in R. This uses the method proposed in Firth (1993), "Bias reduction of maximum likelihood estimates", Biometrika, 80, 1, which removes the first-order bias from maximum likelihood estimates. (A short sketch of this option and option (d) follows below.)
(b) Use median-unbiased estimates from exact conditional logistic regression; the elrm or logistiX packages in R can do this.
(c) Use LASSO or elastic net regularized logistic regression, e.g. using the glmnet package in R.
(d) Go Bayesian, cf. Gelman et al. (2008), "A weakly informative default prior distribution for logistic and other regression models", Ann. Appl. Stat., 2, 4, and the bayesglm function in the arm package.
(e) Use a hidden logistic regression model, as described in Rousseeuw & Christmann (2003),"Robustness against separation and outliers in logistic regression", Computational Statistics & Data Analysis, 43, 3, and implemented in the R package hlr.
Whichever option you choose, you will need to recode bid1 as a factor first:
dat$bid1 = as.factor(dat$bid1)
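With that done, here is a minimal sketch of options (a) and (d); treating 2 as a win when recoding winorlose1 (currently 1/2) to 0/1 is an assumption:

# Option (a): Firth's penalized likelihood via the logistf package.
library(logistf)
dat$win <- as.integer(dat$winorlose1 == 2)  # assumed coding: 2 = win, 1 = loss
fit_firth <- logistf(win ~ bid1, data = dat)
summary(fit_firth)

# Option (d): weakly informative priors via bayesglm in the arm package.
library(arm)
fit_bayes <- bayesglm(win ~ bid1, data = dat, family = binomial)
summary(fit_bayes)

Both approaches keep the coefficient estimates finite even when the data are completely separated.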
Solutions to this problem are also discussed here:
https://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression
https://stats.stackexchange.com/questions/45803/logistic-regression-in-r-resulted-in-perfect-separation-hauck-donner-phenomenon
https://stats.stackexchange.com/questions/239928/is-there-any-intuitive-explanation-of-why-logistic-regression-will-not-work-for
https://stats.stackexchange.com/questions/5354/logistic-regression-model-does-not-converge?rq=1
If you look at
?glm
(or even do a Google search for your second warning message) you may stumble across this from the documentation:

"For the background to warning messages about 'fitted probabilities numerically 0 or 1 occurred' for binomial GLMs, see Venables & Ripley (2002, pp. 197-8)."

Now, not everyone has that book, so here is the gist of the relevant passage: fitted probabilities of exactly 0 or 1 mean that some value of a predictor separates the outcomes completely, in which case the maximum likelihood estimates are infinite and the reported coefficients and standard errors cannot be trusted.
One of the authors of this book commented in somewhat more detail here. So the lesson here is to look carefully at one of the levels of your predictor. (And Google the warning message!)
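To see what the warning is about, here is a toy example with made-up data that reproduces it: y is perfectly determined by x, so the likelihood keeps improving as the slope grows without bound.

x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)  # y flips exactly at x > 3: complete separation
glm(y ~ x, family = binomial)
# Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred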
If you have correctly specified the GLM formula and the corresponding inputs (i.e., design matrix, link function, etc.), the glm algorithm may still fail to converge simply because the iteratively reweighted least squares (IRLS) algorithm is given too few iterations. Change maxit = 25 (the default) to maxit = 100 in R.
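A minimal sketch using the model from the question (wrapping the response in factor() matches the Zelig call above and is an assumption about how winorlose1 is stored):

# Raise the IRLS iteration cap from the default 25 to 100.
mod2 <- glm(factor(winorlose1) ~ bid1, data = dat, family = binomial,
            control = glm.control(maxit = 100))

Note that if the warnings stem from complete separation rather than slow convergence, raising maxit will not remove them.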