I am learning how quasi-separation affects a binomial GLM in R, and I am starting to think that it does not matter in some circumstances.
As I understand it, data have quasi-separation when some linear combination of the factor levels completely identifies failure/non-failure.
So I created an artificial dataset with quasi-separation in R as follows:
fail <- c(100,100,100,100)
nofail <- c(100,100,0,100)
x1 <- c(1,0,1,0)
x2 <- c(0,0,1,1)
data <- data.frame(fail,nofail,x1,x2)
rownames(data) <- paste("obs",1:4)
With this data, when x1=1 and x2=1 (obs 3) the outcome is always a failure (nofail is 0). My covariate matrix has three columns: intercept, x1 and x2.
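To make the setup concrete, here is a minimal sketch that rebuilds the data above and prints the observed failure proportion per row together with the design matrix implied by `~ x1 + x2`. The names `dat`, `prop_fail` and `X` are illustrative only and are not part of the original script.

```r
# Rebuild the toy data from the question (same values as above)
dat <- data.frame(
  fail   = c(100, 100, 100, 100),
  nofail = c(100, 100, 0, 100),
  x1     = c(1, 0, 1, 0),
  x2     = c(0, 0, 1, 1)
)
rownames(dat) <- paste("obs", 1:4)

# Observed failure proportion in each covariate pattern
prop_fail <- dat$fail / (dat$fail + dat$nofail)
names(prop_fail) <- rownames(dat)
print(prop_fail)

# Design matrix for the additive model: intercept, x1, x2
X <- model.matrix(~ x1 + x2, data = dat)
print(X)
```

Only obs 3 has an observed proportion of exactly 1; the other three rows contain both outcomes.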
As I understand it, quasi-separation leads to an infinite coefficient estimate, so the glm fit should fail. However, the following glm fit does NOT fail:
summary(glm(cbind(fail,nofail)~x1+x2,data=data,family=binomial))
The result is:
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4342     0.1318  -3.294 0.000986 ***
x1            0.8684     0.1660   5.231 1.69e-07 ***
x2            0.8684     0.1660   5.231 1.69e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The standard errors seem very reasonable even with the quasi-separation. Could anyone tell me why quasi-separation is NOT affecting the glm fit?
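One way to probe this is to compare the fitted probabilities with the observed proportions, since an infinite estimate would show up as a fitted probability pushed to 0 or 1. The sketch below refits the same additive model and, purely for contrast, a model with the `x1:x2` interaction. The interaction model is an assumption added for illustration and is not part of the question; `dat`, `fit_add`, `fit_int`, `p_obs` and `p_add` are hypothetical names.

```r
# Same toy data as in the question
dat <- data.frame(
  fail   = c(100, 100, 100, 100),
  nofail = c(100, 100, 0, 100),
  x1     = c(1, 0, 1, 0),
  x2     = c(0, 0, 1, 1)
)
rownames(dat) <- paste("obs", 1:4)

# Additive model from the question
fit_add <- glm(cbind(fail, nofail) ~ x1 + x2, data = dat, family = binomial)

# Observed proportions next to the fitted probabilities
p_obs <- dat$fail / (dat$fail + dat$nofail)
p_add <- fitted(fit_add)
print(round(cbind(observed = p_obs, fitted = p_add), 3))

# Contrast (an assumption, not in the question): with the interaction,
# obs 3 gets its own parameter, so its fitted probability can reach 1.
# glm may warn about fitted probabilities of 0 or 1; suppressed here.
fit_int <- suppressWarnings(
  glm(cbind(fail, nofail) ~ x1 * x2, data = dat, family = binomial)
)
print(round(fitted(fit_int), 3))
print(coef(summary(fit_int)))
```

If the additive fit keeps every fitted probability well inside (0, 1) while the interaction fit does not, that would suggest the all-failure cell alone does not force an infinite estimate under `~ x1 + x2`.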