I have a logistic regression model in R, where all of the predictor variables are categorical rather than continuous (in addition to the response variable, which is also obviously categorical/binary).
When calling `summary(model_name)`, is there a way to include a column representing the number of observations within each factor level?
If all your covariates are factors (not including the intercept), this is fairly easy: the model matrix contains only 0s and 1s, and the number of 1s in a column is the number of occurrences of that factor level (or interaction level) in your data. So just do `colSums(model.matrix(your_glm_model_object))`.
Since a model matrix has column names, `colSums` will give you a vector with a "names" attribute that is consistent with the names of `coef(your_glm_model_object)`.
The same solution applies to a linear model (by `lm`) and a generalized linear model (by `glm`) for any distribution family.
Here is a quick example:
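A minimal sketch of such an example, using made-up data. The names `f1`, `f2`, `y`, `dat`, `fit`, `n_obs` and the seed are assumptions for illustration, not taken from the original post. 100 rows are simulated, so the count under `(Intercept)` is 100.

```r
set.seed(0)  # hypothetical seed, only for reproducibility

# Made-up data: 100 observations, two factor covariates, binary response
dat <- data.frame(
  y  = rbinom(100, size = 1, prob = 0.5),
  f1 = factor(sample(letters[1:3], 100, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:2], 100, replace = TRUE))
)

fit <- glm(y ~ f1 * f2, family = binomial, data = dat)

# Column sums of the 0/1 model matrix: observations behind each coefficient
n_obs <- colSums(model.matrix(fit))
n_obs

# The names line up with the coefficient names
identical(names(n_obs), names(coef(fit)))

# Optionally attach the counts to the coefficient table from summary();
# indexing by row name keeps the two aligned even if a coefficient is dropped
coef_tab <- coef(summary(fit))
cbind(coef_tab, n = n_obs[rownames(coef_tab)])
```

The last step is just one way to get the counts next to the estimates; `summary()` itself has no option for this as far as I know.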
Here, we have 100 observations / complete cases (indicated under `(Intercept)`).
Baseline levels are contrasted, so they don't appear in the model matrix used for fitting. However, we can generate the full model matrix (without contrasts) from your formula, not your fitted model (this also offers you a way to drop numeric variables if you have them in your model):
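A sketch of how this could look, again with made-up data (`dat`, `f1`, `f2`, `full_contr` and `X_full` are hypothetical names). It passes identity "contrast" matrices, obtained with `contrasts(x, contrasts = FALSE)`, to the `contrasts.arg` argument of `model.matrix`, so every level gets its own column.

```r
set.seed(0)  # hypothetical seed, only for reproducibility

# Made-up data: 100 observations on two factor covariates
dat <- data.frame(
  f1 = factor(sample(letters[1:3], 100, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:2], 100, replace = TRUE))
)

# Identity matrices instead of real contrasts: baseline levels are kept
full_contr <- lapply(dat[c("f1", "f2")], contrasts, contrasts = FALSE)

# Built from a formula (right-hand side only), not from the fitted model
X_full <- model.matrix(~ f1 * f2, data = dat, contrasts.arg = full_contr)

# Counts for every level and every level combination, baselines included
colSums(X_full)
```

Because the matrix is built from a formula, a numeric covariate can be left out simply by not writing it in that formula.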
Note that setting contrasts can quickly become tedious when you have many factor variables.
`model.matrix` is definitely not the only approach for this. There is a more conventional way, but it could get tedious too when your model becomes complicated.
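One common alternative, and only an assumption about what "conventional" means here, is plain cross-tabulation with `table()`. A minimal sketch with made-up data (`dat`, `f1`, `f2`, `n_f1`, `n_f2`, `n_f1_f2` are hypothetical names):

```r
set.seed(0)  # hypothetical seed, only for reproducibility

# Made-up data: 100 observations on two factor covariates
dat <- data.frame(
  f1 = factor(sample(letters[1:3], 100, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:2], 100, replace = TRUE))
)

# One table per main effect
n_f1 <- table(dat$f1)
n_f2 <- table(dat$f2)

# One cross-table per interaction
n_f1_f2 <- table(dat$f1, dat$f2)

n_f1
n_f2
n_f1_f2
```

Each main effect and each interaction needs its own call, which is where the tedium comes from as the model grows.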