I did a logistic regression:
EW <- glm(everwrk~age_p + r_maritl, data = NH11, family = "binomial")
Moreover, I want to predict everwrk
for each level of r_maritl
.
r_maritl
has the following levels:
levels(NH11$r_maritl)
"0 Under 14 years"
"1 Married - spouse in household"
"2 Married - spouse not in household"
"3 Married - spouse in household unknown"
"4 Widowed"
"5 Divorced"
"6 Separated"
"7 Never married"
"8 Living with partner"
"9 Unknown marital status"
So I did:
predEW <- with(NH11,
expand.grid(r_maritl = c( "0 Under 14 years", "1 Married -
spouse in household", "2 Married - spouse not in household", "3 Married -
spouse in household unknown", "4 Widowed", "5 Divorced", "6 Separated", "7
Never married", "8 Living with partner", "9 Unknown marital status"),
age_p = mean(age_p,na.rm = TRUE)))
cbind(predEW, predict(EW, type = "response",
se.fit = TRUE, interval = "confidence",
newdata = predEW))
The Problem is I get the following response:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 0 Under 14 years, Married - spouse in household unknown
Sample Data:
str(NH11$age_p)
num [1:33014] 47 18 79 51 43 41 21 20 33 56 ...
str(NH11$everwrk)
Factor w/ 2 levels "2 No","1 Yes": NA NA 2 NA NA NA NA NA 2 2 ...
str(NH11$r_maritl)
Factor w/ 10 levels "0 Under 14 years",..: 6 8 5 7 2 2 8 8 8 2 ...
tl;dr it looks like you have some levels in your factor that are not represented in your data, that get dropped from the factors used in the model. In hindsight this isn't terribly surprising, since you won't be able to predict responses for these levels. That said, it's mildly surprising that R doesn't do something nice for you like generate
NA
values automatically. You can solve this problem by usinglevels(droplevels(NH11$r_maritl))
in constructing your prediction frame, or equivalentlyEW$xlevels$r_maritl
.A reproducible example:
Let's make a missing level:
Fit the model:
Success!