I am trying to run a multi-level model on multiply imputed data (created with Amelia); the sample is based on a clustered sample with group = 24, N= 150.
library("ZeligMultilevel")
ML.model.0 <- zelig(dv~1 + tag(1|group), model="ls.mixed",
data=a.out$imputations)
summary(ML.model.0)
This code produces the following error code:
Error in object[[1]]$result$call :
$ operator not defined for this S4 class
If I run a OLS regression, it works:
model.0 <- zelig(dv~1, model="ls", data=a.out$imputations)
m.0 <- coef(summary(model.0))
print(m.0, digits = 2)
Value Std. Error t-stat p-value
[1,] 45 0.34 130 2.6e-285
I am happy to provide a working example.
require(Zelig)
require(Amelia)
require(ZeligMultilevel)
data(freetrade)
length(freetrade$country) #grouping variable
#Imputation of missing data
a.out <- amelia(freetrade, m=5, ts="year", cs="country")
# Models: (1) OLS; (2) multi-level
model.0 <- zelig(polity~1, model="ls", data=a.out$imputations)
m.0 <- coef(summary(model.0))
print(m.0, digits = 2)
ML.model.0 <- zelig(polity~1 + tag(1|country), model="ls.mixed", data=a.out$imputations)
summary(ML.model.0)
I think the issue may be with how Zelig interfaces with Amelia's mi class. Therefore, I turned toward an alternative R package: lme4.
require(lme4)
write.amelia(obj=a.out, file.stem="inmi", format="csv", na="NA")
diff <-list(5) # a list to store each model, 5 is the number of the imputed datasets
for (i in 1:5) {
file.name <- paste("inmi", 5 ,".csv",sep="")
data.to.use <- read.csv(file.name)
diff[[5]] <- lmer(polity ~ 1 + (1 | country),
data = data.to.use)}
diff
The result is the following:
[[1]]
[1] 5
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
Linear mixed model fit by REML
Formula: polity ~ 1 + (1 | country)
Data: data.to.use
AIC BIC logLik deviance REMLdev
1006 1015 -499.9 1002 999.9
Random effects:
Groups Name Variance Std.Dev.
country (Intercept) 14.609 3.8222
Residual 17.839 4.2236
Number of obs: 171, groups: country, 9
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.878 1.314 2.19
The results remain the same when I replace diff[[5]]
by diff[[4]]
, diff[[3]]
etc. Still, I am wondering whether this is actually the results for the combined dataset or for one single imputed data set. Any thoughts? Thanks!
I modified the summary function for this object (fetched the source and opened up ./R/summary.R file). I added some curly braces to make the code flow and changed a getcoef
to coef
. This should work for this particular case, but I'm not sure if it's general. Function getcoef
searches for slot coef3
, and I have never seen this. Perhaps @BenBolker can throw an eye here? I can't guarantee this is what the result looks like, but the output looks legit to me. Perhaps you could contact the package authors to correct this in the future version.
summary(ML.model.0)
Model: ls.mixed
Number of multiply imputed data sets: 5
Combined results:
Call:
zelig(formula = polity ~ 1 + tag(1 | country), model = "ls.mixed",
data = a.out$imputations)
Coefficients:
Value Std. Error t-stat p-value
[1,] 2.902863 1.311427 2.213515 0.02686218
For combined results from datasets i to j, use summary(x, subset = i:j).
For separate results, use print(summary(x), subset = i:j).
Modified function:
summary.MI <- function (object, subset = NULL, ...) {
if (length(object) == 0) {
stop('Invalid input for "subset"')
} else {
if (length(object) == 1) {
return(summary(object[[1]]))
}
}
# Roman: This function isn't fecthing coefficients robustly. Something goes wrong. Contact package author.
getcoef <- function(obj) {
# S4
if (!isS4(obj)) {
coef(obj)
} else {
if ("coef3" %in% slotNames(obj)) {
obj@coef3
} else {
obj@coef
}
}
}
#
res <- list()
# Get indices
subset <- if (is.null(subset)) {
1:length(object)
} else {
c(subset)
}
# Compute the summary of all objects
for (k in subset) {
res[[k]] <- summary(object[[k]])
}
# Answer
ans <- list(
zelig = object[[1]]$name,
call = object[[1]]$result@call,
all = res
)
#
coef1 <- se1 <- NULL
#
for (k in subset) {
# tmp <- getcoef(res[[k]]) # Roman: I changed this to coef, not 100% sure if the output is the same
tmp <- coef(res[[k]])
coef1 <- cbind(coef1, tmp[, 1])
se1 <- cbind(se1, tmp[, 2])
}
rows <- nrow(coef1)
Q <- apply(coef1, 1, mean)
U <- apply(se1^2, 1, mean)
B <- apply((coef1-Q)^2, 1, sum)/(length(subset)-1)
var <- U+(1+1/length(subset))*B
nu <- (length(subset)-1)*(1+U/((1+1/length(subset))*B))^2
coef.table <- matrix(NA, nrow = rows, ncol = 4)
dimnames(coef.table) <- list(rownames(coef1),
c("Value", "Std. Error", "t-stat", "p-value"))
coef.table[,1] <- Q
coef.table[,2] <- sqrt(var)
coef.table[,3] <- Q/sqrt(var)
coef.table[,4] <- pt(abs(Q/sqrt(var)), df=nu, lower.tail=F)*2
ans$coefficients <- coef.table
ans$cov.scaled <- ans$cov.unscaled <- NULL
for (i in 1:length(ans)) {
if (is.numeric(ans[[i]]) && !names(ans)[i] %in% c("coefficients")) {
tmp <- NULL
for (j in subset) {
r <- res[[j]]
tmp <- cbind(tmp, r[[pmatch(names(ans)[i], names(res[[j]]))]])
}
ans[[i]] <- apply(tmp, 1, mean)
}
}
class(ans) <- "summaryMI"
ans
}