Computing p.values in spatial econometric models:

2019-05-07 04:04发布

问题:

I am estimating some spatial econometric models that contain both a spatial autoregressive term rho and a spatial error term lambda. In attempting to communicate my results I was using the texreg package, which accepts the sacsarlm models I am working with. I noticed, however, that texreg is printing p-values for the rho and lambda parameters that are identical. Texreg appears to be returning the p-value found in the model@LR1$p.value slot of the model object.

The parameters rho and lambda are different in magnitude and have different standard errors so they should not have equivalent p-values. If I call summary on the model object I get unique p-values, but cannot figure out where those values are stored within the model object despite going through each element in a str(model) call.

My question is twofold:

  1. Am I correct in thinking this is an error in the texreg (and screenreg etc.) function or am I erring in my interpretation?
  2. How do I compute the correct p-value or find it in the model object (I am writing a new extract function for texreg and need to find the correct value)?

Below is a minimal example that shows the problem:

library(spdep)
library(texreg)
set.seed(42)
W.ran <- matrix(rbinom(100*100, 1, .3),nrow=100)
X <- rnorm(100)
Y <- .2 * X + rnorm(100) + .9*(W.ran %*% X)

W.test <- mat2listw(W.ran)
model <- sacsarlm(Y~X, type = "sacmixed",
                listw=W.test, zero.policy=TRUE)
summary(model)

Call:sacsarlm(formula = Y ~ X, listw = W.test, type = "sacmixed", 
    zero.policy = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-2.379283 -0.750922  0.036044  0.675951  2.577148 

Type: sacmixed 
Coefficients: (asymptotic standard errors) 
                   Estimate  Std. Error z value Pr(>|z|)
(Intercept)      0.91037455  0.65700059  1.3857   0.1659
X               -0.00076362  0.10330510 -0.0074   0.9941
lag.(Intercept) -0.03193863  0.02310075 -1.3826   0.1668
lag.X            0.89764491  0.02231353 40.2287   <2e-16

Rho: -0.0028541
Asymptotic standard error: 0.0059647
    z-value: -0.47849, p-value: 0.6323
Lambda: -0.020578
Asymptotic standard error: 0.020057
    z-value: -1.026, p-value: 0.3049

LR test value: 288.74, p-value: < 2.22e-16

Log likelihood: -145.4423 for sacmixed model
ML residual variance (sigma squared): 1.0851, (sigma: 1.0417)
Number of observations: 100 
Number of parameters estimated: 7 
AIC: 304.88, (AIC for lm: 585.63)

screenreg(model)

=================================
                      Model 1    
---------------------------------
(Intercept)              0.91    
                        (0.66)   
X                       -0.00    
                        (0.10)   
lag.(Intercept)         -0.03    
                        (0.02)   
lag.X                    0.90 ***
                        (0.02)   
---------------------------------
Num. obs.              100       
Parameters               7       
AIC (Linear model)     585.63    
AIC (Spatial model)    304.88    
Log Likelihood        -145.44    
Wald test: statistic     1.05    
Wald test: p-value       0.90    
Lambda: statistic       -0.02    
Lambda: p-value          0.00    
Rho: statistic          -0.00    
Rho: p-value             0.00    
=================================
*** p < 0.001, ** p < 0.01, * p < 0.05

Obviously, in the example, Rho and Lambda have different p-values neither of which are zero and thus there is a problem with the texreg output. Any help with why this is occurring or where to obtain the correct p-values much appreciated!

回答1:

texreg author here. Thanks for catching this. As described in my reply here, texreg uses extract methods to retrieve the relevant information from any of the (currently more than 70 supported) model object types. It looks like there is a glitch in the GOF part of the method for sarlm objects.

Here is what the method currently looks like (as of texreg 1.36.13):

# extension for sarlm objects (spdep package)
extract.sarlm <- function(model, include.nobs = TRUE, include.aic = TRUE, 
    include.loglik = TRUE, include.wald = TRUE, include.lambda = TRUE, 
    include.rho = TRUE, ...) {
  s <- summary(model, ...)

  names <- rownames(s$Coef)
  cf <- s$Coef[, 1]
  se <- s$Coef[, 2]
  p <- s$Coef[, ncol(s$Coef)]

  gof <- numeric()
  gof.names <- character()
  gof.decimal <- logical()

  if (include.nobs == TRUE) {
    n <- length(s$fitted.values)
    param <- s$parameters
    gof <- c(gof, n, param)
    gof.names <- c(gof.names, "Num.\ obs.", "Parameters")
    gof.decimal <- c(gof.decimal, FALSE, FALSE)
  }
  if (include.aic == TRUE) {
    aic <- AIC(model)
    aiclm <- s$AIC_lm.model
    gof <- c(gof, aiclm, aic)
    gof.names <- c(gof.names, "AIC (Linear model)", "AIC (Spatial model)")
    gof.decimal <- c(gof.decimal, TRUE, TRUE)
  }
  if (include.loglik == TRUE) {
    ll <- s$LL
    gof <- c(gof, ll)
    gof.names <- c(gof.names, "Log Likelihood")
    gof.decimal <- c(gof.decimal, TRUE)
  }
  if (include.wald == TRUE) {
    waldstat <- s$Wald1$statistic
    waldp <- s$Wald1$p.value
    gof <- c(gof, waldstat, waldp)
    gof.names <- c(gof.names, "Wald test: statistic", "Wald test: p-value")
    gof.decimal <- c(gof.decimal, TRUE, TRUE)
  }
  if (include.lambda == TRUE && !is.null(s$lambda)) {
    lambda <- s$lambda
    LRpval <- s$LR1$p.value[1]
    gof <- c(gof, lambda, LRpval)
    gof.names <- c(gof.names, "Lambda: statistic", "Lambda: p-value")
    gof.decimal <- c(gof.decimal, TRUE, TRUE)
  }
  if (include.rho == TRUE && !is.null(s$rho)) {
    rho <- s$rho
    LRpval <- s$LR1$p.value[1]
    gof <- c(gof, rho, LRpval)
    gof.names <- c(gof.names, "Rho: statistic", "Rho: p-value")
    gof.decimal <- c(gof.decimal, TRUE, TRUE)
  }

  tr <- createTexreg(
      coef.names = names, 
      coef = cf, 
      se = se, 
      pvalues = p, 
      gof.names = gof.names, 
      gof = gof, 
      gof.decimal = gof.decimal
  )
  return(tr)
}

setMethod("extract", signature = className("sarlm", "spdep"), 
    definition = extract.sarlm)

I think you are right that the lambda and rho parts need some updating. The sacsarlm function does not store the results printed by its summary method in any object, so you are rightly pointing out that any attempts using str do not appear to show the true p-values etc.

Therefore it makes sense to take a look at what the print.summary.sarlm function in the spdep package actually does when it prints the summary. I found the code for this function in the file R/summary.spsarlm.R in the source code of the package on CRAN. It looks like this:

print.summary.sarlm <- function(x, digits = max(5, .Options$digits - 3),
    signif.stars = FALSE, ...)
{
    cat("\nCall:", deparse(x$call), sep = "", fill=TRUE)
       if (x$type == "error") if (isTRUE(all.equal(x$lambda, x$interval[1])) ||
            isTRUE(all.equal(x$lambda, x$interval[2]))) 
            warning("lambda on interval bound - results should not be used")
       if (x$type == "lag" || x$type == "mixed")
            if (isTRUE(all.equal(x$rho, x$interval[1])) ||
            isTRUE(all.equal(x$rho, x$interval[2]))) 
            warning("rho on interval bound - results should not be used")
    cat("\nResiduals:\n")
    resid <- residuals(x)
    nam <- c("Min", "1Q", "Median", "3Q", "Max")
    rq <- if (length(dim(resid)) == 2L) 
        structure(apply(t(resid), 1, quantile), dimnames = list(nam, 
            dimnames(resid)[[2]]))
    else structure(quantile(resid), names = nam)
    print(rq, digits = digits, ...)
    cat("\nType:", x$type, "\n")
    if (x$zero.policy) {
        zero.regs <- attr(x, "zero.regs")
        if (!is.null(zero.regs))
            cat("Regions with no neighbours included:\n",
            zero.regs, "\n")
    }
        if (!is.null(x$coeftitle)) {
        cat("Coefficients:", x$coeftitle, "\n")
        coefs <- x$Coef
        if (!is.null(aliased <- x$aliased) && any(x$aliased)){
        cat("    (", table(aliased)["TRUE"], 
            " not defined because of singularities)\n", sep = "")
        cn <- names(aliased)
        coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn, 
                    colnames(x$Coef)))
                coefs[!aliased, ] <- x$Coef
        }
        printCoefmat(coefs, signif.stars=signif.stars, digits=digits,
        na.print="NA")
    }
#   res <- LR.sarlm(x, x$lm.model)
    res <- x$LR1
        pref <- ifelse(x$ase, "Asymptotic", "Approximate (numerical Hessian)")
    if (x$type == "error") {
        cat("\nLambda: ", format(signif(x$lambda, digits)),
            ", LR test value: ", format(signif(res$statistic,
                        digits)), ", p-value: ", format.pval(res$p.value,
                        digits), "\n", sep="")
        if (!is.null(x$lambda.se)) {
                    if (!is.null(x$adj.se)) {
                        x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)   
                    }
            cat(pref, " standard error: ", 
                format(signif(x$lambda.se, digits)),
            ifelse(is.null(x$adj.se), "\n    z-value: ",
                               "\n    t-value: "), format(signif((x$lambda/
                x$lambda.se), digits)),
            ", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
                x$lambda.se))), digits), "\n", sep="")
            cat("Wald statistic: ", format(signif(x$Wald1$statistic, 
            digits)), ", p-value: ", format.pval(x$Wald1$p.value, 
            digits), "\n", sep="")
        }
    } else if (x$type == "sac" || x$type == "sacmixed") {
        cat("\nRho: ", format(signif(x$rho, digits)), "\n",
                    sep="")
                if (!is.null(x$rho.se)) {
                    if (!is.null(x$adj.se)) {
                        x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)   
                    }
          cat(pref, " standard error: ", 
            format(signif(x$rho.se, digits)), 
                        ifelse(is.null(x$adj.se), "\n    z-value: ",
                               "\n    t-value: "), 
            format(signif((x$rho/x$rho.se), digits)),
            ", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
                x$rho.se))), digits), "\n", sep="")
                }
        cat("Lambda: ", format(signif(x$lambda, digits)), "\n", sep="")
        if (!is.null(x$lambda.se)) {
                    pref <- ifelse(x$ase, "Asymptotic",
                        "Approximate (numerical Hessian)")
                    if (!is.null(x$adj.se)) {
                        x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)   
                    }
            cat(pref, " standard error: ", 
                format(signif(x$lambda.se, digits)),
            ifelse(is.null(x$adj.se), "\n    z-value: ",
                               "\n    t-value: "), format(signif((x$lambda/
                x$lambda.se), digits)),
            ", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
                x$lambda.se))), digits), "\n", sep="")
                }
                cat("\nLR test value: ", format(signif(res$statistic, digits)),
            ", p-value: ", format.pval(res$p.value, digits), "\n",
                    sep="")
        } else {
        cat("\nRho: ", format(signif(x$rho, digits)), 
                    ", LR test value: ", format(signif(res$statistic, digits)),
            ", p-value: ", format.pval(res$p.value, digits), "\n",
                    sep="")
                if (!is.null(x$rho.se)) {
                    if (!is.null(x$adj.se)) {
                        x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)   
                    }
          cat(pref, " standard error: ", 
            format(signif(x$rho.se, digits)), 
                        ifelse(is.null(x$adj.se), "\n    z-value: ",
                               "\n    t-value: "), 
            format(signif((x$rho/x$rho.se), digits)),
            ", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
                x$rho.se))), digits), "\n", sep="")
                }
        if (!is.null(x$Wald1)) {
            cat("Wald statistic: ", format(signif(x$Wald1$statistic, 
            digits)), ", p-value: ", format.pval(x$Wald1$p.value, 
            digits), "\n", sep="")
        }

    }
    cat("\nLog likelihood:", logLik(x), "for", x$type, "model\n")
    cat("ML residual variance (sigma squared): ", 
        format(signif(x$s2, digits)), ", (sigma: ", 
        format(signif(sqrt(x$s2), digits)), ")\n", sep="")
        if (!is.null(x$NK)) cat("Nagelkerke pseudo-R-squared:",
            format(signif(x$NK, digits)), "\n")
    cat("Number of observations:", length(x$residuals), "\n")
    cat("Number of parameters estimated:", x$parameters, "\n")
    cat("AIC: ", format(signif(AIC(x), digits)), ", (AIC for lm: ",
        format(signif(x$AIC_lm.model, digits)), ")\n", sep="")
    if (x$type == "error") {
        if (!is.null(x$Haus)) {
            cat("Hausman test: ", format(signif(x$Haus$statistic, 
            digits)), ", df: ", format(x$Haus$parameter),
                        ", p-value: ", format.pval(x$Haus$p.value, digits),
                        "\n", sep="")
        }
        }
    if ((x$type == "lag" || x$type ==  "mixed") && x$ase) {
        cat("LM test for residual autocorrelation\n")
        cat("test value: ", format(signif(x$LMtest, digits)),
            ", p-value: ", format.pval((1 - pchisq(x$LMtest, 1)), 
            digits), "\n", sep="")
    }
        if (x$type != "error" && !is.null(x$LLCoef)) {
        cat("\nCoefficients: (log likelihood/likelihood ratio)\n")
        printCoefmat(x$LLCoef, signif.stars=signif.stars,
            digits=digits, na.print="NA")
        }
        correl <- x$correlation
        if (!is.null(correl)) {
            p <- NCOL(correl)
            if (p > 1) {
                    cat("\n", x$correltext, "\n")
                    correl <- format(round(correl, 2), nsmall = 2, 
                    digits = digits)
                    correl[!lower.tri(correl)] <- ""
                    print(correl[-1, -p, drop = FALSE], quote = FALSE)
                }
        }
        cat("\n")
        invisible(x)
}

You can see there that the function first distinguishes between different sub-models (error vs. sac/sacmixed vs. else), then decides which standard errors to use, and then computes the p-values on the fly without saving them anywhere.

So this is what we need to do as well in our extract method in order to get the same result as the summary method in the spdep package. We also need to move this from the GOF block up to the coefficient block of the table (see comments section below for a discussion). Here is my attempt at adopting their approach in the extract method:

# extension for sarlm objects (spdep package)
extract.sarlm <- function(model, include.nobs = TRUE, include.loglik = TRUE,
    include.aic = TRUE, include.lr = TRUE, include.wald = TRUE, ...) {
  s <- summary(model, ...)

  names <- rownames(s$Coef)
  cf <- s$Coef[, 1]
  se <- s$Coef[, 2]
  p <- s$Coef[, ncol(s$Coef)]

  if (model$type != "error") {  # include coefficient for autocorrelation term
    rho <- model$rho
    cf <- c(cf, rho)
    names <- c(names, "$\\rho$")
    if (!is.null(model$rho.se)) {
      if (!is.null(model$adj.se)) {
        rho.se <- sqrt((model$rho.se^2) * model$adj.se)   
      } else {
        rho.se <- model$rho.se
      }
      rho.pval <- 2 * (1 - pnorm(abs(rho / rho.se)))
      se <- c(se, rho.se)
      p <- c(p, rho.pval)
    } else {
      se <- c(se, NA)
      p <- c(p, NA)
    }
  }

  if (!is.null(model$lambda)) {
    cf <-c(cf, model$lambda)
    names <- c(names, "$\\lambda$")  
    if (!is.null(model$lambda.se)) {
      if (!is.null(model$adj.se)) {
        lambda.se <- sqrt((model$lambda.se^2) * model$adj.se)   
      } else {
        lambda.se <- model$lambda.se
      }
      lambda.pval <- 2 * (1 - pnorm(abs(model$lambda / lambda.se)))
      se <- c(se, lambda.se)
      p <- c(p, lambda.pval)
    } else {
      se <- c(se, NA)
      p <- c(p, NA)
    }
  }

  gof <- numeric()
  gof.names <- character()
  gof.decimal <- logical()

  if (include.nobs == TRUE) {
    n <- length(s$fitted.values)
    param <- s$parameters
    gof <- c(gof, n, param)
    gof.names <- c(gof.names, "Num.\ obs.", "Parameters")
    gof.decimal <- c(gof.decimal, FALSE, FALSE)
  }
  if (include.loglik == TRUE) {
    ll <- s$LL
    gof <- c(gof, ll)
    gof.names <- c(gof.names, "Log Likelihood")
    gof.decimal <- c(gof.decimal, TRUE)
  }
  if (include.aic == TRUE) {
    aic <- AIC(model)
    aiclm <- s$AIC_lm.model
    gof <- c(gof, aiclm, aic)
    gof.names <- c(gof.names, "AIC (Linear model)", "AIC (Spatial model)")
    gof.decimal <- c(gof.decimal, TRUE, TRUE)
  }
  if (include.lr == TRUE && !is.null(s$LR1)) {
    gof <- c(gof, s$LR1$statistic[[1]], s$LR1$p.value[[1]])
    gof.names <- c(gof.names, "LR test: statistic", "LR test: p-value")
    gof.decimal <- c(gof.decimal, TRUE, TRUE)
  }
  if (include.wald == TRUE && !is.null(model$Wald1)) {
    waldstat <- model$Wald1$statistic
    waldp <- model$Wald1$p.value
    gof <- c(gof, waldstat, waldp)
    gof.names <- c(gof.names, "Wald test: statistic", "Wald test: p-value")
    gof.decimal <- c(gof.decimal, TRUE, TRUE)
  }

  tr <- createTexreg(
      coef.names = names, 
      coef = cf, 
      se = se, 
      pvalues = p, 
      gof.names = gof.names, 
      gof = gof, 
      gof.decimal = gof.decimal
  )
  return(tr)
}

setMethod("extract", signature = className("sarlm", "spdep"), 
    definition = extract.sarlm)

You can just execute this code at run-time to update the way texreg handles these objects. Please do let me know if you still think there are any glitches I haven't spotted. If none are reported in the comments, I will include this updated extract method in the next texreg release.

With these changes, calling screenreg(model, single.row = TRUE) yields the following output:

=======================================
                     Model 1           
---------------------------------------
(Intercept)             0.91 (0.66)    
X                      -0.00 (0.10)    
lag.(Intercept)        -0.03 (0.02)    
lag.X                   0.90 (0.02) ***
rho                    -0.00 (0.01)    
lambda                 -0.02 (0.02)    
---------------------------------------
Num. obs.             100              
Parameters              7              
Log Likelihood       -145.44           
AIC (Linear model)    585.63           
AIC (Spatial model)   304.88           
LR test: statistic    288.74           
LR test: p-value        0.00           
=======================================
*** p < 0.001, ** p < 0.01, * p < 0.05