individual random effects model with standard erro

2019-04-11 14:07发布

问题:

I'm currently working on some data from an experiment. Thus, I have data about some individuals who are randomly assigned to 2 different treatments. For each treatment, we ran three sessions. In each session, participants were asked to make a sequence of decisions.

What I would like to do is to: (1) estimate the effect of the treatment with a model that includes random effects on individuals and afterwards, (2) clustering the standard errors by session.

In R, I can easily estimate the random effect model with the plm package:

model.plm<-plm(formula=DependentVar~TreatmentVar+SomeIndependentVars,data=data,
    model="random",effect="individual")

My problem is that I'm not able to cluster the standard errors by the variable session, i.e. the session the individuals participated in. Indeed the Robust Covariance Matrix Estimators of the plm package let me choose between 2 types of clusters: "groups" and "time". So, if I choose the option "group" I get standard errors clustered at the individual level:

vcovHC(model.plm,type="HC0",cluster="group")

Is there a way to choose a different clustering variable ?

I will very much appreciate your help.

回答1:

You may be interested in this: https://stats.stackexchange.com/questions/85909/why-does-a-fixed-effect-ols-need-unique-time-elements

Here's my solution for "within" models:

  #' Fixed effect cluster regression, estimated efficiently using plm()
  #' @param form The model formula.
  #' @param data The data.frame.
  #' @param index Character vector giving the column name indexing individual units.
  #' @param cluster Character vector giving the column name indexing clusters, or "robust" to avoid the bootstrap and just return robust SE.
  #' @param param A list of control parameters, with named elements as follows:  R is the number of bootstrap replicates. 
  #' @return Coefficients plus clustered standard errors
  feClusterRegress <- function( form, data, index, cluster = "robust", param = list( R = 30 ) ) {
    if( "data.table" %in% class(data) )  data <- as.data.frame(data) # Not ideal efficiency-wise since I re-convert it later but necessary until I generalize the code to data.tables (the plm call doesn't work with them, for instance)
    stopifnot( class(form)=="formula" )
    mdl <- plm( form, data = data, model = "within", effect="individual", index = index )
    if( cluster=="robust" ) {
      res <- summary( mdl, robust=TRUE )
    } else { # Bootstrap
      require(foreach)
      require(data.table)
      # Prepare data structures for efficient sampling
      clusters <- unique( data[[cluster]] )
      if( is.null(clusters) )  stop("cluster must describe a column name that exists!")
      clusterList <- lapply( clusters, function(x) which( data[[cluster]] == x ) )
      names(clusterList) <- clusters
      progressBar <- txtProgressBar( 0, param$R )
      # Convert to data.table and drop extraneous variables
      data <- as.data.table( data[ , c( all.vars(form), index ) ] ) # For faster sub-setting
      # Sample repeatedly
      coefList <- foreach( i = seq( param$R ) ) %dopar% {
        setTxtProgressBar( progressBar, i )
        clusterSample <- sample( as.character( clusters ), replace=TRUE )
        indexSample <- unlist( clusterList[ clusterSample ], use.names=FALSE )
        dataSample <- data[ indexSample, ]
        dataSample[ , fakeTime := seq(.N), by = index ] # fakeTime is necessary due to a potential bug in plm.  See https://stats.stackexchange.com/questions/85909/why-does-a-fixed-effect-ols-need-unique-time-elements
        try( coefficients( plm( form, data = as.data.frame(dataSample), model = "within", effect="individual", index = c( index, "fakeTime") ) ) )
      }
      failed <- vapply( coefList, function(x) class(x) == "try-error", FUN.VALUE=NA )
      if( any(failed) ) {
        warning( "Some runs of the regression function failed" )
        coefList <- coefList[ !failed ]
      }
      coefMat <- stack( coefList )
      SE <- apply( coefMat, 2, sd )
      res <- structure( 
        list( 
          cbind( coefficients( mdl ), SE ),
          model = mdl
        ),
        class = "feClusterPLM",
        R = param$R
      )
    }
    res         
  }

I suspect you actually need the variables, so instead of generating a fake time generate a "fake" group--just make up a new group identifier right after you grab each bootstrap sample.