How to run regressions on multidimensional panel data

Posted 2020-05-29 06:19

Question:

I need to run a regression on panel data with 3 dimensions (Year * Company * Country). For example:

============================================
 year | comp | count |  value.x |  value.y
------+------+-------+----------+-----------
 2000 |   A  |  USA  |  1029.0  |  239481   
------+------+-------+----------+-----------
 2000 |   A  |  CAN  |  2341.4  |  129333   
------+------+-------+----------+-----------
 2000 |   B  |  USA  |  2847.7  |  187319   
------+------+-------+----------+-----------
 2000 |   B  |  CAN  |  4820.5  |  392039
------+------+-------+----------+-----------
 2001 |   A  |  USA  |  7289.9  |  429481
------+------+-------+----------+-----------
 2001 |   A  |  CAN  |  5067.3  |  589143
------+------+-------+----------+-----------
 2001 |   B  |  USA  |  7847.8  |  958234
------+------+-------+----------+-----------
 2001 |   B  |  CAN  |  9820.0  | 1029385
============================================

However, the R package plm does not seem able to cope with more than 2 dimensions.

I have tried

result <- plm(value.y ~ value.x, data = dataname, index = c("comp","count","year"))

and it returns an error:

Error in pdata.frame(data, index) : 
'index' can be of length 2 at the most (one individual and one time index)

How do you run regressions when the panel data (individual * time) has more than 1 dimension within "individual"?


In case anyone encounters the same situation, I'll put my solutions here:

R seems unable to cope with this situation, so the only thing you can do is add dummies. If the categorical variables you build the dummies from contain too many categories to add by hand, you can try this:

makedummy <- function(colnum, data, interaction = FALSE, interaction_varnum)
{
  # Column names look like "cate.dummy.A" or "cate.dummy.int.A"
  char0 <- colnames(data)[colnum]
  char1 <- "dummy"
  tmp <- unique(data[, colnum])
  valname <- paste(char0, char1, tmp, sep = ".")
  valname_int <- paste(char0, char1, "int", tmp, sep = ".")
  # Skip the last category so that it acts as the reference level
  for (i in seq_len(length(tmp) - 1))
  {
    # TRUE for the rows that belong to category i
    index <- data[, colnum] == tmp[i]
    if (!interaction)
    {
      # Plain 0/1 dummy
      tmp_dummy <- data.frame(ifelse(index, 1, 0))
      colnames(tmp_dummy) <- valname[i]
    }
    if (interaction)
    {
      # Dummy multiplied by the variable in column interaction_varnum
      tmp_dummy <- data.frame(ifelse(index, data[, interaction_varnum], 0))
      colnames(tmp_dummy) <- valname_int[i]
    }
    data <- cbind(data, tmp_dummy)
  }
  return(data)
}

For example:

## Create fake data
fakedata <- matrix(rnorm(300),nrow = 100)
cate <- LETTERS[sample(seq(1,10),100, replace = TRUE)]
fakedata <- cbind.data.frame(cate,fakedata)

## Try this
fakedata <- makedummy(1,fakedata)

## If you need dummy*x terms to see whether the coefficient differs across categories, try this
fakedata <- makedummy(1,fakedata,interaction = TRUE,interaction_varnum = 2)

This is a bit verbose since I haven't polished it. Any advice is welcome. Now you can run OLS on your data.
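As a point of comparison, base R can build the same kind of dummy columns without a helper function, by wrapping the categorical variable in factor() inside the model formula. The sketch below is only an illustration: the column names x and y and the seed are assumptions, not part of the original post. Note that lm() drops the first factor level as the reference category, whereas makedummy drops the last unique value it encounters.

```r
# Fake data similar to the example above; x and y are hypothetical names
set.seed(1)
fakedata <- data.frame(
  cate = LETTERS[sample(1:10, 100, replace = TRUE)],
  x    = rnorm(100),
  y    = rnorm(100)
)

# factor(cate) expands into one 0/1 column per category,
# with the first level dropped as the reference category
fit_dummy <- lm(y ~ x + factor(cate), data = fakedata)

# x * factor(cate) adds the dummies plus the dummy-times-x interaction terms
fit_inter <- lm(y ~ x * factor(cate), data = fakedata)

# Inspect the design matrix that lm() built behind the scenes
head(model.matrix(fit_dummy))
```

The coefficients on the interaction terms in fit_inter play the same role as the dummy*x columns produced by makedummy with interaction = TRUE.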

Answer 1:

This question is much like these:

  • fixed effects in R: plm vs lm + factor()
  • Fixed Effects plm package R - multiple observations per year/id

If you would rather not create new dummies, the dplyr package's group_indices function can build a single ID from the two variables. Although it does not work inside mutate, the following approach is straightforward:

dataname$id <- dataname %>% group_by(comp, count) %>% group_indices()

The id variable will be your first panel dimension. So, you need to set the plm index argument to index = c("id", "year").

For alternatives you can take a look at this question: R create ID within a group.
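Assuming dplyr 1.0.0 or later is available, cur_group_id() is one way to get the same kind of ID inside mutate. The sketch below rebuilds the question's table as a data frame (the name dataname follows the question) and checks that each (id, year) pair is unique, which is what plm's two-element index requires.

```r
library(dplyr)

# Toy data copied from the table in the question
dataname <- data.frame(
  year    = rep(c(2000, 2001), each = 4),
  comp    = rep(c("A", "A", "B", "B"), times = 2),
  count   = rep(c("USA", "CAN"), times = 4),
  value.x = c(1029.0, 2341.4, 2847.7, 4820.5, 7289.9, 5067.3, 7847.8, 9820.0),
  value.y = c(239481, 129333, 187319, 392039, 429481, 589143, 958234, 1029385)
)

# One integer ID per company-country combination
dataname <- dataname %>%
  group_by(comp, count) %>%
  mutate(id = cur_group_id()) %>%
  ungroup()

# Each (id, year) pair should now appear exactly once
any(duplicated(dataname[, c("id", "year")]))
```

If the last line returns FALSE, the data can be passed to plm with index = c("id", "year").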



Answer 2:

If you want to control for another dimension in a within model, simply add a dummy for it:

plm(value.y ~ value.x + count, data = dataname, index = c("comp","year"))

Alternatively (especially for high-dimensional data), look at the lfe package which can 'absorb' the additional dimension so the summary output is not polluted by the dummy variable.
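To illustrate what 'absorbing' a dimension means, here is a rough base R sketch that partials out the company and country dummies by hand (the Frisch-Waugh-Lovell approach) and compares the slope with the one from a regression that includes the dummies explicitly. This is not the lfe implementation, just the idea behind it on the question's toy data; the object names are made up for the example.

```r
# Toy data copied from the table in the question
dataname <- data.frame(
  year    = rep(c(2000, 2001), each = 4),
  comp    = factor(rep(c("A", "A", "B", "B"), times = 2)),
  count   = factor(rep(c("USA", "CAN"), times = 4)),
  value.x = c(1029.0, 2341.4, 2847.7, 4820.5, 7289.9, 5067.3, 7847.8, 9820.0),
  value.y = c(239481, 129333, 187319, 392039, 429481, 589143, 958234, 1029385)
)

# Regression with explicit dummies: the output includes every dummy coefficient
full <- lm(value.y ~ value.x + comp + count, data = dataname)

# Partial the dummies out of both sides, then regress residual on residual
y_res <- resid(lm(value.y ~ comp + count, data = dataname))
x_res <- resid(lm(value.x ~ comp + count, data = dataname))
absorbed <- lm(y_res ~ x_res - 1)

# The two slopes should agree up to numerical precision
all.equal(unname(coef(full)["value.x"]), unname(coef(absorbed)["x_res"]))

# With the lfe package installed, the equivalent call would be along the lines of
# lfe::felm(value.y ~ value.x | comp + count, data = dataname)
```

The standard errors from the hand-rolled version are not adjusted for the absorbed degrees of freedom, which is one reason to prefer a dedicated package for real work.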



Answer 3:

I think you can also do:

df <-transform(df, ID = as.numeric(interaction(comp, count, drop=TRUE))) 

And then estimate

result <- plm(value.y ~ value.x, data = df, index = c("ID","year"))
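A small end-to-end sketch of this approach on the question's toy data follows. It builds the combined ID, fits lm() with factor(ID) as a reference point, and, only if the plm package happens to be installed, fits the within model and compares the two slopes. The model = "within" argument and the object names are choices made for the example.

```r
# Toy data copied from the table in the question
df <- data.frame(
  year    = rep(c(2000, 2001), each = 4),
  comp    = rep(c("A", "A", "B", "B"), times = 2),
  count   = rep(c("USA", "CAN"), times = 4),
  value.x = c(1029.0, 2341.4, 2847.7, 4820.5, 7289.9, 5067.3, 7847.8, 9820.0),
  value.y = c(239481, 129333, 187319, 392039, 429481, 589143, 958234, 1029385)
)

# One numeric ID per company-country combination
df <- transform(df, ID = as.numeric(interaction(comp, count, drop = TRUE)))

# Reference fit: OLS with one dummy per ID
fit_lm <- lm(value.y ~ value.x + factor(ID), data = df)
slope_lm <- unname(coef(fit_lm)["value.x"])

# Fixed-effects (within) fit with plm, skipped when plm is not installed
if (requireNamespace("plm", quietly = TRUE)) {
  fit_plm <- plm::plm(value.y ~ value.x, data = df,
                      index = c("ID", "year"), model = "within")
  slope_plm <- unname(coef(fit_plm)["value.x"])
  print(all.equal(slope_plm, slope_lm))
}
```

The within estimator and OLS with ID dummies should give the same slope on value.x, which makes this a quick sanity check that the index was set up as intended.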


Answer 4:

I think you want to use lm() instead of plm(). This blog post here discusses what you're after:

https://www.r-bloggers.com/r-tutorial-series-multiple-linear-regression/

For your example, I'd imagine it would look something like the following:

lm(formula = value.y ~ value.x + comp + count + factor(year), data = dataname)