I need to run a regression on a panel data set that has 3 dimensions (Year * Company * Country). For example:
============================================
 year | comp | count | value.x  | value.y
------+------+-------+----------+-----------
 2000 | A    | USA   |   1029.0 |    239481
 2000 | A    | CAN   |   2341.4 |    129333
 2000 | B    | USA   |   2847.7 |    187319
 2000 | B    | CAN   |   4820.5 |    392039
 2001 | A    | USA   |   7289.9 |    429481
 2001 | A    | CAN   |   5067.3 |    589143
 2001 | B    | USA   |   7847.8 |    958234
 2001 | B    | CAN   |   9820.0 |   1029385
============================================
However, the R package plm does not seem able to cope with more than 2 dimensions.
I have tried:
result <- plm(value.y ~ value.x, data = dataname, index = c("comp","count","year"))
and it returns the error:
Error in pdata.frame(data, index) :
'index' can be of length 2 at the most (one individual and one time index)
How do you run regressions when the panel data (individual * time) has more than 1 dimension within "individual"?
In case anyone encounters the same situation, I'll put my solution here:
I could not get plm to handle this directly (as the error above shows, its index accepted only one individual and one time variable), so the workaround I ended up with is to add dummies. If the categorical variable you build the dummies from contains too many categories to do this by hand, you can try this:
makedummy <- function(colnum, data, interaction = FALSE, interaction_varnum)
{
  # Build column names such as "cate.dummy.A" or "cate.dummy.int.A"
  char0 <- colnames(data)[colnum]
  char1 <- "dummy"
  tmp <- unique(data[, colnum])
  valname <- paste(char0, char1, tmp, sep = ".")
  valname_int <- paste(char0, char1, "int", tmp, sep = ".")
  # The last category serves as the reference level and gets no dummy;
  # seq_len() also copes with a column that has a single category
  for (i in seq_len(length(tmp) - 1))
  {
    index <- data[, colnum] == tmp[i]
    if (interaction) {
      # dummy * x: the value of x inside the category, 0 elsewhere
      data[[valname_int[i]]] <- ifelse(index, data[, interaction_varnum], 0)
    } else {
      # plain 0/1 dummy for the category
      data[[valname[i]]] <- ifelse(index, 1, 0)
    }
  }
  return(data)
}
For example:
## Create fake data
fakedata <- matrix(rnorm(300),nrow = 100)
cate <- LETTERS[sample(seq(1,10),100, replace = TRUE)]
fakedata <- cbind.data.frame(cate,fakedata)
## Try this
fakedata <- makedummy(1,fakedata)
## If you need to add dummy*x terms to see whether the coefficient on x differs across categories, try this
fakedata <- makedummy(1,fakedata,interaction = TRUE,interaction_varnum = 2)
It is a bit verbose and I haven't polished it, so any advice is welcome. You can now run OLS on the resulting data.
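As a possible shortcut for the dummy approach, base R's lm() can build the dummies itself when a categorical column is wrapped in factor(). The sketch below is only an illustration, not a replacement for a proper panel estimator: it rebuilds the small table from the question as a data frame and fits three variants. The object names fit_dummies, fit_pairs, fit_slopes and the id column are made up for this example.

```r
# Toy data copied from the table in the question
dataname <- data.frame(
  year    = rep(c(2000, 2001), each = 4),
  comp    = rep(c("A", "A", "B", "B"), times = 2),
  count   = rep(c("USA", "CAN"), times = 4),
  value.x = c(1029.0, 2341.4, 2847.7, 4820.5, 7289.9, 5067.3, 7847.8, 9820.0),
  value.y = c(239481, 129333, 187319, 392039, 429481, 589143, 958234, 1029385)
)

# Variant 1: separate dummies for company, country and year.
# lm() expands each factor() into 0/1 columns and drops one reference level.
fit_dummies <- lm(value.y ~ value.x + factor(comp) + factor(count) + factor(year),
                  data = dataname)

# Variant 2: treat every company-country pair as one "individual".
# 'id' is a hypothetical helper column, not part of the original data.
dataname$id <- interaction(dataname$comp, dataname$count, drop = TRUE)
fit_pairs <- lm(value.y ~ value.x + factor(id), data = dataname)

# Variant 3: dummy * x terms, the analogue of interaction = TRUE above.
fit_slopes <- lm(value.y ~ value.x * factor(comp), data = dataname)

print(coef(fit_dummies))
print(levels(dataname$id))
```

The combined id column from variant 2 is also one individual identifier, so it may be worth trying index = c("id", "year") in plm, since that matches the "one individual and one time index" shape the error message asks for. I have not verified that here.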