Colleagues! I have panel data:
Company year Beta NI Sales Export Hedge FL QR AT Foreign
1 1 2010 -2.2052800 293000 1881000 78.6816 0 23.5158 1.289 0.6554 3000
2 1 2011 -2.2536069 316000 2647000 81.4885 0 21.7945 1.1787 0.8282 22000
3 1 2012 0.3258693 363000 2987000 82.4908 0 24.5782 1.2428 0.813 -11000
4 1 2013 0.4006030 549000 4546000 79.4325 0 31.4168 0.6038 0.7905 71000
5 1 2014 -0.4508811 348000 5376000 79.2411 0 37.1451 0.6563 0.661 -64000
6 1 2015 0.1494696 355000 5038000 77.1735 0 33.3852 0.9798 0.5483 37000
But R throws an error when I try to run the regression with the plm package:
panel <- read.csv("Panel.csv", header=T, sep=";")
p=plm(data=panel,Beta~NI, model="within",index=c("id","year"))
Error in pdim.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
3: In is.pbalanced.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
I searched for this error on the Internet and read that it is related to the company id and the year, but I did not find a way to avoid the problem. Also, when I run na.omit(panel), R does not show the error, but it is important to keep the NA values and the affected companies in the data. Please tell me what to do about this problem. Thank you.
Let us consider the Produc dataset in the plm package.
library(plm)
data("Produc", package = "plm")
head(Produc)
state year region pcap hwy water util pc gsp emp unemp
1 ALABAMA 1970 6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5 4.7
2 ALABAMA 1971 6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9 5.2
3 ALABAMA 1972 6 15972.41 7765.42 1764.75 6442.23 38670.30 31303 1072.3 4.7
4 ALABAMA 1973 6 16406.26 7907.66 1742.41 6756.19 40084.01 33430 1135.5 3.9
5 ALABAMA 1974 6 16762.67 8025.52 1734.85 7002.29 42057.31 33749 1169.8 5.5
6 ALABAMA 1975 6 17316.26 8158.23 1752.27 7405.76 43971.71 33604 1155.4 7.7
In this dataset, information is collected over time (17 years) on the same sample units (48 US states).
table(Produc$state, Produc$year)
1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986
ALABAMA 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ARIZONA 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ARKANSAS 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CALIFORNIA 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
...
plm requires that each (state, year) pair be unique.
any(table(Produc$state, Produc$year)!=1)
[1] FALSE
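To see which rows are responsible, rather than only whether a duplicate exists, the same check can be extended with base R's duplicated(). The following is a minimal sketch on a made-up toy panel; the helper name find_duplicate_pairs and the toy values are illustrative assumptions and are not part of plm.

```r
# Hypothetical helper: return every row whose (id, time) pair occurs more than once.
find_duplicate_pairs <- function(df, id, time) {
  key <- df[, c(id, time)]
  # duplicated() flags only the later copies; the fromLast pass also flags the first copy
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  df[dup, , drop = FALSE]
}

# Toy panel with made-up values: company 1 appears twice in 2010
toy <- data.frame(
  Company = c(1, 1, 1, 2, 2),
  year    = c(2010, 2010, 2011, 2010, 2011),
  Beta    = c(-2.2, 0.3, 0.4, 1.1, 0.9)
)

dups <- find_duplicate_pairs(toy, "Company", "year")
print(dups)
```

With real data the call would take the data frame and whichever two column names are passed to index in plm.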
The command plm works nicely with this dataset:
plmFit1 <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"))
summary(plmFit1)
Oneway (individual) effect Within Model
Call:
plm(formula = log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state", "year"))
Balanced Panel: n=48, T=17, N=816
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-0.12000 -0.02370 -0.00204 0.01810 0.17500
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
log(pcap) -0.02614965 0.02900158 -0.9017 0.3675
log(pc) 0.29200693 0.02511967 11.6246 < 2.2e-16 ***
log(emp) 0.76815947 0.03009174 25.5273 < 2.2e-16 ***
unemp -0.00529774 0.00098873 -5.3582 1.114e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 18.941
Residual Sum of Squares: 1.1112
R-Squared: 0.94134
Adj. R-Squared: 0.93742
F-statistic: 3064.81 on 4 and 764 DF, p-value: < 2.22e-16
Now we duplicate one of the (state, year) pairs:
Produc[2,2] <- 1970
any(table(Produc$state, Produc$year)>1)
[1] TRUE
and plm now generates the same error message that you described above:
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc, index = c("state","year"))
Error in pdim.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
In addition: Warning messages:
1: In pdata.frame(data, index) :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
2: In is.pbalanced.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
3: In is.pbalanced.default(index[[1]], index[[2]]) :
duplicate couples (id-time)
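Since na.omit(panel) makes the error disappear in your case, one possible cause (an assumption, as I cannot check it without the file) is that some rows have NA in the id or year column, for example empty trailing lines in the CSV. All such rows share the same (NA, NA) pair and would therefore count as duplicates. Below is a minimal sketch on made-up toy data that drops only the rows whose index is missing and keeps the NA values in the other variables; the column names Company and year are taken from your printout and may need adjusting.

```r
# Toy data with made-up values: the last two rows have no id and no year
toy <- data.frame(
  Company = c(1, 1, 2, 2, NA, NA),
  year    = c(2010, 2011, 2010, 2011, NA, NA),
  Beta    = c(-2.2, NA, 0.3, 0.4, NA, NA)
)

# useNA = "ifany" makes the (NA, NA) cell visible in the cross-tabulation
tab <- table(toy$Company, toy$year, useNA = "ifany")
print(tab)

# Keep every row with a valid (Company, year) pair; NA in Beta is left untouched
keep  <- complete.cases(toy[, c("Company", "year")])
clean <- toy[keep, ]
print(clean)
```

Unlike na.omit(), which removes a row if any variable is missing, this only filters on the two index columns, so companies with missing regressors stay in the data.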
Hope this can help you.