I'm trying to run a regression in R's plm
package with fixed effects and model = 'within'
, while having clustered standard errors. Using the Cigar
dataset from plm
, I'm running:
require(plm)
require(lmtest)
data(Cigar)
model <- plm(price ~ sales + factor(state), model = 'within', data = Cigar)
coeftest(model, vcovHC(model, type = 'HC0', cluster = 'group'))
Estimate Std. Error t value Pr(>|t|)
sales -1.21956 0.21136 -5.7701 9.84e-09
This is (slightly) different than what I'd get by using Stata (having written the Cigar file as a .dta):
use cigar
xtset state year
xtreg price sales, fe vce(cluster state)
price Coef. Std. Err. t P>t [95% Conf. Interval]
sales -1.219563 .2137726 -5.70 0.000 -1.650124 -.7890033
Namely, the standard error and T statistic are different. I've tried rerunning the R code with different "types", but none give the same result as Stata. Am I missing something?
Stata uses a specific small-sample correction that has been implemented in
plm
1.5.Try this:
Which will yield:
This gives the same SE estimate up to 3 digits:
Stata uses a finite sample correction to reduce downwards bias in the errors due to the finite number of clusters. It is a multiplicative factor on the variance-covariance matrix, $c=\frac{G}{G-1} \cdot \frac{N-1}{N-K}$, where G is the number of groups, N is the number of observations, and K is the number of parameters. I think
coeftest
only uses $c'=\frac{N-1}{N-K}$ since if I scale R's standard error by the square of the first term in c, I get something pretty close to Stata's standard error:Here's how I would replicate what Stata is doing in R:
This yields:
which agrees with Stata's error of 0.2137726 and t-stat of -5.70.
This code is probably not ideal, since the number of states in the data may be different than the number of states in the regression, but I am too lazy to figure out how to get the right number of panels.