Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed last year.
I would like to make a Oaxaca Decomposition in R. It is used in e.g. labor economics to distinguish explained variance versus unexplained variance, I believe. I have not been able to find a suitable solution in R, and I am rather reluctant to create one myself (I would probably mess it up).
Anyway, the procedure is briefly explained here:
http://en.wikipedia.org/wiki/Ronald_Oaxaca
Stata is blessed with a rather good package for this, but Stata is not easily available to me.
www.stata.com/meeting/5german/SINNING_stata_presentation.pdf
Please note: I have also posted a message on R-help but it has gotten no reply. I hope it is okay to post on this list as well.
Thanks in advance, Rasmus
Edit: I have made the following function, which seems to yield wrong answers (urgh). I tried to follow the Stata link above but it did not work out as I hoped :)
oaxaca <- function (fsex,frace1,frace2) {
## First we make regresions
data1 <- subset(l2,sex==fsex & race==frace1)
data2 <- subset(l2,sex==fsex & race==frace2)
mindata1 <- subset(cbind(grade,exp,I(exp^2)),sex==fsex & race==frace1)
mindata2 <- subset(cbind(grade,exp,I(exp^2)),sex==fsex & race==frace2)
reg1 <- lm(log(wage)~grade+exp+I(exp^2), data=data1)
reg2 <- lm(log(wage)~grade+exp+I(exp^2), data=data2)
## DECOMPOSITION
################
## Variables
gap <- mean(log(wage[race==frace1 & sex==fsex]))-mean(log(wage[race==frace2 & sex==fsex]))
mean1 <- colMeans(mindata1)
mean2 <- colMeans(mindata2)
beta1 <- summary(reg1)$coefficients[,1]
beta2 <- summary(reg2)$coefficients[,1]
beta1incep <- summary(reg1)$coefficients[1,1]
beta2incep <- summary(reg2)$coefficients[1,1]
beta1coef <- summary(reg1)$coefficients[c(2,3,4),1]
beta2coef <- summary(reg2)$coefficients[c(2,3,4),1]
betastar <- .5*(beta1coef+beta2coef)
betastar2 <- (beta1+beta2)/2
expl <- sum((mean1-mean2)*beta1coef)
uexpl <- sum(mean2*(beta2coef-beta1coef))
pct=expl/gap
pct2=uexpl/gap
## output
out <- data.frame(Gap=gap,
Explained=expl,
Unexplained=uexpl,
Pct=pct*100)
return(out)
}