I wanted to compute a linear regression in two ways: with the lm() function and with plain matrix algebra. However, the regression coefficients I obtain from the matrix algebra are exactly half of those returned by lm(),
and I have no clue why.
Here's the code:
boot_example <- data.frame(
  x1 = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  x2 = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
  x3 = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L),
  x4 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
  x5 = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
  x6 = c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L),
  preference_rating = c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)
)
dummy_regression <- lm("preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)
dummy_regression
Call:
lm(formula = "preference_rating ~ x1+x2+x3+x4+x5+x6", data = boot_example)

Coefficients:
(Intercept)           x1           x2           x3           x4           x5           x6
     4.2222       1.0000      -0.3333       1.0000       0.6667       2.3333       1.3333
# The same by matrix algebra
# matrix() fills column by column, so each inner c(...) becomes one predictor column
X <- matrix(c(
  c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),  # upper var
  c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),  # upper var
  c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L),  # country var
  c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),  # country var
  c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),  # price var
  c(0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L)   # price var
),
nrow = 9, ncol = 6)
Y <- c(9L, 7L, 5L, 6L, 5L, 6L, 5L, 7L, 6L)
# z-standardize every predictor column (mean = 0, sd = 1): z = (x - mean(x)) / sd(x)
X_std <- apply(X, MARGIN = 2, FUN = function(x){(x-mean(x))/sd(x)})
# To estimate the intercept as well, uncomment the next line
#X_std <- cbind(rep(1, 9), X_std)
# OLS estimate via the matrix formula b = (X'X)^(-1) X'Y
solve(t(X_std) %*% X_std) %*% (t(X_std) %*% Y)
           [,1]
[1,]  0.5000000
[2,] -0.1666667
[3,]  0.5000000
[4,]  0.3333333
[5,]  1.1666667
[6,]  0.6666667
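One check that may help narrow this down is to apply the same matrix formula to the raw, unstandardized predictors with an explicit intercept column, and to print the standard deviation of each predictor. This is only a minimal sketch and not part of my code above; the names X_raw and b_raw are hypothetical and introduced just for this illustration.

```r
# Illustrative check only; X_raw and b_raw are hypothetical names
# that do not appear in the code above.
X <- matrix(c(
  1, 1, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 1, 1, 1, 0, 0, 0,
  1, 0, 0, 1, 0, 0, 1, 0, 0,
  0, 1, 0, 0, 1, 0, 0, 1, 0,
  1, 0, 0, 0, 0, 1, 0, 1, 0,
  0, 1, 0, 1, 0, 0, 0, 0, 1
), nrow = 9, ncol = 6)
Y <- c(9, 7, 5, 6, 5, 6, 5, 7, 6)

# Prepend a column of ones so the intercept is estimated as well
X_raw <- cbind(1, X)

# Same formula on the raw predictors: solve (X'X) b = X'Y for b
b_raw <- solve(crossprod(X_raw), crossprod(X_raw, Y))
print(round(b_raw, 4))

# Sample standard deviation of each predictor column
print(apply(X, 2, sd))
```

If b_raw matches the lm() coefficients, the matrix formula itself is probably fine and the difference would come from the standardization step. A slope for a z-standardized predictor equals the unstandardized slope multiplied by that predictor's standard deviation, so the printed sd values are worth comparing against the factor of two.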
Does anyone see the error in my matrix computation?
Thank you in advance!