I am trying to perform linear regression for a model like this:
Y = aX1 + bX2 + c
So, Y ~ X1 + X2
Suppose I have the following response vector:
set.seed(1)
Y <- runif(100, -1.0, 1.0)
And the following matrix of predictors:
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1,each=50))
X <- cbind(X1, X2)
I want to use the following constraints on the coefficients:
a + c >= 0
c >= 0
So no constraint on b.
I know that the glmc package can be used to apply constraints, but I was not able to determine how to apply it for my constraints. I also know that contr.sum can be used so that all coefficients sum to 0, for example, but that is not what I want to do. solve.QP() seems like another possibility; for example, passing meq=0 with an identity Amat constrains all coefficients to be >= 0 (again, not my goal here).
Note: The solution must be able to handle NA values in the response vector Y, for example with:
Y <- runif(100, -1.0, 1.0)
Y[c(2,5,17,56,37,56,34,78)] <- NA
solve.QP can be passed arbitrary linear constraints, so it can certainly be used to model your constraints a + c >= 0 and c >= 0.
First, we can add a column of 1's to X to capture the intercept term, and then we can replicate standard linear regression with solve.QP:
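Here is a minimal sketch of that setup, assuming the quadprog package; the name Xi for the design matrix (predictors plus an intercept column) is introduced here for illustration:

library(quadprog)

# Design matrix with an explicit intercept column; coefficients are ordered (a, b, c)
Xi <- cbind(X, 1)

# Least squares as a quadratic program: minimize (1/2) b'(Xi'Xi)b - (Y'Xi)b
# (assumes Y has no NAs; see the pre-processing step at the end for handling them)
Dmat <- t(Xi) %*% Xi
dvec <- t(Y) %*% Xi

# No constraints yet: a zero-column Amat and an empty bvec reproduce ordinary least squares
solve.QP(Dmat, dvec, matrix(0, 3, 0), numeric(0))$solution

# For comparison, lm returns the same estimates (ordered c, a, b)
coef(lm(Y ~ X1 + X2))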
With the sample data from the question, neither constraint is met using standard linear regression.
By modifying both the Amat and bvec parameters, we can add our two constraints, as in the sketch below. Subject to these constraints, the squared residuals are minimized by setting the a and c coefficients to both equal 0.
You can handle missing values in Y or X2 just as the lm function does, by removing the offending observations. You might do something like the following as a pre-processing step: