I'm building a WLS model (statsmodels.formula.api.wls) using the statsmodels formula API (from patsy), and I'm using interactions between factors. Some of these are predictive whereas others are not. Is there a way to include only a subset of the interactions in the model without resorting to building a design matrix by hand?
Alternatively, is there a way to constrain the estimated coefficients of a subset of the model variables to be equal to zero?
I don't understand what you mean by "a subset of the interactions". One thing you might mean is a formula like
or the equivalent
where the latter makes it obvious that we're including some of the possible interactions, but not all of them (we've left out pred2:pred3).
But this is easy to do, so I'm guessing that what you actually meant is that you want to include only a subset of the coefficients associated with a single interaction. If so, then no, that isn't something that's currently implemented. It's also fairly dubious from a statistical perspective: if you start leaving out arbitrary columns, you change the meaning of all the other columns in ways that are very difficult to interpret. I also can't really think of a good implementable syntax for describing the partial interaction you want; if you can, feel free to file a feature request on patsy.
Also, I don't believe that statsmodels includes a way to fit a restricted model like that, no. It would be a good feature request.
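To make the first reading concrete (keeping some whole interaction terms and leaving out others), here is a minimal sketch. The data is random and the names y, pred1, pred2 and pred3 are only placeholders; the two formula strings are one way to write such a model, not necessarily the ones originally intended.

```python
import numpy as np
import pandas as pd
import patsy

# Made-up data; the column names are placeholders for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y": rng.normal(size=20),
    "pred1": rng.normal(size=20),
    "pred2": rng.normal(size=20),
    "pred3": rng.normal(size=20),
})

# Compact form: all main effects and two-way interactions,
# then remove the one interaction we do not want.
compact = "y ~ (pred1 + pred2 + pred3)**2 - pred2:pred3"

# Explicit form: list exactly the terms to keep.
explicit = "y ~ pred1 + pred2 + pred3 + pred1:pred2 + pred1:pred3"

_, X_compact = patsy.dmatrices(compact, df)
_, X_explicit = patsy.dmatrices(explicit, df)

compact_cols = X_compact.design_info.column_names
explicit_cols = X_explicit.design_info.column_names
print(compact_cols)
print(explicit_cols)
```

Either string can be passed as the formula to statsmodels.formula.api.wls; both should produce the same set of design-matrix columns, with no pred2:pred3 column.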
I'm not sure I understand exactly what you need, but I suggest you start with the truly excellent patsy docs (patsy handles formulas for statsmodels). There's a nice section on categorical data: http://patsy.readthedocs.org/en/latest/index.html
My guess is that what you want is going to be hard to achieve with a single formula call. I would probably just use patsy to build a design matrix with more terms than I need and then drop columns. For example: