Python statsmodels: quadratic term in regression

Posted 2019-02-04 18:21

I have the following linear regression:

import statsmodels.formula.api as sm

model = sm.ols(formula = 'a ~ b + c', data = data).fit()

I want to add a quadratic term for b in this model.

Is there a simple way to do this with statsmodels.ols? Is there a better package I should be using to achieve this?

3 answers
够拽才男人
#2 · 2019-02-04 18:37

Although Alexander's solution works, it is inconvenient in some situations. For example, each time you predict the outcome for new values, you must remember to pass both b**2 and b, which is cumbersome and should not be necessary. patsy does not read "b**2" in a formula as arithmetic squaring (it parses ** as a formula operator), but it does evaluate numpy function calls. Thus, you can use

import statsmodels.formula.api as sm
import numpy as np

data = {"a":[2, 3, 5], "b":[2, 3, 5], "c":[2, 3, 5]}
model = sm.ols(formula = 'a ~ np.power(b, 2) + b + c', data = data).fit()

This way, you can later reuse the model without supplying a value for b**2:

model.predict({"a":[1, 2], "b":[5, 2], "c":[2, 4]})
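To make the point about prediction concrete, here is a minimal sketch on a larger dataset. The synthetic data, the coefficient values, and the names df, new, and pred are assumptions made up for illustration (the module is imported as smf, the usual alias); only the formula comes from the answer above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (an assumption for illustration): a depends on b, b squared, and c.
rng = np.random.default_rng(0)
df = pd.DataFrame({"b": rng.uniform(-3, 3, 50), "c": rng.uniform(-3, 3, 50)})
df["a"] = 1.0 + 2.0 * df["b"] + 0.5 * df["b"] ** 2 - df["c"] + rng.normal(0, 0.1, 50)

model = smf.ols("a ~ np.power(b, 2) + b + c", data=df).fit()

# New observations carry only the raw predictors b and c;
# the formula recomputes the squared term when predicting.
new = pd.DataFrame({"b": [5.0, 2.0], "c": [2.0, 4.0]})
pred = model.predict(new)
print(pred)
```

Note that np has to be importable under that name where the model is fitted, since the formula refers to it.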
放我归山
#3 · 2019-02-04 18:53

The simplest way is

model = sm.ols(formula = 'a ~ b + c + I(b**2)', data = data).fit()

The I(...) basically says "patsy, please stop being clever here and just let Python handle everything inside kthx". (More detailed explanation)
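As a rough check of what I(...) changes, the sketch below fits the same model three ways and compares them. The data is synthetic and the variable names (with_I, with_np, no_wrap) are made up for illustration; the behaviour of the unwrapped formula is how I understand patsy's ** operator, so treat it as something to verify on your own version.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(1)
df = pd.DataFrame({"b": rng.uniform(-3, 3, 50), "c": rng.uniform(-3, 3, 50)})
df["a"] = 1.0 + 2.0 * df["b"] + 0.5 * df["b"] ** 2 - df["c"] + rng.normal(0, 0.1, 50)

with_I = smf.ols("a ~ b + c + I(b**2)", data=df).fit()
with_np = smf.ols("a ~ b + c + np.power(b, 2)", data=df).fit()
no_wrap = smf.ols("a ~ b + c + b**2", data=df).fit()

# Both wrapped spellings describe the same model, so the fitted values agree.
same_fit = np.allclose(with_I.fittedvalues, with_np.fittedvalues)

# Unwrapped, ** is read as a formula operator, so no squared column is added.
n_wrapped = len(with_I.params)  # intercept, b, c, and the squared term
n_plain = len(no_wrap.params)   # intercept, b, and c only

print(same_fit, n_wrapped, n_plain)
print(list(with_I.params.index))  # shows how the squared term is labelled
```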

兄弟一词,经得起流年.
#4 · 2019-02-04 18:55

This should work, assuming data is a pandas DataFrame:

data['b2'] = data.b ** 2
model = sm.ols(formula = 'a ~ b2 + b + c', data=data).fit()
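For completeness, a small sketch of what this approach implies at prediction time: the b2 column has to be rebuilt by hand for any new data. The synthetic data and the names df, new, and pred are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(2)
df = pd.DataFrame({"b": rng.uniform(-3, 3, 50), "c": rng.uniform(-3, 3, 50)})
df["a"] = 1.0 + 2.0 * df["b"] + 0.5 * df["b"] ** 2 - df["c"] + rng.normal(0, 0.1, 50)

# Precompute the squared column, as in the answer above.
df["b2"] = df["b"] ** 2
model = smf.ols("a ~ b2 + b + c", data=df).fit()

# The formula refers to b2 by name, so new data needs the same column.
new = pd.DataFrame({"b": [5.0, 2.0], "c": [2.0, 4.0]})
new["b2"] = new["b"] ** 2
pred = model.predict(new)
print(pred)
```

The upside is that the formula stays plain; the cost is that every caller of predict has to know about the derived column.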