I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in python:

I have already done this in R using the vif function from the usdm library which gives the following results:

a <- c(1, 1, 2, 3, 4)
b <- c(2, 2, 3, 2, 1)
c <- c(4, 6, 7, 8, 9)
d <- c(4, 3, 4, 5, 4)

df <- data.frame(a, b, c, d)
vif_df <- vif(df)
print(vif_df)

Variables   VIF
   a        22.95
   b        3.00
   c        12.95
   d        3.00

However, when I do the same in python using the statsmodel vif function, my results are:

a = [1, 1, 2, 3, 4]
b = [2, 2, 3, 2, 1]
c = [4, 6, 7, 8, 9]
d = [4, 3, 4, 5, 4]

ck = np.column_stack([a, b, c, d])

vif = [variance_inflation_factor(ck, i) for i in range(ck.shape[1])]
print(vif)

Variables   VIF
   a        47.136986301369774
   b        28.931506849315081
   c        80.31506849315096
   d        40.438356164383549

The results are vastly different, even though the inputs are the same. In general, results from the statsmodel VIF function seem to be wrong, but I'm not sure if this is because of the way I am calling it or if it is an issue with the function itself.

I was hoping someone could help me figure out whether I was incorrectly calling the statsmodel function or explain the discrepancies in the results. If it's an issue with the function then are there any VIF alternatives in python?

标签： python r numpy statistics statsmodels

7条回答

够拽才男人

2楼-- · 2020-02-08 03:35

here code using dataframe python:

To create data

import numpy as np
import scipy as sp

a = [1, 1, 2, 3, 4]
b = [2, 2, 3, 2, 1]
c = [4, 6, 7, 8, 9]
d = [4, 3, 4, 5, 4]

To create dataframe

import pandas as pd
data = pd.DataFrame()
data["a"] = a
data["b"] = b
data["c"] = c
data["d"] = d

Calculate VIF

cc = np.corrcoef(data, rowvar=False)
VIF = np.linalg.inv(cc)
VIF.diagonal()

Result

array([22.95, 3. , 12.95, 3. ])

0人赞添加讨论(0) 举报

上一页 1 2

Variance Inflation Factor in Python

To create data

To create dataframe

Calculate VIF

Result

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间