Interpreting the alias table when testing for multicollinearity

Posted 2020-06-23 05:11

Question:

Could someone help me interpret the output of the alias function, which I am using to test for multicollinearity in a multiple regression model? I know some predictor variables in my model are highly correlated, and I want to identify them using the alias table.

Model :
Score ~ Comments + Pros + Cons + Advice + Response + Value + Recommendation 
+ 6Months + 12Months + 2Years + 3Years + Daily + Weekly + Monthly

Complete :
            (Intercept) Comments Pros Cons Advice Response Value1
UseMonthly1           0        0    0    0      0        0      0
            Recommendation1 6Months1 12Months1 2Years1
UseMonthly1               0        1         1       1
            3Years1 Daily1 Weekly1
UseMonthly1       1     -1      -1

Value, Recommendation, 6Months, 12Months, 2Years, 3Years, Daily, Weekly, and Monthly are binary categorical variables.
Score, Comments, Pros, Cons, Advice, and Response are numeric variables.

Can I assume UseMonthly is highly correlated with 6Months, 12Months, 2Years, 3Years, Daily, and Weekly? What is the difference between the 1 and -1 values in the alias output? Do they indicate positive and negative correlation?

Answer 1:

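Nonzero entries in the "Complete" matrix show that UseMonthly is an exact linear combination of those terms. Taken together they are perfectly collinear with it, which is stronger than being highly correlated: terms can be highly correlated without being linearly dependent, and such terms do not show up in the "Complete" matrix.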
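To illustrate what the 1 and -1 entries mean, here is a minimal sketch on made-up data. The column names (m6, m12, y2, y3, daily, weekly, monthly) and the `active` flag are hypothetical stand-ins, not the asker's data. The sketch assumes that every "active" row has exactly one tenure flag and one usage flag set, so both groups of indicators sum to the same column. Under that assumption, the entries in the aliased row are the coefficients of the linear combination that reproduces the aliased column exactly, not the signs of correlations.

```r
set.seed(42)
n <- 300

# Hypothetical data: "active" rows get exactly one tenure flag and one
# usage flag; inactive rows get zeros everywhere.
active <- rbinom(n, 1, 0.8)
tenure <- sample(c("m6", "m12", "y2", "y3"), n, replace = TRUE)
usage  <- sample(c("daily", "weekly", "monthly"), n, replace = TRUE)

d <- data.frame(
  score   = rnorm(n),
  m6      = active * (tenure == "m6"),
  m12     = active * (tenure == "m12"),
  y2      = active * (tenure == "y2"),
  y3      = active * (tenure == "y3"),
  daily   = active * (usage == "daily"),
  weekly  = active * (usage == "weekly"),
  monthly = active * (usage == "monthly")
)

fit <- lm(score ~ m6 + m12 + y2 + y3 + daily + weekly + monthly, data = d)

# One row per aliased term; each nonzero entry is the coefficient of the
# corresponding column in the linear combination that rebuilds that term.
a <- alias(fit)$Complete
print(a)

# Check the identity implied by the +1 / -1 pattern directly on the data.
rebuilt <- with(d, m6 + m12 + y2 + y3 - daily - weekly)
print(all(rebuilt == d$monthly))
```

Read this way, a row like the one in the question says that the aliased column equals the sum of the columns marked 1 minus the columns marked -1, so its coefficient cannot be estimated separately.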

If your goal is to identify and remove correlated variables, you should remove UseMonthly, but you'll probably want to remove others too. A common way to find variables that cause multicollinearity problems is to look for large variance inflation factors (calculated by, e.g., car::vif).
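As a rough sketch of what a variance inflation factor measures, the base R code below computes VIF_j = 1 / (1 - R_j^2) by regressing each predictor on the others. The data and the helper `manual_vif` are hypothetical, written only for illustration. For predictors that each take one degree of freedom this should agree with car::vif. Note that car::vif stops with an error when the model has aliased coefficients, so an exactly dependent term has to be dropped first.

```r
set.seed(1)
n  <- 200

# Hypothetical predictors: x2 is strongly, but not exactly, related to x1.
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)                  # unrelated to x1 and x2
y  <- 1 + x1 + x3 + rnorm(n)
d  <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ x1 + x2 + x3, data = d)

# No exact dependence here, so alias() has no "Complete" matrix to report.
print(is.null(alias(fit)$Complete))

# Hypothetical helper: VIF for each predictor from the R^2 of regressing
# that predictor on all the other predictors.
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manual_vif(d, c("x1", "x2", "x3"))
print(vifs)
```

In this setup x1 and x2 should show large VIFs while x3 stays close to 1, even though alias reports nothing, which is the case of high correlation without linear dependence.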