During the recent class presentations, I couldn't help but notice that the topic of multicollinearity, a high correlation between two or more independent variables in a regression, came up again and again.
Multicollinearity is a common statistical phenomenon that arises when the explanatory variables in a model are correlated with each other. For example, the cost of the college someone attends tends to be strongly correlated with their income: as income rises, the individual can afford a more expensive college.
When this happens in a regression, a high degree of multicollinearity makes our coefficient estimates unstable. Their standard errors grow, so common hypothesis tests become unreliable, and in some cases the coefficients take implausible values, such as unexpectedly large magnitudes or even the wrong sign.
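The consequences of multicollinearity can be seen in a small Monte Carlo sketch in Python with NumPy. The model, the true coefficients (both equal to 2), the sample size and the correlation values are made-up assumptions for illustration. The sketch fits the same regression on many simulated samples and compares how much the estimated coefficient on x1 varies when the two regressors are uncorrelated versus highly correlated (rho = 0.95).

```python
import numpy as np

def coef_spread(rho, n=200, reps=500, seed=0):
    """Simulate y = 1 + 2*x1 + 2*x2 + noise with corr(x1, x2) = rho (hypothetical model).
    Return the mean and standard deviation of the OLS estimate of the x1 coefficient
    across `reps` independent samples of size `n`."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    estimates = []
    for _ in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 + 2.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 1.0, size=n)
        design = np.column_stack([np.ones(n), X])       # intercept column + x1, x2
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates.append(beta[1])                       # coefficient on x1
    estimates = np.array(estimates)
    return estimates.mean(), estimates.std()

for rho in (0.0, 0.95):
    mean, sd = coef_spread(rho)
    print(f"rho={rho:.2f}: mean of beta1 = {mean:.2f}, sd of beta1 = {sd:.3f}")
```

In both cases the average estimate should stay close to the true value of 2, but with rho = 0.95 its spread across samples should be roughly three times larger than with rho = 0. That extra noise is what makes individual t-tests unreliable and lets single estimates land far from the truth.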
Why does this happen? When two independent variables move together, the regression has little information to separate their individual effects on the dependent variable (Y). It cannot tell how much of a change in Y comes from one variable rather than the other, so each coefficient may overestimate or underestimate that variable's true effect.
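A standard result for ordinary least squares (assuming homoskedastic errors with variance sigma^2) makes this precise. The variance of the estimated coefficient on regressor x_j is:

```latex
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{\left(1 - R_j^2\right)\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
```

Here R_j^2 is the R^2 obtained by regressing x_j on all the other regressors. The more predictable x_j is from the other variables, the closer R_j^2 gets to 1 and the larger the variance becomes. The factor 1 / (1 - R_j^2) is known as the variance inflation factor (VIF).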
How can we avoid this? Well, the most common starting point is to think about the nature of the variables. If it seems reasonable that owning more cars is related to someone's income, it is worth testing for it. So, the next time you run a regression, don't forget to check for multicollinearity between your variables.
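One common way to test for it is to look at the pairwise correlations and at the variance inflation factor (VIF) of each regressor. Below is a minimal NumPy sketch; the `vif` helper and the income/cars/age data are hypothetical, generated only for illustration. A frequently quoted rule of thumb treats VIF values above about 5 (or 10, depending on the source) as a warning sign, but that threshold is a convention rather than a formal test.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k predictors).
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2 = 1.0 - resid.var() / target.var()   # R^2 of the auxiliary regression
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical data: number of cars loosely tracks income; age is unrelated to both.
rng = np.random.default_rng(1)
n = 500
income = rng.normal(50.0, 15.0, size=n)              # thousands per year (made up)
cars = 0.05 * income + rng.normal(0.0, 0.3, size=n)  # assumed income-cars relationship
age = rng.normal(40.0, 10.0, size=n)
X = np.column_stack([income, cars, age])

print(np.round(np.corrcoef(X, rowvar=False), 2))  # pairwise correlation matrix
print(np.round(vif(X), 1))                        # one VIF per column: income, cars, age
```

With this simulated data, income and cars should come out with VIF values well above 5, while age stays close to 1, flagging the income/cars pair as the problem.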
-Xavi