Multicollinearity-Econometrics
Multicollinearity and its detection
Multicollinearity is a state of very high
intercorrelation or inter-association among the independent variables. It is
therefore a type of disturbance in the data, and if it is present, the statistical
inferences made about the data may not be reliable.
Multicollinearity occurs when two or more independent variables are highly
correlated with one another in a regression model.
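Y = c + X1 + X2 + X3 + X4 + X5 + X6
For example, in the first crop production model below, amount of water and rainfall are likely to be highly correlated, so the second model drops amount of water:
Crop production = c + amount of water + rainfall + pesticides + soil quality
Crop production = c + rainfall + pesticides + soil quality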
There are certain reasons why multicollinearity occurs:
- It is caused by an inaccurate use of dummy variables.
- It is caused by the inclusion of a
variable which is computed from other variables in the data set.
- Multicollinearity can also result from
the repetition of the same kind of variable.
- It generally occurs when the variables are
highly correlated with each other.
When to fix Multicollinearity
The good news is that it is not always mandatory to fix
multicollinearity. It all depends on the primary goal of the regression model.
The degree of multicollinearity
greatly impacts the p-values and coefficients, but not the predictions or the
goodness-of-fit statistics. If your goal is to make predictions and you do not
need to understand the significance of each independent variable, fixing
multicollinearity is not required.
When a fix is needed, a simple approach is:
a) Find the correlation between each pair of independent variables.
b) If the correlation between any two independent
variables is 90% or more, drop one of them from the regression
equation.
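Steps a) and b) can be sketched in Python with NumPy. This is a minimal illustration, not a definitive implementation: the helper name `highly_correlated_pairs` and the data values are made up for this example, and the 0.9 cutoff is the 90% threshold from step b).

```python
import numpy as np

def highly_correlated_pairs(X, names, threshold=0.9):
    """Return (name_i, name_j, r) for every pair of columns with |r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # each column of X is one variable
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Hypothetical data (not from the article): rainfall is roughly half of water.
water = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 23.0, 25.0, 28.0])
rainfall = np.array([5.1, 5.9, 7.6, 9.1, 9.9, 11.6, 12.4, 14.1])
pesticides = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

names = ["water", "rainfall", "pesticides"]
X = np.column_stack([water, rainfall, pesticides])

pairs = highly_correlated_pairs(X, names)
for a, b, r in pairs:
    print(f"{a} and {b} are highly correlated (r = {r:.3f}); consider dropping one")
```

With this made-up data only the water/rainfall pair crosses the threshold, so one of those two variables would be the candidate to drop.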
Detection of Multicollinearity
The researcher might get a mix of significant and
insignificant results that points to the presence of multicollinearity. Suppose the
researcher, after dividing the sample into two parts, finds that the
coefficients estimated from the two parts differ drastically. This indicates that
the coefficients are unstable due to the
presence of multicollinearity. Similarly, if the researcher observes a drastic change
in the model by simply adding or dropping a variable, this also
indicates that multicollinearity is present in the data.
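The add-or-drop check can be illustrated with a small least-squares sketch. The data below are contrived for this example (the response is built without noise from water and pesticides only), and the helper `ols` is a hypothetical name, so treat it as an illustration of the symptom rather than a general diagnostic.

```python
import numpy as np

def ols(columns, y):
    """Fit y on an intercept plus the named columns; return {name: coefficient}."""
    names = list(columns)
    A = np.column_stack([np.ones(len(y))] + [columns[n] for n in names])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return dict(zip(["const"] + names, beta.tolist()))

# Hypothetical data: rainfall is roughly half of water, so the two are collinear.
water = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 23.0, 25.0, 28.0])
rainfall = np.array([5.1, 5.9, 7.6, 9.1, 9.9, 11.6, 12.4, 14.1])
pesticides = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

# Contrived response: depends on water and pesticides, not on rainfall directly.
crop = 2.0 + 3.0 * water + 0.5 * pesticides

full = ols({"water": water, "rainfall": rainfall, "pesticides": pesticides}, crop)
reduced = ols({"rainfall": rainfall, "pesticides": pesticides}, crop)

print("rainfall coefficient, full model:   ", round(full["rainfall"], 3))
print("rainfall coefficient, water dropped:", round(reduced["rainfall"], 3))
```

In the full model the rainfall coefficient is essentially zero. Once water is dropped, rainfall absorbs its effect and the coefficient jumps to roughly six, which is the kind of drastic change described above.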
Multicollinearity can also be detected with the help of
tolerance and its reciprocal, called the variance inflation factor (VIF). If the
value of tolerance is less than 0.1 (some use 0.2) and, equivalently, the value of
VIF is 10 or above (5 for the 0.2 cutoff), then the multicollinearity is problematic.
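The relationship between the two measures can be written out explicitly. Here R_j^2 denotes the R-squared obtained by regressing the j-th independent variable on all the other independent variables; this is standard notation but is not used elsewhere in this article.

```latex
\mathrm{Tolerance}_j = 1 - R_j^2,
\qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{Tolerance}_j} = \frac{1}{1 - R_j^2}
```

So a tolerance of 0.2 corresponds to a VIF of 5, and a tolerance of 0.1 corresponds to a VIF of 10.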
Variance Inflation Factor (VIF)
A correlation plot can be used to identify the
correlation, or bivariate relationship, between two independent variables, whereas
VIF is used to identify the correlation of one independent variable with a
group of other variables. Hence, VIF is preferred, because it can reveal
collinearity that involves more than two variables.
VIF = 1 → No correlation
VIF = 1 to 5 → Moderate correlation
VIF >10 → High correlation
The following is an example of detecting multicollinearity.
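As a minimal sketch of such an example, the VIF of each independent variable can be computed by regressing it on the others and applying 1 / (1 - R^2). The function name `vif` and the data values are assumptions made for illustration; they are not taken from the original example.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X.

    Each column is regressed on all the other columns plus an intercept,
    and VIF = 1 / (1 - R^2) of that auxiliary regression.
    """
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return out

# Hypothetical data: rainfall is roughly half of water, so the two are collinear.
water = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 23.0, 25.0, 28.0])
rainfall = np.array([5.1, 5.9, 7.6, 9.1, 9.9, 11.6, 12.4, 14.1])
pesticides = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

names = ["water", "rainfall", "pesticides"]
vifs = vif(np.column_stack([water, rainfall, pesticides]))
for name, value in zip(names, vifs):
    print(f"{name}: VIF = {value:.1f}")
```

With this made-up data, water and rainfall both get a VIF far above 10, while pesticides stays below 5. In practice, the statsmodels library offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for the same purpose.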

