
# 5 Questions which can teach you Multiple Regression (with R and Python)

• Aiswarya says:

Well-written article, but I couldn’t understand the concept of multicollinearity. Do you mean that if two variables are highly correlated, we should take only one of them into consideration instead of both? If we are to drop one of the variables, how do we choose which one to drop? And what happens if we keep both of them?

• Sunil Ray says:

Hi Aiswarya,

Thanks!

Multicollinearity occurs when two or more predictors are correlated. We should fix it because it inflates the variance of the coefficient estimates and makes them very sensitive to minor changes in the model. The result is that the coefficient estimates become unstable and difficult to interpret.

There are various methods to deal with multicollinearity; let’s look at some of them:
• Drop one of the collinear variables, based on its statistical significance in explaining the target variable.
• You can also remove multicollinearity based on VIF (a statistical metric). Remove only the one variable with the highest VIF, and only if that VIF is greater than 5; if every VIF is below 5, your model is not suffering from multicollinearity. After removing the variable with the highest VIF, run your model again and check whether the VIFs of all remaining variables are under 5; if not, repeat the process.
• Another solution is Principal Component Analysis (PCA), which transforms the correlated variables into uncorrelated components.
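The iterative VIF procedure described above can be sketched in Python. This is a minimal illustration on made-up synthetic data, computing each VIF by hand as 1/(1 − R²) from regressing one predictor on the rest; the `vif` and `drop_high_vif` helpers are just for this sketch (in practice, statsmodels provides `variance_inflation_factor`):

```python
import numpy as np

def vif(X):
    """VIF for each column: regress it on the remaining columns via OLS."""
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(others)), others])  # intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

def drop_high_vif(X, names, threshold=5.0):
    """Repeatedly drop the single predictor with the highest VIF above threshold."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Synthetic data: x3 is nearly a copy of x1, so the two are collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)
X, kept = drop_high_vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
```

Here x3 is almost identical to x1, so one of the pair gets dropped and only predictors with VIF under 5 remain.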

Hope this helps!

Regards,
Sunil

• Shashi says:

• Sunil Ray says:

Thanks Shashi!

• hemanth varma says:

Perfectly explained, and some of my assumptions and hurdles were clarified by this beautifully tailored article 🙂

Thank you Sunil.

If the Linear Regression summary were interpreted as well, that would be very helpful to people like me who have just got started in data analytics 🙂

• Sunil Ray says:

Thanks Hemanth!

Feedback taken, will discuss this in future post!

• Sagar says:

In the formula for R^2, shouldn’t it be like this (subtracting your expression from 1)?

r^2 = 1 – sum((actual – predicted)^2) / sum((actual – mean)^2)

Please correct me if I am wrong.

• Sunil Ray says:

Thanks Sagar for highlighting it!

• Ramdas says:

Excellent article. I have a quick question: how is Ymean calculated in the computation of R2? Is it the average of just the actual values of y?

• Sunil Ray says:

Ramdas,

Yes, it is the average of the actual values of y.

Regards,
Sunil
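Both points, the corrected formula and ȳ as the mean of the actual y values, fit in a short Python sketch (the `r_squared` helper and the sample numbers are just for illustration):

```python
import numpy as np

def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot, where the mean is taken over the actual y values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; predicting the mean everywhere gives R^2 = 0.
y = np.array([3.0, 5.0, 7.0, 9.0])
print(r_squared(y, y))                     # 1.0
print(r_squared(y, np.full(4, y.mean())))  # 0.0
```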

• Trinadh says:

Hi Sunil,

You have done a great job of breaking down the steps for building the regression. A very helpful article; thanks for your efforts.

• Deeksith says:

Really liked this article. I have been following this website for a while; it would really help if there were a series of posts to help students ramp up on various topics. I am a current student in analytics and would love to see something like that. Appreciate your efforts!

• Sunil Ray says:

Deeksith,

Thanks!

We do have road maps for various topics (Python, SAS, R, Weka, Machine Learning, QlikView and Tableau). You can refer to the link below for the same!

Regards,
Sunil

• Chandrashekhar says:

• Ankita Singh says:

Perfectly Explained!

• Vinitha Liyanage says:

Your explanation makes it easy to understand how each variable contributes to R2.
Thank you very much.
Thank you very much.

• Nipun says:

Hi Sunil,

where can I find the train and test datasets?

Thanks,
Nipun

• Nipun says:

Hi Sunil,

I am a little confused. In the article above, you mentioned that if the VIF is less than 2 the model doesn’t suffer from multicollinearity, but in your first comment you mentioned that the VIF should be less than 5.

So if I have a VIF value between 2 and 5, does my model suffer from multicollinearity?

Thanks,
Nipun

• akash9129 says:

HI Sunil,

I have a doubt regarding the term ‘Actual’. Do the actual values refer to the values we observe in real life? For example, with sales data, the model predicts an amount, but a few days later the figure turns out to be slightly lower or higher; that observed figure is the actual data, if I am not wrong? Just asking for clarity.

Thanks

• Aditya Gupta says:

Thanks a lot Sunil

• Srinivas says:

Hi Sunil,

Thanks for very good article on regression.

• Parakram says:

A properly structured and to-the-point explanation of the topic. Thanks

• Ramakant sharma says:

Please share how to find the right coefficients for the minimum sum of squared errors:

1. OLS

if possible.

BTW it’s a great article.
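Not from the article itself, but the standard OLS closed form the comment asks about, β = (XᵀX)⁻¹Xᵀy via the normal equations, can be sketched on made-up noise-free data with known coefficients:

```python
import numpy as np

# Synthetic data generated from known coefficients: y = 2 + 3*x1 - 1*x2.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Add an intercept column, then solve the normal equations (X'X) beta = X'y.
A = np.column_stack([np.ones(len(X)), X])
beta = np.linalg.solve(A.T @ A, A.T @ y)
print(beta)  # ~ [2, 3, -1]
```

In practice `np.linalg.lstsq` (or a library such as statsmodels) is preferred over forming XᵀX explicitly, for numerical stability.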

• Jack Ma says:

Does this analysis work for logistic regression?

• jack says:

Could someone tell me why “one disadvantage of R-squared is that it can only increase as predictors are added to the regression model”?
Thank you
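One way to see it: with an intercept, OLS could always set the new predictor’s coefficient to zero and keep the old fit, so the residual sum of squares can never grow and in-sample R² can never fall. A small numeric check on synthetic data (the `fit_r2` helper and the data are just illustrative):

```python
import numpy as np

def fit_r2(X, y):
    """R^2 of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 + rng.normal(size=50)
noise_col = rng.normal(size=50)  # pure noise, unrelated to y

r2_small = fit_r2(x1.reshape(-1, 1), y)
r2_big = fit_r2(np.column_stack([x1, noise_col]), y)
# r2_big >= r2_small: the extra column can only improve (or leave unchanged)
# the in-sample fit, even though it carries no real signal.
```

This is why adjusted R², which penalizes the number of predictors, is often reported alongside R².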

• Raghava reddy says: