# A comprehensive beginners guide for Linear, Ridge and Lasso Regression in Python and R

• Vasim says:

The way you explained it, mind blowing!!!
Just hope I can reach your level 🙂

• Shubham Jain says:

Thanks for the comment.

• Amita says:

Very well explained Shubham. It was a wonderful read.

• Shubham Jain says:

Thank you.

Thank you very much, Shubham. You could explain many subjects in just one article and so well. Well done.

I would just like to ask whether you mistakenly switched the places of the independent and dependent variables in this paragraph, or whether I am confused about X and Y here:

“R-Square: It determines how much of the total variation in Y (independent variable) is explained by the variation in X (dependent variable).”

• Shubham Jain says:

Thanks for pointing out, it was a mistake from my side.

• vaibhav says:

Hey. Is there a version of this blog with the code in R?

• I liked the way you approached the contents. I will take some time to absorb most of the issues demonstrated; the theoretical aspects are a challenge for me at the moment, as they are too advanced for my basic statistical knowledge.

• Pawel says:

Can you tell me what exactly happens in the step `# creating dummy variables to convert categorical into numeric values`?

I am trying to translate your code to R, and I am struggling a little bit there. I am not sure what the process is, how the dummy data look, and what final features you used.

• Shubham Jain says:

Dummy encoding is used for categorical variables to convert each category into a separate column containing only 0 and 1, where 1 indicates the category's presence and 0 indicates its absence.
For example, gender contains two categories, Male and Female. By dummy encoding it, you create two separate variables: Var_M with values 1 (male) and 0 (not male), and Var_F with values 1 (female) and 0 (not female).
You can use the “dummies” package in R for this.
My final features include all continuous variables and dummy variables for all categorical variables (make sure you drop the original columns after encoding them), excluding Item_Identifier and Item_Outlet_Sales.
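For readers following along in Python, the dummy-encoding step can be sketched with `pandas.get_dummies` (the data frame below is a made-up slice; only the column names echo the Big Mart dataset discussed above):

```python
import pandas as pd

# A hypothetical slice of the Big Mart data (values are illustrative).
df = pd.DataFrame({
    "Item_MRP": [249.8, 48.3, 141.6],
    "Outlet_Location_Type": ["Tier 1", "Tier 3", "Tier 1"],
})

# One 0/1 column per category; the original column is then dropped.
dummies = pd.get_dummies(df["Outlet_Location_Type"], prefix="Loc")
df = pd.concat([df.drop(columns="Outlet_Location_Type"), dummies], axis=1)

print(df.columns.tolist())
# ['Item_MRP', 'Loc_Tier 1', 'Loc_Tier 3']
```

The same pattern applies to every categorical column: encode, concatenate, and drop the original before fitting the regression.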

Hi Shubham,
Thanks for the nice post. For the dummy variables, if Var_M and Var_F have values 0 and 1, wouldn't they still be considered categorical variables? I have a dataset with fields such as HP (0 or 1), where 1 is considered a high performer, and several other fields which are continuous. Hence, I wanted to know if I need to do any translation when using logistic regression. Please advise.

• Bhuvana says:

Hi, I am new to data science. I found the article quite interesting (theoretically), but when I try to implement things practically I have issues. May I know how the mse (mse = 28,75,386) was calculated based on location?

• Shubham Jain says:

I basically calculated the average sales for each location type and used those averages as the predicted values for the data, based on each row's location type. Then the mse was calculated using the formula: mse = np.mean((predicted_value - actual_value)**2)
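That grouped-mean baseline can be reproduced in a few lines of pandas/NumPy (the numbers below are made up; only the column names follow the Big Mart dataset):

```python
import numpy as np
import pandas as pd

# Made-up sales data; in the article, the real Train dataframe is used.
train = pd.DataFrame({
    "Outlet_Location_Type": ["Tier 1", "Tier 1", "Tier 2", "Tier 2"],
    "Item_Outlet_Sales": [1000.0, 1400.0, 2000.0, 2600.0],
})

# Predict every row's sales as the mean sales of its location type.
train["pred"] = (
    train.groupby("Outlet_Location_Type")["Item_Outlet_Sales"].transform("mean")
)

# mse = np.mean((predicted_value - actual_value)**2)
mse = np.mean((train["pred"] - train["Item_Outlet_Sales"]) ** 2)
print(mse)  # 65000.0
```

On the actual Train data this procedure yields the mse reported in the article.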

• PUNYASMARAN PANDA says:

Many complex concepts have been explained so nicely. Thank you very much for the article

• Shubham Jain says:

Thank you.

• Raghavendra says:

A really deep, insightful article.
Great work, keep it up!
It is surprising that you gained this much knowledge while still studying; it shows so much passion.

• Shubham Jain says:

Thanks for the comment.

• Niranjan says:

Thanks a lot Shubham for such a well explained article.

This is one of the best articles on linear regression I have come across. It explains all possible concepts step by step, like dots connected together, with simple explanations.

I would really appreciate it if you did the same kind of article on Logistic Regression. I look forward to it.

Thanks again Shubham.

• Pranjal Srivastava says:

Nice work Shubham. Way to go man!

• Shubham Jain says:

Thanks Pranjal.

• Vivek says:

Thank you so much team for nice explanation!

• Shubham Jain says:

Thanks Vivek.

Great article Shubham!

• The article is just superb. My only curiosity is about you… your interest in ML even though you are in ceramics engineering 🙂
But again, the article is superb… I am reading it slowly, implementing each type of regression as I go. I was working on the same dataset before stumbling on your article.
It helped a lot… thanks and cheers 🙂

• Shubham Jain says:

Thanks Abhishek. Glad you liked the article. 🙂

• Pallav says:

Damn, that was an awesome read. Just perfect. Can you also do an article on how to do data analysis on terabytes of data? Like which server to buy, how to set it up, Apache spark, etc. I am eagerly waiting for that.

• In STEP 7, underneath building the model, you import a new dataset with a different name (training instead of Train). Is there another separate dataset?

• Shubham Jain says:

No, it is the same dataset.

Thanks for the brilliant article Shubham! It gave me a holistic view of Linear Regression. I have also followed the concepts in the article and tried them on the Big Mart problem. The code is documented here: https://github.com/mohdsanadzakirizvi/Machine-Learning-Competitions/blob/master/bigmart/bigmart.md
🙂

• Shubham Jain says:

Thanks Shubham, a very clean and neat explanation for beginners. I learnt about the regressions very well. Your way of presentation is awesome. Keep it up, and help us in understanding many such topics.

• Very insightful article and nice explanation.
However, while trying to include all the features in the linear regression model (Section 7), R-squared increased only marginally, to around 0.342. I have used the same code. Can you please help me figure out why I am getting this discrepancy?

• Naveen Pallem says:

This is one of the articles I would suggest any aspiring data scientist go through. The linear regression techniques are very well described.

• bhawna anand says:

Nice article, you have explained the concepts in a simple way. Thanks for the effort.

• krishn says:

Crystal clear, a must read. Thanks Shubham!

Thank you Shubham for the clear explanation; you have covered so much content in this article. Can you please give us the same on logistic regression, linear discriminant analysis, classification and regression trees, random forest, SVM, etc.?

• Evangelos Katsaros says:

By far the best regression explanation so far. Never have I seen a textbook explain why the regression error is preferably taken as the sum of squares of the residuals rather than the sum of absolute values of the residuals.
Thanks Shubham!

• yandi says:

Is there any solved dataset with multiple linear regression covering the issues of multicollinearity, heteroskedasticity, autocorrelation of errors, and overfitting, with their remedial measures?

• Michael says:

Very good article!
Could you just explain how to plot the figures where you show the values of the coefficients for Ridge and Lasso?

Thank you very much
Best Regards,
Michael

• satye VENKATESH says:

Could you please clarify heteroskedasticity in linear regression? Are non-linearity and heteroskedasticity the same? In this article they are treated as the same, but in “Going Deeper into Regression Analysis with Assumptions, Plots & Solutions” they are termed as different.

• Vivasvan Patel says:

Very appropriately explained, in a concise and ideal manner! Thank you, Sir.

• viraat_maun says:

Hi Shubham,
It is a good informative article!
Can you please share the examples of python code for Polynomial Regression, adjusted-R square, Forward selection & Backward elimination?

• Robert Feyerharm says:

Great introduction to the topic of shrinkage!

Knowing there wasn’t space to cover all the variants, one form of shrinkage that all data scientists should be aware of is random effects. More frequently used by statisticians in explanatory models, random effects also have an application in predictive models where data are clustered into multiple groups in which the response variables are correlated, and they can be used in combination with other forms of penalization such as the lasso and ridge.

For example, let’s say you have to predict the future medical cost of the next insurance claim per member, given a dataset containing 10 million past claim records for 1 million members with 10 claims per member. We’ll assume the 10 claim amounts per member are approximately normally distributed. Rather than including 1 million categorical variables to account for member-level effects, a better predictive model would include a single random effect. Furthermore, if the members themselves are clustered into other categories, such as hospital, another level of random effects can be introduced in a hierarchical model.
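The partial-pooling idea behind a member-level random effect can be sketched in a few lines of NumPy. The data here are simulated for illustration, and the shrinkage constant `k` is picked by hand rather than estimated from the variance components as a fitted mixed model would do:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the claims data: 5 "members", 10 claims each,
# with member-level offsets that a random effect would capture.
member_effect = rng.normal(0, 300, size=5)
members = np.repeat(np.arange(5), 10)          # member id per claim
claims = 1000 + member_effect[members] + rng.normal(0, 200, size=50)

grand_mean = claims.mean()
member_means = np.array([claims[members == m].mean() for m in range(5)])
n = np.array([(members == m).sum() for m in range(5)])

# Partial pooling: each member's prediction is its own mean shrunk
# toward the grand mean; k stands in for the ratio of within- to
# between-member variance (chosen by hand for illustration).
k = 5.0
shrunken = (n * member_means + k * grand_mean) / (n + k)
```

Each predicted member mean lies between that member's raw mean and the grand mean, which is exactly the shrinkage behavior the comment describes; members with more claims would be shrunk less.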

• mustafa.k786 says:

Beautiful explanation, quite flawless !! Easy to understand. I wish to have a teacher like you throughout my journey to be a ‘true’ data scientist.

• jatinpal singh says:

Exceptional article, really. Looking forward to more of your articles, Shubham.

• jatinpal singh says:

Exceptional article

A perfect article on regression, which most books fail to explain.
I read this topic in a couple of neural network books, but it was very untidily presented.

The X-factor of this article was the Big Mart example you chose.
It cleared all my doubts with ease. Just great!
Thank you.
I would love to read more articles and such awesome explanations on ML.

• Yaamini says:

Very well explained!

• Garg says:

Very well explained. Superb. If you are OK with it, can you tell us your source of information? I would really like to follow it.

Finally understood how regularization works!
Thanks a lot!

• Dina says:

Great article, thanks so much!

• Aishwarya Singh says:

Hi Dina,

• Roshni Roy says:

Extremely informative write-up!! The figures are so self explanatory too! Thank you!

• Aishwarya Singh says:

Hi Roshni

• Chirag Pandey says:

This was really good. You covered everything and its really helpful.

• Aishwarya Singh says:

Hi Chirag,

Thank you for the feedback. Glad you found this useful!

• mahima says:

you really did a great job thanks . Can you also do an article on dimension reduction?

• AB says:

Hi Subham,

I feel the statement “How many factors were you able to think of? If it is less than 15, think again! A data scientist working on this problem would possibly think of hundreds of such factors.” is rude. I am sorry to say this, but it came across as offensive to me, because you cannot assume that every data scientist would think the way you do.

• Aishwarya Singh says:

Hi AB,

Feature engineering is a very difficult process to grasp. I am sure the author only wanted to convey that it is important to create more than 15-20 features, rather than discourage the readers.

I agree that this could have been conveyed in a better way. We will update the article accordingly.