Exploratory Data Analysis (EDA) in Python!

guest_blog Last Updated : 30 Jul, 2021

Introduction

Exploratory Data Analysis (EDA)

– Handling missing values
– Removing duplicates
– Outlier treatment
– Normalizing and scaling (numerical variables)
– Encoding categorical variables (dummy variables)
– Bivariate analysis

Exploratory Data Analysis - Import Libraries

  1. Exploratory Data Analysis - Data Shape

  2. Exploratory Data Analysis - Data Information

    Exploratory Data Analysis - Data Type

  3. Exploratory Data Analysis - Describe
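The three inspection steps listed above (shape, information, describe) can be sketched as follows. The article's CSV file is not available here, so this minimal example builds a small hypothetical DataFrame; the column names `age`, `income` and `city` are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the article's dataset.
df = pd.DataFrame(
    {
        "age": [25, 32, 47, 51, 38],
        "income": [42000.0, 58000.0, np.nan, 91000.0, 64000.0],
        "city": ["Pune", "Delhi", "Delhi", None, "Mumbai"],
    }
)

# 1. Data shape: a (rows, columns) tuple.
print(df.shape)

# 2. Data information: column names, non-null counts and dtypes.
df.info()

# Data type of each column on its own.
print(df.dtypes)

# 3. Describe: summary statistics for the numerical columns only (the default).
summary = df.describe()
print(summary)
```

Comparing the `count` row of `describe()` with the number of rows from `shape` is a quick first hint that a column has missing values.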

Exploratory Data Analysis - Sum

Exploratory Data Analysis - Impute Missing Values
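Counting and imputing missing values might look like the sketch below. The original code is not shown, so the choice of median for a numerical column and mode for a categorical column is an assumption (a common default, not necessarily the author's method), and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
df = pd.DataFrame(
    {
        "income": [42000.0, 58000.0, np.nan, 91000.0, 64000.0],
        "city": ["Pune", "Delhi", "Delhi", None, "Mumbai"],
    }
)

# Number of missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Assumed strategy: median for the numerical column ...
df["income"] = df["income"].fillna(df["income"].median())

# ... and the most frequent value (mode) for the categorical column.
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df.isnull().sum())
```

The median is less sensitive to extreme values than the mean, which matters if outliers have not been treated yet.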

Handling Duplicate Records
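A minimal sketch of finding and dropping duplicate rows with pandas, assuming that a duplicate means a row identical to an earlier one in every column. The toy columns are hypothetical.

```python
import pandas as pd

# Hypothetical toy data in which two rows repeat an earlier row exactly.
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 3],
        "city": ["Pune", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    }
)

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = int(df.duplicated().sum())
print(f"Number of duplicate rows: {n_duplicates}")

# Keep the first occurrence of each row and rebuild a clean index.
deduplicated = df.drop_duplicates().reset_index(drop=True)
print(deduplicated.shape)
```

If only some columns define a duplicate (for example an ID column), `drop_duplicates` accepts a `subset` argument listing those columns.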

Handling Outliers

Box-plot before removing outliers

Box-plot after removing outliers
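The code behind the two box-plots is not included here, so the following is only a sketch under a stated assumption: outliers are defined by the usual box-plot rule (more than 1.5 × IQR beyond the quartiles) and are capped at the whisker limits rather than dropped. The helper `iqr_bounds` and the data are hypothetical.

```python
import pandas as pd


def iqr_bounds(values):
    """Return the (lower, upper) box-plot whisker limits for a numeric Series."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical toy column with one obvious outlier (300.0).
income = pd.Series([40.0, 42.0, 45.0, 47.0, 50.0, 52.0, 55.0, 300.0])

lower, upper = iqr_bounds(income)
print(f"Lower limit: {lower}, upper limit: {upper}")

# Cap values outside the limits instead of deleting the rows.
capped = income.clip(lower=lower, upper=upper)
print(capped.tolist())
```

With matplotlib installed, `income.plot(kind="box")` and `capped.plot(kind="box")` draw the before and after box-plots.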

Bivariate Analysis

  1. Two Categorical Variables

    1. Bar chart
    2. Grouped bar chart
    3. Point plot
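For two categorical variables, the numbers behind a grouped bar chart are a cross-tabulation of counts. A minimal sketch with hypothetical columns `gender` and `owns_car`:

```python
import pandas as pd

# Hypothetical toy data with two categorical variables.
df = pd.DataFrame(
    {
        "gender": ["F", "M", "F", "M", "F", "M"],
        "owns_car": ["yes", "no", "yes", "yes", "no", "no"],
    }
)

# Count how often each combination of the two categories occurs.
counts = pd.crosstab(df["gender"], df["owns_car"])
print(counts)
```

With matplotlib installed, `counts.plot(kind="bar")` draws one group of bars per `gender` value, with one bar per `owns_car` category inside each group.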

Correlation between all the variables
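A correlation matrix like the one in this figure can be computed with pandas. This sketch assumes Pearson correlation (the pandas default) and uses hypothetical columns; non-numerical columns are excluded first because correlation is only defined for numbers.

```python
import pandas as pd

# Hypothetical toy data: two numerical columns and one categorical column.
df = pd.DataFrame(
    {
        "age": [25, 32, 47, 51, 38],
        "income": [42000, 58000, 80000, 91000, 64000],
        "city": ["Pune", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    }
)

# Keep only numerical columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr)
```

If seaborn is installed, `seaborn.heatmap(corr, annot=True)` is a common way to display this matrix as a heatmap.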

Normalizing and Scaling
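Two common ways to put numerical variables on a comparable scale are z-score standardization and min-max normalization. The original code is not shown, so which of the two the author used is not known; the sketch below shows both in plain pandas with hypothetical columns.

```python
import pandas as pd

# Hypothetical numerical columns on very different scales.
df = pd.DataFrame(
    {
        "age": [20.0, 30.0, 40.0, 50.0, 60.0],
        "income": [30000.0, 45000.0, 60000.0, 90000.0, 120000.0],
    }
)

# Z-score standardization: each column gets mean 0 and standard deviation 1.
# ddof=0 uses the population standard deviation.
standardized = (df - df.mean()) / df.std(ddof=0)
print(standardized)

# Min-max normalization: each column is rescaled to the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())
print(min_max)
```

Scaling matters mostly for distance-based or gradient-based models, where a column with large values such as `income` would otherwise dominate a column such as `age`.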

Encoding
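Encoding categorical variables as dummy variables can be sketched with `pandas.get_dummies`. The column names are hypothetical, and dropping the first category (`drop_first=True`) is an assumption made here to avoid a redundant column, not something the source confirms.

```python
import pandas as pd

# Hypothetical toy data with one categorical column.
df = pd.DataFrame(
    {
        "age": [25, 32, 47, 51],
        "city": ["Delhi", "Mumbai", "Pune", "Delhi"],
    }
)

# One 0/1 column per category; the first category (Delhi) is dropped
# and is implied when all remaining dummy columns are 0.
encoded = pd.get_dummies(df, columns=["city"], drop_first=True, dtype=int)
print(encoded)
```

Dummy encoding suits nominal variables with a modest number of categories; an identifier-like column with many unique values would produce one column per value.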

About the Author

Ritika Singh – Data Scientist

I am a data scientist by profession and a blogger by passion. I have been working on machine learning projects for more than 2 years. Here you will find articles on machine learning, statistics, deep learning, NLP and artificial intelligence.

Responses From Readers

Abdallah

Why did you treat postal code as a numerical variable? It is not meaningful to represent it that way, since a numerical value for postal code will be misinterpreted by any machine learning algorithm. For example, the postal code "90049" will be matched with a label based on the correlation and the postal code "300" will be matched to the other label since it has a lower value, which is incorrect. It would be better represented as a categorical variable, even if there are many unique observations.

Bala

Hi Ritika, Can you pls. help me with the csv file that you used for this tutorial? I would like to use the file to learn the steps taught here.

rohith gaddam

cool and clear its easy to understand tq for the explanation i fall in love with ur blog

Prasanna

I found the blog on "EDA with Python" very useful, but there is a humongous distraction on the site. The floating ads (of courses offered by you) on the page are a huge distraction. Not sure how no one from the page admin has noticed it. The content of this blog is awesome though.

Shallom Micah

Thanks a lot for this. I am glad I came across this.

Saurabh Singh

I appreciate your work. Thanks

Jogin

Hi Ritika, Really nice blog, I liked it and wanted to learn more from you

Nitin Shelke

Thanks Ritika for the nice explanation.

Monwa

I am interested to know the basics on how to analyze data, get rid of duplicates and missing values.

Abhishek Parida

Where can I find the dataset to follow and practice the code? Thank you in advance.
