Exploratory Data Analysis (EDA) in Python!

Guest Blog. Last updated: 30 Jul, 2021

Introduction

The Exploratory Data Analysis (EDA) steps covered here are:

– Handling missing values
– Removing duplicates
– Outlier treatment
– Normalizing and scaling (numerical variables)
– Encoding categorical variables (dummy variables)
– Bivariate analysis

Importing Libraries
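
The original import step is only shown as an image. As a minimal sketch, assuming pandas and NumPy are installed, the setup might look like this. Plotting libraries such as matplotlib or seaborn are usually imported at this point too, but they are left out here to keep the example light. Because the CSV file used in the article is not included, the DataFrame below (column names and values) is made up purely for illustration.

```python
# Core libraries for tabular EDA (assumption: the article's own import
# cell is not visible, so this is a typical minimal set).
import numpy as np
import pandas as pd

# The article's CSV file is not available. With your own file you would use
#   df = pd.read_csv("your_data.csv")   # hypothetical file name
# Here a small, made-up DataFrame stands in for it.
df = pd.DataFrame(
    {
        "region": ["East", "West", "East", "South", "West"],
        "segment": ["Consumer", "Corporate", "Consumer", "Home Office", "Consumer"],
        "sales": [120.5, 80.0, np.nan, 300.25, 95.0],
        "quantity": [2, 1, 3, 5, 1],
    }
)

# Peek at the first rows to confirm the data loaded as expected.
print(df.head())
```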

  1. Data Shape

  2. Data Information and Data Types

  3. Summary Statistics (Describe)
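
These three checks correspond to standard pandas calls: df.shape, df.info() together with df.dtypes, and df.describe(). The sketch below runs them on a small, made-up DataFrame (hypothetical columns, not the article's data).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's dataset (made-up values).
df = pd.DataFrame(
    {
        "region": ["East", "West", "East", "South", "West"],
        "sales": [120.5, 80.0, np.nan, 300.25, 95.0],
        "quantity": [2, 1, 3, 5, 1],
    }
)

# 1. Data shape: a (rows, columns) tuple.
print(df.shape)  # (5, 3)

# 2. Data information: column names, non-null counts and dtypes in one report.
df.info()

# The data types on their own ("sales" is float64 because it contains NaN).
print(df.dtypes)

# 3. Summary statistics (count, mean, std, min, quartiles, max).
# By default describe() only includes the numerical columns.
summary = df.describe()
print(summary)
```

Note that the count row of describe() already hints at missing values: "sales" shows 4 non-null entries out of 5 rows.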

Counting Missing Values (Sum)
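
Missing values per column are typically counted by chaining isnull() with sum(), since each True is counted as 1. A minimal sketch on made-up data, which also expresses the result as a percentage of rows:

```python
import numpy as np
import pandas as pd

# Made-up data with a few gaps (None / NaN) for illustration.
df = pd.DataFrame(
    {
        "region": ["East", None, "East", "South", "West"],
        "sales": [120.5, 80.0, np.nan, np.nan, 95.0],
        "quantity": [2, 1, 3, 5, 1],
    }
)

# isnull() marks missing cells as True; sum() adds them up per column.
missing = df.isnull().sum()
print(missing)

# The same information as a percentage of all rows.
missing_pct = df.isnull().mean() * 100
print(missing_pct.round(1))
```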

Imputing Missing Values
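
The article's imputation code is only shown as an image, so its exact strategy cannot be recovered. A common approach, sketched below on hypothetical data, fills numerical columns with the median (the mean is an alternative when there are no strong outliers) and categorical columns with the most frequent value (the mode).

```python
import numpy as np
import pandas as pd

# Made-up data: one missing category and one missing number.
df = pd.DataFrame(
    {
        "region": ["East", None, "East", "South", "West"],
        "sales": [120.0, 80.0, np.nan, 300.0, 100.0],
    }
)

# Numerical column: fill with the median of the observed values.
df["sales"] = df["sales"].fillna(df["sales"].median())

# Categorical column: fill with the mode (mode() skips missing values by default).
df["region"] = df["region"].fillna(df["region"].mode()[0])

print(df)
print(df.isnull().sum().sum())  # 0
```

Here the missing sales value becomes 110.0, the median of 80, 100, 120 and 300, and the missing region becomes "East", the most frequent category.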

Handling Duplicate Records
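
Duplicate rows can be found with duplicated() and removed with drop_duplicates(), both standard pandas methods. A short sketch on made-up order data in which two rows exactly repeat an earlier row:

```python
import pandas as pd

# Made-up data: rows 2 and 5 are exact copies of rows 1 and 4.
df = pd.DataFrame(
    {
        "order_id": [101, 102, 102, 103, 104, 104],
        "region": ["East", "West", "West", "East", "South", "South"],
        "sales": [120.0, 80.0, 80.0, 300.0, 95.0, 95.0],
    }
)

# duplicated() flags every row that repeats an earlier one (keep="first").
n_dupes = int(df.duplicated().sum())
print(n_dupes)  # 2

# drop_duplicates() keeps the first occurrence; reset the index afterwards.
df = df.drop_duplicates().reset_index(drop=True)
print(df.shape)  # (4, 3)
```

To treat rows as duplicates based on a key column only, both methods accept a subset argument, for example df.drop_duplicates(subset=["order_id"]).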

Handling Outliers

Box-plot before removing outliers

Box-plot after removing outliers
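
The article's outlier code is not visible, so the method here is an assumption. One common choice is the interquartile range (IQR) rule, in which values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are dropped. This is the same rule matplotlib's boxplot uses by default (whis=1.5) to decide which points are drawn beyond the whiskers. The data below is invented for illustration.

```python
import pandas as pd

# Made-up data with one obvious outlier (95).
df = pd.DataFrame({"sales": [10, 12, 11, 13, 12, 14, 11, 95, 13, 12]})

# Quartiles and interquartile range of the column.
q1 = df["sales"].quantile(0.25)
q3 = df["sales"].quantile(0.75)
iqr = q3 - q1

# Fences: anything outside [lower, upper] is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
print(lower, upper)

# Keep only rows inside the fences (between() is inclusive on both ends).
df_clean = df[df["sales"].between(lower, upper)]
print(len(df), "->", len(df_clean))
```

With matplotlib installed, df.boxplot(column="sales") and df_clean.boxplot(column="sales") would produce before and after box plots. Capping the values with df["sales"].clip(lower, upper) is an alternative when dropping rows is too costly.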

Bivariate Analysis

  1. Two Categorical Variables

    1. Bar chart
    2. Grouped bar chart
    3. Point plot
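
For two categorical variables, the bar chart and the grouped bar chart both start from a contingency table of counts. A minimal sketch with pd.crosstab on made-up region and segment data; the plotting calls are left as comments because they require matplotlib, and a point plot would additionally need seaborn's pointplot.

```python
import pandas as pd

# Made-up categorical data.
df = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West", "South"],
        "segment": ["Consumer", "Corporate", "Consumer", "Consumer", "Corporate", "Consumer"],
    }
)

# Contingency table: count of rows for every (region, segment) pair.
counts = pd.crosstab(df["region"], df["segment"])
print(counts)

# Row-normalized shares, useful when the regions differ in size.
shares = pd.crosstab(df["region"], df["segment"], normalize="index")
print(shares.round(2))

# With matplotlib installed:
#   counts.plot(kind="bar")                # grouped bar chart
#   counts.plot(kind="bar", stacked=True)  # stacked bar chart
```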

Correlation between all the variables
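
A correlation figure like this is usually drawn from the Pearson correlation matrix of the numerical columns. The sketch below assumes made-up columns. The seaborn heatmap call in the comment is one common way to plot the matrix and is not taken from the article.

```python
import pandas as pd

# Made-up data: sales rises with quantity, discount falls with it.
df = pd.DataFrame(
    {
        "quantity": [1, 2, 3, 4, 5],
        "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
        "discount": [0.5, 0.4, 0.3, 0.2, 0.1],
        "region": ["East", "West", "East", "South", "West"],
    }
)

# Pearson correlation between every pair of numerical columns;
# non-numeric columns such as "region" are excluded first.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

# With seaborn installed, a heatmap is a common way to visualize it:
#   sns.heatmap(corr, annot=True, cmap="coolwarm")
```

Only columns that are genuinely numerical belong in this matrix. Identifier-like codes stored as numbers carry no meaningful order and are better treated as categorical.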

Normalizing and Scaling
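
Two common ways to put numerical variables on a comparable scale are min-max normalization (rescaling to the range 0 to 1) and standardization (z-scores with mean 0 and standard deviation 1). The sketch below computes both directly with pandas on made-up values. scikit-learn's MinMaxScaler and StandardScaler produce the same numbers if you prefer that library.

```python
import pandas as pd

# Made-up numerical column.
df = pd.DataFrame({"sales": [100.0, 200.0, 300.0, 400.0, 500.0]})
col = df["sales"]

# Min-max normalization: (x - min) / (max - min), giving values in [0, 1].
df["sales_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization: (x - mean) / std. ddof=0 uses the population standard
# deviation, which is what scikit-learn's StandardScaler uses.
df["sales_z"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Min-max scaling keeps the shape of the distribution but is sensitive to extreme values, which is one reason outliers are usually handled before this step.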

Encoding
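
Dummy variables for a categorical column can be created with pd.get_dummies. A minimal sketch on a made-up region column follows. drop_first=True is an optional choice that removes one redundant dummy column, which mainly matters for linear models, and dtype=int gives 0/1 columns instead of booleans.

```python
import pandas as pd

# Made-up data with one categorical and one numerical column.
df = pd.DataFrame(
    {
        "region": ["East", "West", "South", "East"],
        "sales": [120.0, 80.0, 300.0, 95.0],
    }
)

# One 0/1 column per category (sorted: East, South, West);
# drop_first=True drops the first one ("East"), so East rows are all zeros.
encoded = pd.get_dummies(df, columns=["region"], drop_first=True, dtype=int)
print(encoded)
```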

About the Author

Ritika Singh – Data Scientist

I am a data scientist by profession and a blogger by passion. I have been working on machine learning projects for more than two years. Here you will find articles on “Machine Learning, Statistics, Deep Learning, NLP and Artificial Intelligence”.

Responses From Readers

Abdallah

Why did you treat postal code as a numerical variable? It is not meaningful to represent it that way, since a numerical value for postal code will be misinterpreted by any machine learning algorithm. For example, the postal code "90049" will be matched with a label based on the correlation and the postal code "300" will be matched to the other label since it has a lower value, which is incorrect. It would be better represented as a categorical variable, even if there are many unique observations.

Bala

Hi Ritika, can you please help me with the CSV file that you used for this tutorial? I would like to use the file to learn the steps taught here.

rohith gaddam

Cool and clear, it's easy to understand. Thanks for the explanation, I fell in love with your blog.
