10 Powerful and Time-Saving Data Exploration Hacks, Tips and Tricks!

Ram Dewani Last Updated : 14 Apr, 2020


Introduction

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” – attributed to Abraham Lincoln

What does this quote have to do with data exploration? Think about it: it holds true in most situations in real life, and our field of data science is no exception.

We can’t build a machine learning model and hope to deliver a successful data science project without properly understanding and exploring the data. No matter how many fancy algorithms we deploy or how much computation we throw at the problem, we risk spurious results until we do the most important activity: data exploration.

Data exploration helps us understand our data: its structure, its strengths, and its weaknesses.

There are countless techniques and tools available for data exploration. Here’s the kicker, though: I’ve seen a lot of data science professionals skip or skim through the exploration stage. This is akin to going on a camping trip with a world-class Swiss Army knife and only using it to cut fruits and vegetables. That’s missing the entire point!

So in this article, I have put together 10 powerful data exploration hacks, tips, and tricks to help you save time and quickly analyze the data at hand.

This is part 2 of my Data Science hacks, tips, and tricks series. I highly recommend reading the first part here.

I have also converted my learning into a free course that you can check out:

Data Science Hacks, Tips, and Tricks!

Also, if you have your own data science hacks, tips, and tricks, you can share them with the community on this GitHub repository: Data Science hacks, tips and tricks on GitHub.

We are posting these hacks daily on social media platforms such as LinkedIn, Twitter, and Facebook. Make sure to follow #avhackoftheday to get your daily dose of freshly brewed data science hacks, tips, and tricks!

We’ll cover these data exploration hacks, tips, and tricks:

Data Exploration Hack #1 – Pandas Profiling
Data Exploration Hack #2 – Building Time Based Features
Data Exploration Hack #3 – Heatmap over a Pandas DataFrame
Data Exploration Hack #4 – Imputing missing values using KNNImputer
Data Exploration Hack #5 – Plotting a Decision Tree
Data Exploration Hack #6 – Binning Data
Data Exploration Hack #7 – Funnel Charts
Data Exploration Hack #8 – Pandas Crosstab
Data Exploration Hack #9 – Interactive plots
Data Exploration Hack #10 – Bar Plot over Pandas DataFrame

Data Exploration Hack #1 – Pandas Profiling

The Pandas library has won the hearts of the majority of data scientists out there. Pandas Profiling, a separate package built on top of Pandas (now maintained under the name ydata-profiling), generates an instant overview report of your DataFrame: visualizations of each feature, the percentage of missing values, correlations that hint at multicollinearity, and much more.

It’s truly a handy tool for everyone!

Code for Pandas profiling
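A minimal sketch of generating such a report, assuming the package is installed under its current name, ydata-profiling (older installs import the same ProfileReport class from pandas_profiling). The DataFrame is hypothetical, and the import is guarded so the snippet still runs without the package; the missing-value percentages are computed with plain Pandas as a quick cross-check of one section of the report.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame standing in for your own data
df = pd.DataFrame({
    "age": [22, 38, 26, np.nan, 35, 54],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, np.nan],
    "pclass": [3, 1, 3, 1, 3, 1],
    "sex": ["male", "female", "female", "female", "male", "male"],
})

# Plain-pandas cross-check of one thing the report shows:
# the percentage of missing values per column
missing_pct = df.isna().mean().mul(100).round(2)
print(missing_pct)

try:
    # pandas-profiling was renamed to ydata-profiling;
    # older installs expose the same class as pandas_profiling.ProfileReport
    from ydata_profiling import ProfileReport
except ImportError:
    ProfileReport = None

if ProfileReport is not None:
    # minimal=True skips the most expensive computations on large data
    profile = ProfileReport(df, title="Data Exploration Report", minimal=True)
    profile.to_file("data_exploration_report.html")
else:
    print("ydata-profiling is not installed; try `pip install ydata-profiling`")
```

In a Jupyter notebook, `profile.to_notebook_iframe()` displays the same report inline instead of writing an HTML file.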

Data Exploration Hack #2 – Building Time Based Features

A lot of the data we collect these days contains date and time variables. You can extract a lot of information from these variables, such as the year, month, quarter, day of the week, and hour, and use it in your analysis. These derived features can enhance both your analysis and your predictive model.

Code for Building Time Based Features
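A minimal sketch using the Pandas `.dt` accessor on a hypothetical event log; the column names are illustrative assumptions. Once the column is parsed as a datetime, each feature is a single attribute lookup.

```python
import pandas as pd

# Hypothetical event log with a raw timestamp column
df = pd.DataFrame({
    "event_time": ["2020-01-01 18:00", "2020-04-14 09:30",
                   "2020-04-18 14:45", "2020-12-31 23:15"],
})

# Parse strings into datetime64 so the .dt accessor becomes available
df["event_time"] = pd.to_datetime(df["event_time"])

dt = df["event_time"].dt
df["year"] = dt.year
df["month"] = dt.month
df["quarter"] = dt.quarter
df["day_of_week"] = dt.dayofweek       # Monday=0 ... Sunday=6
df["day_name"] = dt.day_name()
df["hour"] = dt.hour
df["is_weekend"] = dt.dayofweek >= 5   # Saturday or Sunday

print(df)
```

Flags such as `is_weekend` are often more useful to a model than the raw day number, since they group days that tend to behave alike.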

Data Exploration Hack #3 – Heatmap over a Pandas DataFrame

Another hack that’s sure to impress your colleagues is plotting a heatmap over a Pandas DataFrame. It lets you evaluate your results at a glance and gives you a clean, elegant visualization you can show your manager to become a rockstar! We use Seaborn to accomplish this task.

Code for plotting a heatmap over a Pandas dataframe

Data Exploration Hack #4 – Imputing missing values using KNNImputer

KNNImputer is another great class added in scikit-learn 0.22, the latest release at the time of writing. We usually impute missing values using univariate methods such as SimpleImputer, which fills each column using only that column’s own values (for example, its mean).

Instead, we can use a multivariate method such as KNNImputer, which imputes missing values using k-Nearest Neighbors: each missing value is replaced by the mean of that feature across the nearest neighbors found in the training set.

Code for imputing missing values using KNNImputer
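A minimal sketch on a toy matrix; `n_neighbors=2` is chosen only so the arithmetic is easy to follow. Neighbors are found with a NaN-aware Euclidean distance over the features both rows have observed, and with the default `weights="uniform"` the donors' values are simply averaged.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix with two missing entries (np.nan)
X = np.array([
    [1, 2, np.nan],
    [3, 4, 3],
    [np.nan, 6, 5],
    [8, 8, 7],
])

# Each missing value becomes the mean of that feature over the 2 nearest rows
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
print(X_imputed)
# [[1.  2.  4. ]
#  [3.  4.  3. ]
#  [5.5 6.  5. ]
#  [8.  8.  7. ]]
```

For the first row, the two closest rows are (3, 4, 3) and (nan, 6, 5), so its missing third value becomes (3 + 5) / 2 = 4. In practice, scale your features first, because the distance is sensitive to each feature's range.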

Data Exploration Hack #5 – Plotting a Decision Tree

Honestly, this is one of the best updates in sklearn. Decision trees are among the most intuitive algorithms for understanding how the independent variables affect the target. With sklearn’s plot_tree function (added in version 0.21), you can plot a fitted decision tree in just one line of code.

Go ahead and play around with the hyperparameters to get the optimum result!

Code for plotting a decision tree
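A minimal sketch using `sklearn.tree.plot_tree` on the Iris dataset bundled with scikit-learn; `max_depth=3` and the output filename are illustrative choices, not values from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs in scripts
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(12, 6))
# The one-liner: draws every split with its impurity, sample count and class
annotations = plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)
fig.savefig("decision_tree.png", dpi=100, bbox_inches="tight")
```

Changing `max_depth` (or `min_samples_leaf`) and re-running is a quick way to see how the hyperparameters reshape the tree.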

Data Exploration Hack #6 – Binning Data

Binning can be really important in your data exploration activity. We typically use it to transform continuous variables into discrete ones.

Let’s take an example from the Titanic dataset, where we convert the continuous variable ‘Age’ into the discrete variable ‘AgeGroup’. In this case, it’s more sensible to include AgeGroup, as it gives more insightful results. Let’s check out the example in this video:

https://youtu.be/WQagYXIFjns

Code for binning continuous features
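A minimal sketch with `pd.cut`; the ages and the cut points below are hypothetical stand-ins, not the ones used in the video. Intervals are right-inclusive by default, so an age of exactly 12 falls into the first bin.

```python
import pandas as pd

# Hypothetical ages standing in for the Titanic 'Age' column
df = pd.DataFrame({"Age": [4, 12, 17, 30, 58, 71]})

# Hypothetical cut points; intervals are (0, 12], (12, 18], (18, 60], (60, 120]
bins = [0, 12, 18, 60, 120]
labels = ["Child", "Teen", "Adult", "Senior"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels)

print(df)
print(df["AgeGroup"].value_counts(sort=False))
```

For equally populated groups instead of hand-picked edges, `pd.qcut(df["Age"], q=4)` bins by quantiles.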

Data Exploration Hack #7 – Funnel Charts

As a product growth analyst, I am always curious about how users move through different stages. The Plotly library provides a great tool for visualizing and understanding this user journey: the funnel chart.

These charts also help you spot where users drop out of the journey. The interactive funnel shows the number of users and the percentage decline at every stage.

Code for Funnel Charts

Data Exploration Hack #8 – Pandas Crosstab

Pandas Crosstab can be really useful for validating basic hypotheses and forming a more intuitive view of the data. It computes a simple cross-tabulation of two (or more) factors. By default, it computes a frequency table of the factors unless an array of values and an aggregation function are passed.

Let’s dive into the code!

Code for Pandas crosstab
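A minimal sketch of three common `pd.crosstab` calls on hypothetical Titanic-style records: the default frequency table, a row-normalized version, and one that aggregates a third column via `values` and `aggfunc`.

```python
import pandas as pd

# Hypothetical Titanic-style passenger records
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
    "Fare": [7.25, 71.30, 8.05, 8.46, 53.10, 30.00],
})

# Default: a frequency table of Sex x Survived, with row/column totals
counts = pd.crosstab(df["Sex"], df["Survived"], margins=True)

# Row-normalized: survival rate within each sex
rates = pd.crosstab(df["Sex"], df["Survived"], normalize="index")

# values + aggfunc: mean fare in each cell instead of a count
mean_fare = pd.crosstab(df["Sex"], df["Survived"], values=df["Fare"], aggfunc="mean")

print(counts, rates, mean_fare, sep="\n\n")
```

`values` and `aggfunc` must be passed together; without them, crosstab always counts.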

Data Exploration Hack #9 – Interactive plots

Plots are a great way to visualize your data, but what if I told you there’s an even better way: interactive plots!

The Cufflinks library binds Plotly directly to Pandas DataFrames, so you can make interactive charts without any hassle or lengthy code. You can hover over different plots and data points to see the exact numbers. This tip will definitely make you shine in front of your teammates!

Code for creating interactive plots

Data Exploration Hack #10 – Bar Plot over Pandas DataFrame

A lot of people argue that Excel offers far more options than Pandas for exploring data. Well, Pandas has some cool options too! You can draw bar charts inside the cells of a Pandas DataFrame, which helps you understand and explore the data much more effectively.

You can explore a lot of options by tweaking the parameters of df.style.bar():

Code for styling bar chart over Pandas dataframe

End Notes

In this article, we covered 10 data exploration hacks, tips, and tricks across various tools and techniques to help you become a better and more efficient data scientist. I hope these hacks help you with day-to-day niche tasks and save you a lot of time.

Let me know your Data Science hacks, tips and tricks in the comments section below!

Ram Dewani

Product Growth Analyst at Analytics Vidhya. I'm always curious to deep dive into data, process it, polish it so as to create value. My interest lies in the field of marketing analytics.


