The Understated Art of Data Storytelling

Varsha Last Updated : 30 Jul, 2023

7 min read

This article was published as a part of the Data Science Blogathon.

Introduction to Data Storytelling

Storytelling is a beautiful legacy that is a part of our great Indian culture, from the legendary Mahabharata era to Puranas and Jataka fables. Stories have been an imminent part of childhood, and the most important morals and values were bestowed on us through the art of storytelling. Believe it or not, the power of a story can tell you a lot in little.

Leo Widrich, an entrepreneur, in his paper “The Science of Storytelling: What Listening does to our Brain”, explains that listening to something activates our brain and instils a sense of belonging with the story’s plot. Let’s say the storyline says, “Sam pushed himself again at the gym”, which activates the sensory cortex of the listener, leading to a more enriching experience. Chemicals like cortisol, dopamine and oxytocin are considered to be released when listening to a story. Cortisol helps in memory retention of a particular plot of a story, while dopamine is a controller of emotional responses that helps engage for a long time. The release of oxytocin is imminent in maintaining deeper, rich and more effective connections.

That said, this understated art of storytelling is relevant to customer retention for businesses. Combined with the power of Data, the art of storytelling can revolutionise the way business reports and dashboards are read. From Marketing to Sales, healthcare & life sciences solutions to disrupting the travel and tourism industry, data stories are bridging the gaps in moving from old-school plain PowerPoint presentations to dynamic visualisation dashboards. But, what stands for a good, succinct data story, needs to involve some important components.

The three components – Data, Narrative and Visuals form the basis of data storytelling.

Data Storytelling

Data Cleaning

This first step can be a little boring yet relevant simultaneously. Most of the time, the data we get to work on is littered with unwanted, unnecessary data that, if ignored, can lead to unsatisfactory results. This can lead to bad marketing & sales and, eventually, low turnaround. This is what makes data cleaning a crucial step. That being said, the following are the 6 important steps in data cleansing.

According to some internet reports, around 2 – 3 trillion pieces of data are generated daily. Sounds too much, right? This data is, however, unstructured and not fit for its first use. To get this data working, we need to make it go through some cleaning.

1. Checking on Null/NaN Values:

Null values, that is, values that reflect “NaN/null is considered null values. These values have no relevance to the significance of the data and the value it is about to convey. Removing them is necessary as it can help maintain the quality of the data. We may either replace these null/NaN values or, depending on the usability of the data, remove them.

2. Removing Unnecessary Data:

The possibly two types of unnecessary data are – duplicates and redundant data. As it goes with the duplicates, it increases the redundancy of the overall dataset. As far as redundant data is concerned, it requires us to have a thorough business understanding and the data value the key stakeholders are looking for.

3. Working on the Data Structure:

This involves working on grammatical errors, spelling errors, and wrong use of words/conventions. This might sound like simple English, but imagine its impact on the humongous data that is majorly being processed in English across the world. This step involves renaming the columns according to conventions, changing the datatypes of the data and adding or deleting unnecessary fields and attributes. It is a mandatory step as keeping the language of the data intact helps design and train the ML model in a better and more effective way.

Data Storytelling

4. Filtering Out Missing Data:

Unlike null values, this is a step where we target the fields with no values, simply blank. To connect the dots with the available data, we must filter out the missing data and prepare it for preprocessing.

5. Filtering Out Outliers:

Outliers are values far from the convention that the rest of the data has set up. These values can skew the data without any direction or purpose and hamper the analysis. One of the most effective ways to look out for the outliers in the dataset is by performing EDA, which is Exploratory Data Analysis. However, this does not mean that the outlier data should never be considered. It depends on the analysis and model we run and its training requirements.

6. Validating the data:

This is the final step of the data cleansing process, and here is where we now conclude on our data for its uniformity, quality and integrity. This phase mostly involves answering questions like:

Is the cleansed data enough for making an effective analysis or training a machine learning model?
Is the data uniform in its structure?
Is there still any redundant data in the dataset?

This may seem an enduring process, but it is worth every second you spend on it for any organisation and individual satisfaction. Once this big chunk of data passes through these 6 essential steps, it is ready for some good analysis and beautiful visual presentations that make the data talk!

Data Visualization

Let’s get the data talking with some of the most prominent data visualisation tools used by the industry to create some inspiring and appealing data stories. This is the most important step in the entire data storytelling journey. While Python has some efficient libraries like Matplotlib and seaborn for visualising the data, they lack the ‘dashboard experience’ that gives an edge on business decision-making.

Some of the famous and in-demand tools that fill this gap for a smooth, dynamic and real-time dashboard experience and for that edge-on, cut-on-cut analysis include, but are not limited to, Tableau (powered by Salesforce), Power BI (powered by Microsoft), Data Studio (Google), Qwiklabs, Looker, etc. While Tableau & Power BI are the most loved Data Visualization applications, others on the list also get through well.

However, even before getting to the canvas to design the dashboard, certain points need to be taken care of.

Key Steps in Data Visualization

1. Review your data once again:

No matter how excellent your data cleaning and modelling skills are, reviewing the data once again is always a good idea. The points that need to be considered at this stage are the relevancy and redundancy of the available data. It is always a good idea to modify the otherwise technical, unreadable attributes with more readable and easy-to-understand words to make the dashboard or reports more comprehensible.

2. Make your data go through the W-W-H method:

It is extremely critical to formulate precise and articulate questions to answer the burning questions that the data is supposed to be answering and derive insights from it. One of the simpler methods is the W-W-H – What-Why-How method. This method can be excellent for pre-production as well as post-production business analysis.

Take the instance of an Xyz pharmaceutical company on research for making a new drug. In this case, the first subject of the research should understand the current market, customers’ behaviour towards various pharma brands and available drugs and lastly, the current healthcare situation in its geography/region of interest. This is the ‘WHAT’, the first component of the method.

After gaining a thorough understanding of the above points, comes the stage to ask, ‘Why do we want to build this product?’ The second step of this method, the ‘WHY’ starts with this question, why do we want to build or design something? Clear objectives can help drill down the analysis on the dashboard and reports.

Finally, the ‘HOW’- is the most crucial and most interesting part of the method. After the above research and analysis, we can go ahead with deriving some of the most powerful insights from our data. The ‘HOW’ helps gain a detailed perspective on

1. The potential of the product in terms of its geographical range

2. The ever-changing dynamics of the retail, digital & e-commerce

3. Performance comparison of the current market competitors and so on.

3. Make sure to enable row-level security:

To maintain the integrity and security of the data, it is always a good idea to define roles and rules at the dashboard level. After defining the roles, one should always validate or test the rules for different user roles you define for various user groups at the organisational level. This can be easily achieved through Data viz tools like Power BI, Looker and others.

Conclusion to Data Storytelling

By this time, we learned much about storytelling’s ancient and natural art. In the second part of this blog, we related this art with technology to learn about Data Storytelling. Along with the basics, we also understood why it is important to learn this understated soft skill that is extremely necessary for anyone to have a successful career in Data Science. In the third part of this blog, we learnt in detail about the components of Data Storytelling, from data cleaning to visualisation.

By now, we must have understood that the skill of Data storytelling is critical and significant not just for the Data/AIML Scientists or Analysts but for every professional from all the sectors we can think of. Data, being the oil of Industrial Revolution 4.0, must be utilised robustly. This requires every IT/Non-IT, private or public sector professional and even entrepreneur to learn and develop their data storytelling skills to make sense of trillions of data received from multiple sources. This is one skill that can enhance the value of the technical knowledge one may possess. Moreover, as we learnt above, data storytelling is also helpful in aiding us to learn the ‘Art of Questioning the Data’, where the entire analysis may start.

To continue our learning, the next part of this blog will teach you how to build a powerful post-Sales Analysis Dashboard and build data narratives around it. And always remember. The journey of learning something new is always challenging but worth it!

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Varsha

Beginner Data Cleaning Data Visualization

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Basics of Machine Learning

Machine Learning Lifecycle

Importance of Stats and EDA

Understanding Data

Probability

Exploring Continuous Variable

Exploring Categorical Variables

Missing Values and Outliers

Central Limit theorem

Bivariate Analysis Introduction

Continuous - Continuous Variables

Continuous Categorical

Categorical Categorical

Multivariate Analysis

Different tasks in Machine Learning

Build Your First Predictive Model

Evaluation Metrics

Preprocessing Data

Linear Models

KNN

Selecting the Right Model

Feature Selection Techniques

Decision Tree

Feature Engineering

Naive Bayes

Multiclass and Multilabel

Basics of Ensemble Techniques

Advance Ensemble Techniques

Hyperparameter Tuning

Support Vector Machine

Advance Dimensionality Reduction

Unsupervised Machine Learning Methods

Recommendation Engines

Improving ML models

Working with Large Datasets

Interpretability of Machine Learning Models

Interpretability of Machine Learning Models

Automated Machine Learning

Model Deployment

Deploying ML Models

Embedded Devices

The Understated Art of Data Storytelling

Introduction to Data Storytelling

Data Cleaning

Data Visualization

Key Steps in Data Visualization

Conclusion to Data Storytelling

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Congratulations, You Did It!

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS