Data Storytelling: How to Tell a Great Story with Data?
In a world flooded with data, the ability to transform raw numbers into meaningful stories has become a sought-after skill. Data storytelling is more than just presenting facts; it’s an art that breathes life into information and sparks curiosity. Let’s explore the enchanting realm of data storytelling and learn how to captivate audiences with the persuasive magic of data-driven narratives.
Table of contents
- What is Data Storytelling?
- Why Do We Need Data Storytelling?
- Benefits of Data Storytelling
- How to Create Data Stories?
- Types of Data and Suitable Charts
- Data Storytelling During Predictive Modeling?
- Best Practices for Data Storytelling
- Frequently Asked Questions
What is Data Storytelling?
Data storytelling is a method of communicating insights and information derived from data through the use of compelling narratives, visuals, and data-driven evidence. It involves presenting data in a way that makes it easier for people to understand, engage with, and draw meaningful conclusions from the information presented. By weaving data into a cohesive and persuasive story, data storytelling enables organizations and individuals to make informed decisions, influence stakeholders, and create impactful presentations.
Why Do We Need Data Storytelling?
The art of storytelling is simple and complex at the same time. Stories provoke thought and bring out insights that could not have been understood or explained before. Yet storytelling is often overlooked in data-driven work because we treat it as a trivial task.
What we fail to understand is that even the best stories end up being useless if they are not presented well!
In several firms, the first step towards analyzing anything is storyboarding, which means asking questions such as: Why do we have to analyze this? What decisions can we make from it? Sometimes the data alone tells such a vivid and intricate story that we don’t need to run complex correlations to confirm it.
The best example of why we need stories and visuals to explain data is Anscombe’s Quartet, a set of four datasets that have very similar summary statistics but look completely different when you visualize them.
These are the four datasets that make up Anscombe’s Quartet. If we look at the numbers alone, we find that their summary statistics are almost identical.
Let’s see how they appear when we visualize them.
Did you ever think these four datasets would produce such different visuals?
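If you want to check the numbers yourself, here is a minimal sketch that uses only Python's standard library. The values are the published Anscombe figures; the names `quartet`, `pearson_r` and `summary` are my own choices for this illustration.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's Quartet: sets I-III share the same x values, set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Per dataset: mean of x, mean of y, sample variance of x, correlation.
summary = {
    name: (mean(xs), mean(ys), variance(xs), pearson_r(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, vx, r) in summary.items():
    print(f"{name}: mean_x={mx:.2f} mean_y={my:.2f} var_x={vx:.2f} r={r:.3f}")
```

All four rows come out nearly the same (a mean of x of 9, a mean of y close to 7.5, a correlation close to 0.82), which is exactly why the numbers alone cannot tell these datasets apart.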
Benefits of Data Storytelling
- Data storytelling makes complex data more accessible and understandable, allowing audiences to grasp insights easily.
- Engaging narratives and compelling visuals keep audiences hooked, increasing retention and attention.
- Storytelling with data empowers better decision-making by presenting evidence-based insights.
- Stories are memorable; data combined with storytelling creates a lasting impact on the audience.
- Convincing narratives backed by data build trust and influence stakeholders effectively.
- Data storytelling adds a human touch, making data relatable and emotionally compelling.
- Data-driven stories create empathy, helping organizations connect with their customers’ needs and experiences.
- Storytelling reveals trends and patterns hidden within data, leading to valuable discoveries.
- Data stories help identify and address challenges, promoting innovative problem-solving.
- Storytelling unifies teams around data, enabling better collaboration and shared vision.
- Data storytelling is a potent tool for advocating social causes and driving positive change.
- Organizations that excel in data storytelling gain a competitive edge in their industries.
- Data storytelling eliminates jargon and simplifies complexities, promoting clear communication.
- Data-driven stories help you understand customer preferences and behavior better.
- Engaging data stories form the basis of powerful marketing campaigns.
How to Create Data Stories?
Creating a story or plot is the first step to selling your ideas and putting your best foot forward. Most people fail to think their stories through, so their work never rises above mediocrity. Let me take an example and guide you through the steps of creating stories.
We will be exploring a dataset that has news headlines and details of every stock price from the NASDAQ 100 tech companies. The columns selected are as follows:
1. Begin With a Pen-and-Paper Approach
Visually engaging presentations will inspire your audience, but they take more work to put together. Some of the best presentations were first drafted on rough pages and tissue paper. Writing down your ideas and flow before you start structuring your story is essential to your final product. The single most important thing you can do to dramatically improve your analytics is to have a story to tell. A flow that has not been thought through can create a lot of friction in your end result.
Aristotle’s classic five-point plan that helps deliver strong impacts is:
- Deliver a story or statement that arouses the audience’s interest.
- Pose a problem or question that has to be solved or answered.
- Offer a solution to the problem you raised.
- Describe specific benefits for adopting the course of action set forth in your solution.
- State a call to action.
I structured my report around plots that would give me a better understanding of my data. The first question I asked was: how can I make better business decisions about stocks using the data that I have?
A line graph would help me analyze the trends in specific stock prices.
As I can see, all stocks dropped in February 2016. Knowing this, I can scrape news articles from that period alone to identify what caused the drop. Now, how do I select which news source to scrape from?
By identifying which news source reported most about a particular stock, we would have reason to believe that this is a good source for the specific stock.
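Counting sources per stock takes only a few lines. The sketch below is hypothetical: the `(ticker, source)` pairs, the source names and the function `top_source_per_stock` are invented for illustration, and the real dataset's columns may look different.

```python
from collections import Counter, defaultdict

# Hypothetical (ticker, news source) pairs; not taken from the real dataset.
headlines = [
    ("TSLA", "Source A"),
    ("TSLA", "Source B"),
    ("TSLA", "Source A"),
    ("AAPL", "Source B"),
    ("AAPL", "Source B"),
    ("AAPL", "Source C"),
]


def top_source_per_stock(rows):
    """Map each ticker to its most frequent source and that source's count."""
    counts = defaultdict(Counter)
    for ticker, source in rows:
        counts[ticker][source] += 1
    return {ticker: c.most_common(1)[0] for ticker, c in counts.items()}


top_sources = top_source_per_stock(headlines)
print(top_sources)
```

The same counts could feed a bar chart per stock, which makes the dominant source visible at a glance.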
2. Dig Deeper to Identify the Sole Purpose of Your Story
- Identify exactly what the idea of your story is. Ask yourself, “What am I really giving with this story?” It’s never the story alone, but what the story can do to improve decision making. What you’re really presenting is the idea of better decision making or better analytics.
- Develop a personal “passion statement.” In one sentence, tell your prospects why you are genuinely excited about working with them. Your passion statement will be remembered for a long time.
3. Use Powerful Headings
- Create your heading, a one-sentence statement for your story, visual, or analysis. The most effective headlines are concise, specific, and offer a personal benefit.
- Remember, your heading is a statement that offers your audience a vision of a better understanding. It’s not about you. It’s about them.
4. Design a Road-Map
- Create a list of all the key points you want your audience to know about your story, visual, or analysis.
- Categorize the list until you are left with only three major message points. This group of three will provide the verbal road map for your story.
- Under each of your three key messages, add supporting evidence to enhance the narrative. These could include some or all of the following: personal stories, facts, examples, analogies etc.
5. Conclude With Brevity
Now that you have put forward all the points of your story, your conclusion should be short and powerful. In my report, I closed with short three- to four-line summaries explaining why a particular stock was worth buying.
Types of Data and Suitable Charts
Let us see the common types of data we encounter and how to tell stories from those, by selecting the best-fit charts. Commonly encountered types of data:
When data is textual, it’s usually useful to find out how often a word has been used or what the sentiment of the text is. Stories can be told best with this form of data. One of the best-suited visualizations for textual data is the word cloud. A word cloud enlarges the more frequent words and brings them to the center, giving us a clear picture of the general idea of the text.
For example, the word cloud displayed above in this article represents a Twitter dataset. It shows that “love” is the most frequent positive term used in the tweets.
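Behind every word cloud sits a simple word count. Here is a rough sketch of that counting step using only the standard library. The tweets, the stop-word list and the function name `word_frequencies` are made up for illustration; a real word cloud library would handle tokenizing and drawing for you.

```python
import re
from collections import Counter

# Made-up tweets, for illustration only.
tweets = [
    "I love this phone",
    "Love the new update, love it",
    "The battery is great",
]

# A tiny hand-picked stop-word list; real projects use a fuller one.
STOPWORDS = {"i", "the", "this", "is", "it", "a", "an", "and"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count lower-cased words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


freqs = word_frequencies(tweets)
print(freqs.most_common(3))
```

The words with the highest counts are the ones a word cloud would draw largest.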
When our data consists of numeric or any other variety of formats, we need to know which ones are important and give us better insights from our dataset.
The preferred visual for this kind of data can vary; here I will show you how to use facet grids for the data. I will be using the Titanic Passenger Data.
As this plot shows us, women and first-class passengers tended to have a higher chance of survival than men who were part of the crew or traveled in the lower classes.
Isn’t that what really happened on the Titanic?
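Each panel of a facet grid is just a group of rows summarized on its own. The sketch below shows that grouping step with toy rows that I invented; they are NOT the real Titanic records, and the function name `survival_rates` is my own.

```python
from collections import defaultdict

# Toy rows of (sex, passenger class, survived); NOT the real Titanic data.
passengers = [
    ("female", 1, True), ("female", 1, True),
    ("female", 3, True), ("female", 3, False),
    ("male", 1, True), ("male", 1, False),
    ("male", 3, False), ("male", 3, False),
]


def survival_rates(rows):
    """Share of survivors for every (sex, class) combination."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for sex, pclass, survived in rows:
        totals[(sex, pclass)] += 1
        survivors[(sex, pclass)] += int(survived)
    return {key: survivors[key] / totals[key] for key in totals}


rates = survival_rates(passengers)
for (sex, pclass), rate in sorted(rates.items()):
    print(f"{sex:6s} class {pclass}: {rate:.0%}")
```

Every key in `rates` corresponds to one facet, so the plot and the table tell the same story in two forms.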
Another way to visualize this kind of data is by trying a multivariate plot. The dataset in use for this plot is the Car Performance and Specifications dataset.
Here we can see that cars with a heavier build are slower than those with lighter bodies. Makes sense, right?
When we encounter numeric data, we’re usually looking for trends or lines that depict the numbers. The visual that suits numeric data best is a line or a step graph.
Here, we can very clearly see the rise of prices at a local attraction for adults and children. See how easy it is to see the growth at each year interval?
Another kind of dataset that we encounter relates to stocks. Stock market data is primarily a time series of numeric values, but as a trader or an investor, I would like to examine each date and each drop carefully.
The most visually captivating chart for this purpose is the candlestick chart.
Here, we take the example of Tesla’s stock. A candlestick chart lets us move from date to date and see the lows and highs of the stock for each one. This can help us make better investment decisions based on current or past market trends.
As the graph shows us, Tesla’s stock dropped in February 2016. We could now use this information, together with other market conditions and the economic situation, to make decisions about the stock.
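Each candle in such a chart summarizes one period with four numbers: open, high, low and close. Here is a small sketch of how those four values can be derived from a stream of prices. The prices and day labels are invented and do not represent any real stock; `to_candles` is a hypothetical helper name.

```python
# Invented (day, price) ticks in time order; not real market data.
ticks = [
    ("day 1", 190.0), ("day 1", 196.5), ("day 1", 188.0), ("day 1", 191.0),
    ("day 2", 191.0), ("day 2", 183.0), ("day 2", 185.5),
]


def to_candles(rows):
    """Collapse time-ordered (day, price) ticks into one OHLC record per day."""
    candles = {}
    for day, price in rows:
        if day not in candles:
            # The first tick of the day sets all four values.
            candles[day] = {"open": price, "high": price, "low": price, "close": price}
        else:
            candle = candles[day]
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price  # the last tick seen so far
    return candles


candles = to_candles(ticks)
for day, candle in candles.items():
    print(day, candle)
```

A day whose close sits below its open, like the second one here, is the kind of candle that signals a drop.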
When we have data pertaining to specific locations and areas, we use maps to add clarity and meaning to our analysis.
In this example, we can see how countries fared at and after the 2002 World Cup. Germany has scored the most goals, having been one of the most dominant teams in world football ever since.
Data Storytelling During Predictive Modeling?
We are often asked how our stories and visuals can help when it’s time to create mathematical models. At every stage of predictive modeling, storytelling can be a vital addition to your analysis.
Let us understand the basic steps involved in creating models out of our data and go through telling stories within them.
The first step of model building is understanding your data. I’ll give you instances and show you how you can explore your data without computing complex statistics.
Let’s consider a dataset on Wine Quality. The structure of the dataset is as follows:
Here, we can see the associated summary statistics of the dataset in use.
So, if we need to see whether there is any correlation between alcohol volumes and wine qualities, how do we do it?
We could compute Pearson’s r. It would help us in building a model, but it would not help us analyze much beyond that.
This shows a very strong correlation between Alcohol content and wine quality. But does it tell you anything else?
Not really. So, what does?
Let’s see how we can visualize these and tell a lot more from them.
First, we’ll begin by seeing how Wine Quality relates to Alcohol content.
Here, we can see that the higher alcohol volumes relate to better wine qualities and it helps us come to a better understanding of our data. We can also spot outliers better in this scenario.
Next, have you wondered how the acid content of a wine affects its quality?
This would be one way to visualize the effects of acid. Wherever the violin plot is wider, there are more data points in that range.
After you generate features, how do you see how well each one predicts?
Graphs tell us how far our observed points are from our fitted line.
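The distances such a graph shows are the residuals of the fit. As a rough sketch, assuming a simple least-squares line through one feature, they can be computed like this. The numbers are invented, and `fit_line` and `residuals` are names I made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(xs, ys):
    """Vertical distance of every observed point from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Invented feature values and targets, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print([round(r, 2) for r in res])
```

Plotting `res` against `xs` gives the residual plot: points scattered evenly around zero suggest the line fits well, while a visible pattern suggests the feature needs a different treatment.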
Another example where we might have to visualize newly created features is Principal Component Analysis. If you want to get an in-depth understanding of PCA, you can go through this article.
This is the Iris dataset that ships with R.
When we run the principal component analysis on this dataset, we find these statistics.
However, when we plot this, we find that the resulting visual is much more informative than the statistics.
Model Creation and Comparison
Coming to the model creation phase, we usually find the need to understand how our data is being fitted.
This is a model that predicts whether the car should go fast or slow, based on the grade of the road and bumpiness.
As you can see, the decision boundary clearly classifies most of the data but an accuracy of 88.21% doesn’t tell much of a story. Here we can even see how far the misclassified points are from the decision boundary.
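To put a number on "how far", you can measure each misclassified point's distance from the boundary. The sketch below assumes a straight-line boundary, which may not match the model in the plot. The weights, points and labels are all invented, and `signed_distance` and `misclassified` are hypothetical helper names.

```python
from math import hypot

# Hypothetical linear boundary: W[0] * grade + W[1] * bumpiness + B = 0.
W = (1.0, 1.0)
B = -1.0


def signed_distance(point, w=W, b=B):
    """Signed perpendicular distance of a 2D point from the boundary line."""
    return (w[0] * point[0] + w[1] * point[1] + b) / hypot(*w)


def misclassified(points, labels):
    """Return (point, distance) for points on the wrong side of the boundary.

    Labels are +1 or -1; a point is predicted +1 when its signed distance is >= 0.
    """
    errors = []
    for point, label in zip(points, labels):
        distance = signed_distance(point)
        predicted = 1 if distance >= 0 else -1
        if predicted != label:
            errors.append((point, abs(distance)))
    return errors


# Invented (grade, bumpiness) points with their true labels.
points = [(0.9, 0.8), (0.2, 0.1), (0.6, 0.5), (0.3, 0.6)]
labels = [1, -1, -1, 1]

errors = misclassified(points, labels)
for point, distance in errors:
    print(point, round(distance, 3))
```

Errors that sit close to the boundary are near misses, while errors far from it point to something the model has not captured, and an accuracy figure alone hides that difference.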
We can also compare certain algorithms and techniques by looking at their decision boundaries as we did above.
Another example using the Iris dataset is shown below.
Here, there’s not much information to derive valuable insights about our model.
To learn more about Support Vector Machines, you can go through this article.
On the other hand, this plot shows us a clear classification boundary where the species separate from each other.
Best Practices for Data Storytelling
Now that you know the scenarios where we can use storytelling to make our point, here are a few practical tips for when you take this up on your own.
- Always label your axes and give your plot a title.
- Use legends where necessary.
- Use colors that are easy on the eye, and use them in proportion.
- Avoid adding unnecessary detail to your visualization like backgrounds or themes that don’t allow good readability.
- Only a point can simultaneously encode two quantitative values through its horizontal and vertical position.
- Never use points to encode time series data.
In this data-driven age, storytelling has evolved beyond the realms of fiction and imagination. Data storytelling has emerged as a powerful tool, bridging the gap between raw numbers and captivating narratives. Mastering this art can unlock new avenues of influence, decision-making, and success.
Frequently Asked Questions
Q1. What is data storytelling?
A. Data storytelling conveys data-driven insights and analysis through a compelling narrative that engages and informs the audience, making data more understandable and actionable.
Q2. What are the three key elements of data storytelling?
A. The three key elements of data storytelling are data, visualization, and narrative. Data provides the foundation, visualization aids in comprehension, and a well-crafted narrative contextualizes and communicates the insights.
Q3. What are the four aspects of data storytelling?
A. The four aspects of data storytelling are simplicity, relevance, credibility, and engagement. Effective data storytelling involves presenting information in a clear and understandable manner, connecting data to real-world situations, ensuring data accuracy, and captivating the audience’s attention.
Q4. What is an example of data storytelling?
A. An example of data storytelling could be a presentation that uses data and visualizations to explain the impact of a marketing campaign on sales, highlighting the key metrics, trends, and the story of how the campaign led to business growth.