KASHISH RAKESHKUMAR — October 17, 2021

This article was published as a part of the Data Science Blogathon

Introduction

A waterfall chart is a two-dimensional chart used to show the cumulative effect of incremental positive and negative changes, either over time or across a series of steps or categories. Waterfall charts are also known as floating bricks charts or flying bricks charts.

When God takes a shower, waterfalls come to life around the world. When I take a shower, my hair usually clogs the drain.
Anthony T. Hincks

In this article, we will see why waterfall charts matter and how to make them with two libraries, Plotly and Matplotlib.

Waterfall Chart

The waterfall chart is frequently used in financial analysis to understand how multiple factors, positive and negative, affect a particular asset. The chart can be either time based or category based. Category-based charts represent the gain or loss across expenses, sales, or any other variable with a sequence of positive and negative values. Time-based charts represent the gain or loss over a time period.

A waterfall chart is usually read from left to right along the horizontal axis. The first bar starts from the axis, and each following bar is a floating column that begins where the previous one ended and rises or falls by the amount of the positive or negative change. Sometimes the bars are connected with lines.
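To make the idea of floating columns concrete, here is a minimal sketch in plain Python of where each bar starts and ends. The function name and the sample numbers are made up for illustration; charting libraries do this bookkeeping for you.

```python
from itertools import accumulate


def waterfall_segments(changes, start=0):
    """Return one (start, end) pair per change, following the running total."""
    # Running total before the first change and after every change.
    totals = list(accumulate(changes, initial=start))
    # Bar i starts at totals[i] and ends at totals[i + 1].
    return list(zip(totals[:-1], totals[1:]))


# Hypothetical changes, chosen only for illustration.
changes = [10, -30, 25]
for change, (begin, end) in zip(changes, waterfall_segments(changes)):
    direction = "up" if change >= 0 else "down"
    print(f"{change:+}: bar goes {direction} from {begin} to {end}")
```

Each bar begins where the previous one stopped, which is what makes the columns appear to float.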

Why do we need it

Let’s take an example to understand when and where to use waterfall charts, because making them is not the hard part. We will use some dummy data and a Kaggle dataset to build waterfall charts.

Let’s take an example

Suppose I give you a table in pandas (not a plain one, but a styled one) and a waterfall chart. Which one is more convenient to read?

This table represents the sales data for one whole week. I have used the seaborn library to create a color palette and the pandas background_gradient styling to turn the table into a heatmap.

import pandas as pd
import seaborn as sns
# data: day of the week and that day's change in sales
a = ['mon','tue','wed','thu','fri','sat','sun']
b = [10,-30,-7.5,-25,95,-7,45]
df2 = pd.DataFrame(b,a).reset_index().rename(columns={'index':'week',0:'values'})
# table: shade each value with a light-green gradient
cm = sns.light_palette("green", as_cmap=True)
df2.style.background_gradient(cmap=cm)

[Image: Waterfall Chart data]

Now, look at the table and waterfall chart side by side.

[Image: Waterfall Chart weekly sales]

The table does convey the relative size of the values through its shading, but it is still quite difficult to read. In the chart, on the other hand, you can easily see that the yellow bars show the decrements and the red bars show the increments.

Waterfall chart with Plotly

The data we are going to use is the Netflix Movies and TV Shows dataset from Kaggle; the notebook can be found here.

We are going to use Plotly, an open source charting library.

Importing the libraries

import pandas as pd
import plotly.graph_objects as go

Dataset

df = pd.read_csv(r'D:/netflix_titles.csv')

Converting date_added to a proper datetime format and adding year and month columns

# strip stray whitespace first (a precaution; harmless if there is none)
df["date_added"] = pd.to_datetime(df['date_added'].str.strip())
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
df.head(3)
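If you want to see what this step does without downloading the dataset, the sketch below runs the same conversion on three made-up values written in a "Month day, year" style. The sample values, including the leading space and the missing entry, are assumptions added for illustration and are not taken from the real file.

```python
import pandas as pd

# Made-up sample values: one has a leading space and one is missing.
sample = pd.DataFrame(
    {"date_added": ["September 25, 2021", " August 4, 2017", None]}
)

# Strip whitespace, then parse; the missing entry becomes NaT.
sample["date_added"] = pd.to_datetime(sample["date_added"].str.strip())
sample["year_added"] = sample["date_added"].dt.year
sample["month_added"] = sample["date_added"].dt.month

print(sample)
```

Note that a missing date leaves the year and month for that row empty, so those columns may show up as floats rather than integers.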

Let’s prepare the data

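# keep only the movies
d2 = df[df["type"] == "Movie"]
col = "year_added"
# count movies per year; rename_axis/reset_index(name=...) gives the same
# two columns (year_added, count) on both older and newer pandas versions
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
# share of each year in percent
vc2['percent'] = 100 * vc2['count'] / vc2['count'].sum()
vc2 = vc2.sort_values(col)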
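To check what this preparation produces, here is a small sketch on a made-up frame that has only the two columns involved. The toy data and variable names are hypothetical. The counting line uses rename_axis and reset_index(name=...), one way to get a (year_added, count) table that should behave the same across pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "Movie"],
    "year_added": [2019, 2020, 2020, 2019, 2019],
})

col = "year_added"
movies = toy[toy["type"] == "Movie"]

# One row per year with the number of movies added that year.
counts = movies[col].value_counts().rename_axis(col).reset_index(name="count")
counts["percent"] = 100 * counts["count"] / counts["count"].sum()
counts = counts.sort_values(col).reset_index(drop=True)

print(counts)
```

The result has one row per year, sorted by year, which is the shape a chart of movies per year needs.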

Now we will make a waterfall chart for movies over the years with the Plotly trace go.Waterfall().

fig2 = go.Figure(go.Waterfall(
    name = "Movie", orientation = "v",
    # years on the x-axis
    x = ["2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"],
    textposition = "auto",
    # labels drawn on the bars: the movie count for each year
    text = ["1", "2", "1", "13", "3", "6", "14", "48", "204", "743", "1121", "1366", "1228", "84"],
    # bar sizes: the same counts, negative where the count is lower than the year before
    y = [1, 2, -1, 13, -3, 6, 14, 48, 204, 743, 1121, 1366, -1228, -84],
    connector = {"line":{"color":"#b20710"}},
    increasing = {"marker":{"color":"#b20710"}},
    decreasing = {"marker":{"color":"orange"}},
))
fig2.show()

[Image: Waterfall chart with Plotly]

Let’s go through each parameter one by one:

  • x: The categories (here, the years) placed along the x-axis
  • y: The size of each bar; by default every value is treated as a change relative to the previous bar, so positive values rise and negative values fall
  • text: The labels shown on the bars
  • textposition: Where the labels are placed, for example inside or outside the bars; "auto" lets Plotly decide
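The x, text, and y lists in the chart are typed in by hand. The y values look like the yearly counts with a minus sign wherever the count dropped below the previous year's; that reading is inferred from the numbers and is not stated explicitly. The sketch below reproduces that rule and, as an alternative, computes true year-over-year differences. Because go.Waterfall adds up relative values, passing the differences would make the top of each bar equal that year's count. Both helper names are hypothetical.

```python
from itertools import accumulate

# Movie counts per year, copied from the `text` list in the chart code.
counts = [1, 2, 1, 13, 3, 6, 14, 48, 204, 743, 1121, 1366, 1228, 84]


def signed_counts(values):
    """Keep each count, but make it negative when it is below the previous one."""
    signed = [values[0]]
    for previous, current in zip(values, values[1:]):
        signed.append(current if current >= previous else -current)
    return signed


def year_over_year_changes(values):
    """Difference between each count and the one before it (first one vs. zero)."""
    return [current - previous for previous, current in zip([0] + values, values)]


y_signed = signed_counts(counts)            # same values as the hand-typed y list
y_changes = year_over_year_changes(counts)  # running total equals the yearly count

print(y_signed)
print(y_changes)
print(list(accumulate(y_changes)) == counts)
```

Which version to plot depends on the story: the signed counts highlight which years went up or down, while the differences keep the bar heights tied to the actual yearly totals.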

To make the chart elegant, we will give colors to the bars and to the connector lines too. Increasing bars are red and decreasing bars are orange.

The parameters for the charts

  • connector: Styling of the lines that connect the bars (here, their color)
  • increasing: Styling of the increasing bars (here, their color)
  • decreasing: Styling of the decreasing bars (here, their color)

The chart already looks pretty good, but let’s make it more attractive.

fig2.update_xaxes(showgrid=False)
fig2.update_yaxes(showgrid=False, visible=False)
fig2.update_traces(hovertemplate=None)
fig2.update_layout(title='Watching Movies over the year', height=350,
                   margin=dict(t=80, b=20, l=50, r=50),
                   hovermode="x unified",
                   xaxis_title=' ', yaxis_title=" ",
                   plot_bgcolor='#333', paper_bgcolor='#333',
                   title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                   font=dict(color='#8a8d93'))

[Image: Waterfall Chart in Plotly (watching movies)]

Now it looks perfect.

Let’s look at the parameters now.

  • title: Title for the chart
  • margin: Setting the margin for the chart: top, bottom, left, right
  • plot_bgcolor: Setting the plot background color
  • paper_bgcolor: Setting the paper background color
  • font: Setting the font properties
  • title_font: Setting the title font properties
  • I have hidden the y-axis by using update_yaxes(visible=False).

The Full code

import pandas as pd
import plotly.graph_objects as go

# load the data and derive the year each title was added
df = pd.read_csv(r'D:/netflix_titles.csv')
df["date_added"] = pd.to_datetime(df['date_added'].str.strip())
df['year_added'] = df['date_added'].dt.year

# count movies per year
d2 = df[df["type"] == "Movie"]
col = "year_added"
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
vc2['percent'] = 100 * vc2['count'] / vc2['count'].sum()
vc2 = vc2.sort_values(col)

# waterfall chart
fig2 = go.Figure(go.Waterfall(
    name = "Movie", orientation = "v",
    x = ["2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"],
    textposition = "auto",
    text = ["1", "2", "1", "13", "3", "6", "14", "48", "204", "743", "1121", "1366", "1228", "84"],
    y = [1, 2, -1, 13, -3, 6, 14, 48, 204, 743, 1121, 1366, -1228, -84],
    connector = {"line":{"color":"#b20710"}},
    increasing = {"marker":{"color":"#b20710"}},
    decreasing = {"marker":{"color":"orange"}},
))

# styling
fig2.update_xaxes(showgrid=False)
fig2.update_yaxes(showgrid=False, visible=False)
fig2.update_traces(hovertemplate=None)
fig2.update_layout(title='Watching Movies over the year', height=350,
                   margin=dict(t=80, b=20, l=50, r=50),
                   hovermode="x unified",
                   xaxis_title=' ', yaxis_title=" ",
                   plot_bgcolor='#333', paper_bgcolor='#333',
                   title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                   font=dict(color='#8a8d93'))

Waterfall chart with Matplotlib

Installing the waterfallcharts library using pip

!pip install waterfallcharts

Importing the library

import pandas as pd
import waterfall_chart
import matplotlib.pyplot as plt
%matplotlib inline

Let’s plot a waterfall chart for the weekly sales data.

a = ['mon','tue','wed','thu','fri','sat','sun']
b = [10,-30,-7.5,-25,95,-7,45]
waterfall_chart.plot(a, b);

[Image: Waterfall chart in Matplotlib]

If we look closely at the chart, by default the bars with positive values are green, the bars with negative values are red, and the total bar is blue.

Adding some parameters to the chart

waterfall_chart.plot(a, b, net_label='Total', rotation_value=360)

Parameters of the chart:

  • net_label: The label of the last bar, which shows the net total
  • rotation_value: The rotation angle, in degrees, of the x-axis labels

[Image: matplotlib]
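If you would rather not install an extra package, a similar chart can be sketched with plain Matplotlib by giving each bar a bottom equal to the running total before it. This is only an illustration of the idea, not how the waterfallcharts library is implemented, and the colors simply mirror its defaults as described above.

```python
import matplotlib.pyplot as plt

days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
changes = [10, -30, -7.5, -25, 95, -7, 45]

# Each bar starts where the running total stood before that day's change.
bottoms, running = [], 0
for change in changes:
    bottoms.append(running)
    running += change

colors = ["green" if change >= 0 else "red" for change in changes]

fig, ax = plt.subplots(figsize=(8, 4))
# A negative height makes Matplotlib draw the bar downward from `bottom`.
ax.bar(days, changes, bottom=bottoms, color=colors)
# Final bar: the net total, drawn from zero.
ax.bar(["Total"], [running], color="tab:blue")
ax.axhline(0, color="black", linewidth=0.8)
ax.set_ylabel("sales")
```

In a notebook with %matplotlib inline the figure is displayed automatically; in a script, call plt.show() at the end.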

End Notes

We saw the importance of the waterfall chart, and when and how to use it with Plotly and Matplotlib. I hope you liked the article. If you have any queries, you can contact me on

LinkedIn | Kaggle | Medium | Analytics Vidhya

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

