Kashish Rastogi — September 22, 2021
This article was published as a part of the Data Science Blogathon
Bokeh
Image 1 

Introduction

I am sure many of you have read articles about the buzz around “Machine Learning”, “Data Scientist”, “Data Visualization”, and so on. Some have branded data science as the sexiest job of the 21st century. Anaconda’s State of Data Science Report 2020 states that 21% of the time is given to data visualization. It is therefore important to use a tool or library that helps us in the flow of storytelling.

Data visualization is one of the most basic and important steps in predictive modelling. People often start with data visualization to gain insights and understand the data through Exploratory Data Analysis (EDA). Charts and visuals are a better option than studying tables of values, because people prefer visuals to plain text or numbers. So let’s make clear, elegant, and insightful charts that our audience can understand easily, always assuming that the audience is non-technical. Less is more: proper visualization brings clarity to the data, which helps in decision-making. Let’s go through a quick guide to visualization with Bokeh.

 

 

“Even if your role does not directly involve the nuts and bolts of data science, it is useful to know what data visualization can do and how it is realized in the real world.”
– Ramie Jacobson

 

data science lifecycle

What is Bokeh?

Bokeh is an interactive visualization library for Python. Its best feature is highly interactive graphs and plots that target modern web browsers for presentation. Bokeh helps us make elegant, concise charts and offers a wide range of chart types.

Bokeh primarily works by converting the data source into JSON, which is then used as input for BokehJS. Some of the best features of Bokeh are:

  • Flexibility: Bokeh provides simple charts as well as custom charts for complex use cases.
  • Productivity: Bokeh works easily with Pandas and Jupyter notebooks.
  • Styling: We have control over our charts and can easily modify them using custom JavaScript.
  • Open source: Bokeh provides a large number of examples and ideas to start with, and it is distributed under the Berkeley Software Distribution (BSD) license.

With bokeh, we can easily visualize large data and create different charts in an attractive and elegant manner.

Where to use Bokeh charts?

There are plenty of visualization libraries, so why should we use Bokeh? Let’s see why.

We can use the Bokeh library to embed charts in a web page, build live dashboards, and create apps. Bokeh provides its own styling options and widgets for the charts. A key advantage is that Bokeh charts can be embedded in any website built with Flask or Django.
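As a rough illustration of embedding, the sketch below uses components() from bokeh.embed to turn a figure into a script block and a div placeholder that a Flask or Django template could render. The dummy data, the figure title, and the page string are assumptions made for this example only.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# Build a small figure to embed (dummy data, for illustration only).
fig = figure(width=400, height=250, title="Embedded chart")
fig.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)

# components() returns two strings: a <script> block and a <div> placeholder.
script, div = components(fig)

# Hypothetical page body; in Flask or Django these two strings would be
# passed to a real template instead of being formatted by hand.
page = f"<html><body><h1>Report</h1>{div}{script}</body></html>"
```

The page must also load the matching BokehJS files for the chart to render in the browser.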

Bokeh provides several interfaces that are simple and easy to adopt. The main ones are:

  • Bokeh Models
  • Bokeh Plotting
  • Bokeh Applications
  • Bokeh Server

bokeh.models

Bokeh models provide a low-level interface that gives application developers the most flexibility.

bokeh.plotting

Bokeh plotting provides a high-level interface for creating visual glyphs. It is built on top of the bokeh.models module and contains the definition of the figure class, which is the simplest way to create a plot.

bokeh.application

The Bokeh application package provides a lightweight factory for creating Bokeh documents.

bokeh.server

Bokeh server is used to publish and share interactive charts and apps.

 

Installing the Bokeh Library

To install the library with pip, run the following command. The pandas-bokeh package installs Bokeh as a dependency.

pip install pandas-bokeh

To install the library in a conda environment, run the following command

conda install -c patrikhlobil pandas-bokeh

Importing the Bokeh library

Importing necessary packages for bokeh library

import pandas as pd
import pandas_bokeh
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
pandas_bokeh.output_notebook()
pd.set_option('plotting.backend', 'pandas_bokeh')

The Bokeh plotting interface is used to create interactive visuals. From it we import figure, which acts as a container that holds our charts.

from bokeh.plotting import figure

We need the below command to display the charts

from bokeh.io import show, output_notebook

We need the below command to display the output of the charts in the jupyter notebook

pandas_bokeh.output_notebook()

To embed the charts as HTML, run the below command

pandas_bokeh.output_file(filename)

HoverTool displays a value when we hover over a data point with the mouse pointer, and ColumnDataSource is Bokeh’s counterpart to a DataFrame.

from bokeh.models import HoverTool, ColumnDataSource
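To make the role of these two classes more concrete, here is a minimal sketch that attaches a HoverTool to a scatter plot backed by a ColumnDataSource. The column names minute and games and their values are made up for illustration; they are not taken from the dataset used later.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Dummy data; the column names are hypothetical.
source = ColumnDataSource(data={
    "minute": [5, 8, 12],
    "games": [14, 30, 9],
})

fig = figure(width=400, height=250, title="Games per minute (dummy data)")
fig.scatter(x="minute", y="games", size=12, source=source)

# Each tooltip is a (label, value) pair; "@name" refers to a column of the source.
hover = HoverTool(tooltips=[("Minute", "@minute"), ("Games", "@games")])
fig.add_tools(hover)
```

Calling show(fig) would then display the chart with the tooltip active.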

The syntax for plotting the charts

Using a pandas bokeh

To plot a pandas data frame with Bokeh, use the following code.

dataframe.plot_bokeh()

Creating a Figure object for Bokeh

We will create a figure object, which is simply a container that holds the chart. We can give the figure() object any name; here we call it fig.

fig = figure()
# Customizing code for plotting goes here
show(fig)

Creating a chart with ColumnDataSource

To use a ColumnDataSource with a renderer function, we need to pass at least three arguments:

  • x – the name of the ColumnDataSource’s column that contains data for the x-axis of the chart
  • y – the name of the ColumnDataSource’s column that contains data for the y-axis of the chart
  • source – the ColumnDataSource object that contains the columns referenced for the x-axis and y-axis
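The three arguments can be sketched as follows. This is only an illustrative example: the data frame df_demo and its columns minutes and tasks are hypothetical and not part of the dataset used later in this article.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical data frame, used only to demonstrate the arguments.
df_demo = pd.DataFrame({"minutes": [1, 2, 3, 4], "tasks": [2, 3, 5, 4]})

# A ColumnDataSource can be built directly from a pandas DataFrame.
source = ColumnDataSource(df_demo)

fig = figure(width=400, height=250, title="Tasks vs minutes (dummy data)")

# x and y are column names; source is the object that holds those columns.
renderer = fig.line(x="minutes", y="tasks", source=source, line_width=2)
```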

To show the output chart in a separate HTML file, run the following command

output_file('abc.html')

Using the Bokeh Library themes

Bokeh themes are predefined sets of design settings that you can apply to your plots. Bokeh provides five built-in themes:

  • caliber,
  • dark_minimal,
  • light_minimal,
  • night_sky, and
  • contrast.

The picture below shows how charts look with the built-in themes, using a line chart as the example.

Run the code below to plot charts using the built-in themes.
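A minimal sketch of applying a built-in theme is given here, assuming the theme is set on the current document through curdoc(). The dummy line data and the choice of dark_minimal are for illustration only; any of the five theme names listed above can be substituted.

```python
from bokeh.io import curdoc
from bokeh.plotting import figure

# Apply one of the built-in themes to the current document.
curdoc().theme = "dark_minimal"

# Dummy line chart; figures shown from this document pick up the theme.
fig = figure(width=400, height=250, title="Line chart with the dark_minimal theme")
fig.line([1, 2, 3, 4, 5], [4, 6, 3, 7, 5], line_width=2)
```

Calling show(fig) afterwards renders the chart with the selected theme.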

Bokeh themes

Styling the charts

There are different properties we can use to enhance the charts. Objects have three main groups of properties in common:

  • line properties
  • fill properties
  • text properties

Basic styling 

I will add only the code required for customizing the chart; you can add more as needed. At the end, I will show a chart with demo code for a clearer understanding. There are many more properties; for a detailed explanation, see the official documentation.

Adding background color to the chart

fig = figure(background_fill_color="#fafafa")

 

To set the chart width and height, we pass width and height to figure()

fig = figure(height=350, width=500)

Hiding the x-axis and y-axis of the chart

fig.axis.visible=False

Hiding the grid colors of the chart

fig.grid.grid_line_color = None

To change the opacity of the chart’s background color, we use alpha

fig.background_fill_alpha=0.3

To add a title to the chart we need to add a title in the figure()

fig = figure(title="abc")

To add or change the x-axis and y-axis labels, run the following command

fig.xaxis.axis_label='X-axis'
fig.yaxis.axis_label='Y-axis'

Demo chart for simple stylings

x = list(range(11))
y0 = x
fig = figure(width=500, height=250,
             title='Title',
             background_fill_color="#fafafa")
fig.circle(x, y0, size=12, color="#53777a", alpha=0.8)
fig.grid.grid_line_color = None
fig.xaxis.axis_label='X-axis'
fig.yaxis.axis_label='Y-axis'
show(fig)

 

demo chart

The steps to create a chart with the bokeh.plotting interface are:

steps to create bokeh plot

Image 2

  • Prepare the data
  • Create a new plot
  • Add renderers for your data, with your visual customization for the plot
  • Specify where to generate the output (In HTML file or in Jupyter Notebook)
  •  Show the result
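The five steps above can be sketched as follows. The data values and the file name bokeh_steps_demo.html are assumptions for this example, and save() is used in the last step so that the file is written without opening a browser; show(fig) would write the file and open it.

```python
from bokeh.io import output_file, save
from bokeh.plotting import figure

# 1. Prepare the data (dummy values).
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# 2. Create a new plot.
fig = figure(title="Five steps demo", width=400, height=250,
             x_axis_label="x", y_axis_label="y")

# 3. Add a renderer for the data, with visual customization.
fig.line(x, y, legend_label="Trend", line_width=2, color="#3182bd")

# 4. Specify where to generate the output (an HTML file here).
output_file("bokeh_steps_demo.html", title="Five steps demo")

# 5. Produce the result; save() returns the path of the written file.
path = save(fig)
```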

Bokeh Use Case in Python

The data we are going to work with is the popular Among Us dataset, which you can find on Kaggle.

Among Us: New craze

 

among us

Among Us is the new craze among mobile gamers; its popularity suddenly exploded, and it became the hit video game of the pandemic. For all the Among Us fans, here is a brief description of how the game works. Among Us is a multiplayer game in which four to ten players are dropped into an alien spaceship. Each player has the role of either Imposter or Crewmate. The crewmates’ job is to run around the spaceship completing all their assigned tasks while taking care not to be killed by an imposter. Players can be voted off the ship, so each game becomes one of survival.

Data

Let’s load the data and create one more feature, User ID, which tells us which user each row belongs to (user 1, user 2, and so on).

import glob
path = r'D:\Blogs\Analytics_vidhya\Among_Us'
all_files = glob.glob(path + "/*.csv")
li = []
usr=0
for filename in all_files:
    usr+=1
    df = pd.read_csv(filename, index_col=None, header=0)
    df['User ID']=usr
    li.append(df)
df = pd.concat(li, axis=0, ignore_index=True)
df[:2]
dataset

 

Data Description

  • Game Completed Date – Date and Time of game completion
  • Team – Tells us if the player is an imposter or crewmate
  • Outcome – Tells us if the game was won or lost
  • Task Completed – The number of tasks completed by the crewmate
  • All Tasks Completed – Boolean Variable showcasing if all the tasks are completed by the crewmate
  • Murdered – Crewmate is murdered or not
  • Imposter Kills – Number of kills by the imposter
  • Game Length – Total duration of the game
  • Ejected – The player is ejected by the fellow players or not
  • Sabotages Fixed – Number of sabotages fixed by the crewmates
  • Time to complete all tasks – Duration taken by the crewmates to complete the tasks
  • Rank Change – Change in rank after the win/loss of the game
  • Region/Game Code – Server and the game code
  • User ID – Identifier of the user

Note: This article doesn’t contain the EDA but shows how to work with different charts in Bokeh

Let’s see the distribution of data.

df.describe(include='O')
describe data

We will create a feature Min by extracting the minutes from Game Length.

df['Min'] = df.apply(lambda x : x['Game Length'].split(" ")[0] , axis = 1)
df['Min'] = df['Min'].replace('m', '', regex=True)
df['Min'][:2]

Now, we will replace the values of the Murdered feature

df['Murdered'] = df['Murdered'].replace(['No', 'Yes', '-'], ['Not Murdered', 'Murdered', 'Missing'])

With the necessary cleaning steps complete, let’s first look at the basic charts in Bokeh.

 

Pie chart

Let’s check whether there are more Crewmates or Imposters in the game. We have data for a total of 2227 people.

df_team = df.Team.value_counts()

df_team.plot_bokeh(kind='pie', title='Ratio of Imposter vs Crewmate')

 


Pie chart in Bokeh

Interpret

As shown in the chart, 79% are Crewmates and 21% are Imposters, so the ratio of Imposters to Crewmates is roughly 1:4. Since there are fewer imposters, there is a chance that most of the games are won by crewmates.

Donut Chart

Let’s check how many players were murdered in the game. We will add two more features, Angle and Color, which we are going to use in the chart. This donut chart was taken from

from math import pi

df_mur = df.Murdered.value_counts().reset_index().rename(columns={'index': 'Murdered', 'Murdered': 'Value'})
df_mur['Angle'] = df_mur['Value']/df_mur['Value'].sum() * 2*pi
df_mur['Color'] = ['#3182bd', '#6baed6', '#9ecae1']
df_mur
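To see what the Angle column represents, here is a small worked example in plain Python. Each category gets a share of the full circle (2π radians) proportional to its count, and the running total of those shares gives the end angle of each wedge, which is what the cumsum transform in the chart code computes. The counts below are made up for illustration and are not the real values from the dataset.

```python
from itertools import accumulate
from math import isclose, pi

# Hypothetical counts per category (not the real dataset values).
counts = {"Missing": 50, "Murdered": 30, "Not Murdered": 20}
total = sum(counts.values())

# Share of the full circle for each category, in radians.
angles = [value / total * 2 * pi for value in counts.values()]

# Running totals give the end angle of each wedge; shifting them by one
# position (starting at zero) gives the start angle of each wedge.
end_angles = list(accumulate(angles))
start_angles = [0.0] + end_angles[:-1]

for label, start, end in zip(counts, start_angles, end_angles):
    print(f"{label}: {start:.2f} rad to {end:.2f} rad")
```

The last end angle equals 2π, so the wedges together close the full circle.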

 

data

We will use annular_wedge() to make a donut chart

from bokeh.transform import cumsum

fig = figure(height=350, title="Ratio of Murdered vs Not Murdered", toolbar_location=None,
             tools="hover", tooltips="@Murdered: @Value", x_range=(-.5, .5))
fig.annular_wedge(x=0, y=1, inner_radius=0.15, outer_radius=0.25, direction="anticlock",
                  start_angle=cumsum('Angle', include_zero=True), end_angle=cumsum('Angle'),
                  line_color="white", fill_color='Color', legend_field='Murdered', source=df_mur)
fig.axis.axis_label=None
fig.axis.visible=False
fig.grid.grid_line_color = None
show(fig)

 

donut chart in Bokeh

Interpret

The chart suggests that most players were murdered during the game, but most of the values are missing, so we can’t conclude that the majority of players were murdered.

Scatter Chart

First, we will create a data frame of Sabotages Fixed by minute and rename the columns by appending T to them.

df_min = pd.crosstab(df['Min'], df['Sabotages Fixed']).reset_index()
df_min = df_min.rename(columns={0.0:'0T', 1.0:'1T',
                       2.0:'2T',3.0:'3T',4.0:'4T',5.0:'5T'
                    })
df_min[:2]

Let’s take only three values of Sabotages Fixed (0, 1, and 2) and create a data frame for each.

df_0 = df_min[['Min', '0T']]
df_1 = df_min[['Min', '1T']]
df_2 = df_min[['Min', '2T']]

To make a simple scatter plot with only one legend, we can pass the data to scatter().

df_min.plot_bokeh.scatter(x='Min', y='1T')

Scatter Chart in Bokeh

To make a scatter chart with more than one legend, we need to use circle(), which is a method of the figure object. The circle is one of many marker styles provided by Bokeh; you can also use a triangle and many others.

fig = figure(title='Sabotages Fixed vs Minutes',
             tools='hover',
             toolbar_location="above")
fig.circle(x="Min", y='0T',
           size=12, alpha=0.5,
           color="#F78888",
           legend_label='0T',
           source=df_0)
fig.circle(x="Min", y='1T',
           size=12, alpha=0.5,
           color="blue",
           legend_label='1T',
           source=df_1)
fig.circle(x="Min", y='2T',
           size=12, alpha=0.5,
           color="#626262",
           legend_label='2T',
           source=df_2)
show(fig)

Scatter Chart in Bokeh

Simple Histogram Chart

Let’s see the distribution of the minutes of the Among Us games. We will use hist to plot a histogram.

df_minutes = df['Min'].astype('int64')
df_minutes.plot_bokeh(kind='hist', title='Distribution of Minutes')

Histogram in Bokeh

 

Interpret

Most of the games have a period of 6 minutes to 14 minutes.

Stacked Histogram Chart

Let’s see whether the number of imposters and crewmates increases or decreases as the game length increases. We will use hist to make a stacked histogram.

df_gm_te = pd.crosstab(df['Game Length'], df['Team'])

df_gm_te.plot_bokeh.hist(title='Game Length vs Imposter/Crewmate', figsize=(750, 350))

Stacked Histogram in Bokeh

Interpret

Imposters don’t tend to play the game for long; they just want to kill all the crewmates and win the game.

Different types of Bar charts

Simple Bar Charts using Bokeh

Let’s see whether the players complete their given tasks. If all the tasks are completed, the crewmates win automatically.

df_tc = pd.DataFrame(df['Task Completed'].value_counts())[1:].sort_index().rename(columns={'Task Completed': 'Count'})
df_tc.plot_bokeh(kind='bar', y='Count', title='How many people have completed given task?', figsize=(750, 350))
Bar chart

Bar chart in Bokeh

Interpret

The most common number of tasks completed is 7, and the least common is 10.

Stacked Bar Chart

Let’s see who wins: Imposters or Crewmates. I have always felt that Imposters win most often because they have only one job: to kill everyone.

df1 = pd.crosstab(df['Team'], df['Outcome'])
df1.plot_bokeh.bar(title='Who wins: Imposter or Crewmates',
                   stacked=True,
                   figsize=(450, 350))

 

stacked bar chart

Stacked Bar Chart in Bokeh

 

Interpret

Imposters win more often than Crewmates. For Imposters, there is not much difference between wins and losses; the values are pretty close. There may be many games with 5 crewmates and 4 imposters.

Stacked Horizontal Bar Chart

Let’s see whether completing the tasks wins the game.

df['All Tasks Completed'] = df['All Tasks Completed'].replace(['Yes','No'], ['Tasks Completed','Tasks Not Completed'])
df2 = pd.crosstab(df['Outcome'], df['All Tasks Completed'])
df2.plot_bokeh.barh(title='Completing task: win or loss',
                    stacked=True,
                    figsize=(650, 350))


Stacked Bar chart in Bokeh

 

Interpret

Finishing all the tasks automatically wins the game for the crewmates. A larger number of players completed their tasks and won the game.

Bi-directional Bar Chart

Let’s see whether the users won or were defeated, using a bi-directional bar chart. To make a bi-directional bar chart we need to make one measure negative; here we will make the Loss feature negative.

df_user = pd.crosstab(df['User ID'], df['Outcome']).reset_index()
df_user['Loss'] = df_user['Loss']*-1
df_user['User ID'] = (df_user.index+1).astype(str) + ' User'
df_user = df_user.set_index('User ID')
df_user[:2]
data

After completing the above process, we just need to use barh() to make a bar chart that extends in both directions.

df_user.plot_bokeh.barh(title='Users: Won or Defeated')
Bi-directional Bar chart in bokeh

 

Interpret

From the chart, we can easily see how often each user won or was defeated.

Line Chart

Let’s see the ratio of crewmates ejected from the game. We will use line to make a line chart.

df_crewmate = df[df['Team'] == 'Crewmate']
df_t_ej = pd.crosstab(df_crewmate['User ID'], df_crewmate['Ejected']).reset_index()
df_t_ej = df_t_ej[['No','Yes']]
df_t_ej.plot_bokeh.line(title='Crewmate Members: Ejected vs Minutes', figsize=(750, 350))

 


Line Chart in Bokeh

 

Interpret

There is a high variance in members not being ejected from the game.

Lollipop Chart using Bokeh

Let’s visualize the top 10 users by number of wins. I have appended the string User to every user id. The data frame looks like this.

df_user_new = pd.crosstab(df['User ID'], df['Outcome']).reset_index().sort_values(by='Win', ascending=False)[:10]
df_user_new['User ID'] = (df_user_new.index+1).astype(str) + ' User'
df_user_new[:2]
data

In this chart, we will remove the x-axis and y-axis grid lines. To make a lollipop chart, we need to combine segment() and circle().

x = df_user_new['Win']
factors = df_user_new['User ID'] #.values
fig = figure(title="Top 10 Users: Win", toolbar_location=None,
             tools="hover", tooltips="@x",
             y_range=factors, x_range=[0,75],
             width=750, height=350)
fig.segment(0, factors, x, factors, line_width=2, line_color="#3182bd")
fig.circle(x, factors, size=15, fill_color="#9ecae1", line_color="#3182bd", line_width=3)
fig.xgrid.grid_line_color = None
fig.ygrid.grid_line_color = None
show(fig)

 

Lollipop Chart in Bokeh

Area chart using Bokeh

Let’s take a look at how many sabotages were fixed over time (in minutes). For simplicity, we will look at only two values of Sabotages Fixed: 0 and 1.

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

# data
df_min = pd.crosstab(df['Min'], df['Sabotages Fixed']).reset_index()
df_min = df_min.rename(columns={0.0:'0T', 1.0:'1T',
                                2.0:'2T',3.0:'3T',4.0:'4T',5.0:'5T'
                               })

# chart
names = ['0T','1T']
source = ColumnDataSource(data=dict(
    x = df_min.Min,
    y0 = df_min['0T'],
    y1 = df_min['1T']
))
fig = figure(width=400, height=400, title='Sabotages Fixed vs Minutes')
fig.varea_stack(['y0','y1'], x='x', color=("grey", "lightgrey"),legend_label=names, source=source)
fig.grid.grid_line_color = None
fig.xaxis.axis_label='Minutes'
show(fig)


Area Chart in Bokeh

 

Interpret

As game time increases, fewer sabotages are fixed.

So far we have seen all the basic charts in Bokeh. Now let’s see how to work with layouts in Bokeh. This will help us create a dashboard or an application, so that we have all the information for a particular use case in one place.

Layout Function using Bokeh Library

The Layout function will let us build a grid of plots and widgets. We can have as many rows and columns or grids of plots in one layout.

There are many layout options available:

  • If you want to display the plots vertically, use the column() function.
  • If you want to display the plots horizontally, use the row() function.
  • If you want the plots in a grid fashion, use the gridplot() function.
  • If you want to arrange the charts with a nested list of rows and columns, use the layout() function.
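As a small illustration of the grid option, the sketch below arranges four dummy line charts in a two-column grid with gridplot(). The data and titles are assumptions made for this example only.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

x = list(range(11))

# Create four small plots of y = x^power (dummy data).
plots = []
for power in (1, 2, 3, 4):
    fig = figure(width=250, height=250, title=f"y = x^{power}")
    fig.line(x, [i ** power for i in x], line_width=2)
    plots.append(fig)

# Arrange the flat list of plots into a grid with two columns.
grid = gridplot(plots, ncols=2)
```

Calling show(grid) would display the four plots as a 2 x 2 grid.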

Let’s use some dummy data

from bokeh.io import output_file, show
from bokeh.layouts import column, row
from bokeh.plotting import figure
output_file("layout.html")
x = list(range(11))
y0 = x
y1 = [10 - i for i in x]
y2 = [abs(i - 5) for i in x]
# create three plots
s1 = figure(width=250, height=250, background_fill_color="#fafafa")
s1.circle(x, y0, size=12, color="#53777a", alpha=0.8)
s2 = figure(width=250, height=250, background_fill_color="#fafafa")
s2.triangle(x, y1, size=12, color="#c02942", alpha=0.8)
s3 = figure(width=250, height=250, background_fill_color="#fafafa")
s3.square(x, y2, size=12, color="#d95b43", alpha=0.8)

If we use the column() function the output will look like this.

show(column(s1, s2, s3))
show column

If we use the row() function the output will look like this

# put the results in a row and show
show(row(s1, s2, s3))
show row

Let’s make a dashboard layout in Bokeh. Here I have taken three charts: one lollipop chart and two donut charts.

The main consideration when setting a layout in Bokeh is how we want to arrange the charts. Let’s create a design like the one in the picture below.

grid

 

layout = grid([
                [fig1],
                [fig2, fig3]
       ])

The whole code to run a Dashboard Layout in Bokeh

from math import pi
from bokeh.io import output_file, show
from bokeh.plotting import figure
from bokeh.layouts import column, grid
from bokeh.transform import cumsum
# 1 layout
df_user_new = pd.crosstab(df['User ID'], df['Outcome']).reset_index().sort_values(by='Win', ascending=False)[:10]
df_user_new['User ID'] = (df_user_new.index+1).astype(str) + ' User'
x = df_user_new['Win']
factors = df_user_new['User ID'] 
fig1 = figure(title="Top 10 Users: Win", toolbar_location=None,
              tools="hover", tooltips="@x",
              y_range=factors, x_range=[0,75], 
              width=700, height=250)
fig1.segment(0, factors, x, factors, line_width=2, line_color="#3182bd")
fig1.circle(x, factors, size=15, fill_color="#9ecae1", line_color="#3182bd", line_width=3)
# 2 layout
df_mur = df.Murdered.value_counts().reset_index().rename(columns={'index': 'Murdered', 'Murdered': 'Value'})
df_mur['Angle'] = df_mur['Value']/df_mur['Value'].sum() * 2*pi
df_mur['Color'] = ['#3182bd', '#6baed6', '#9ecae1']
fig2 = figure(height=300,width=400, title="Ratio of Murdered vs Not Murdered", 
              toolbar_location=None, tools="hover", tooltips="@Murdered: @Value", x_range=(-.5, .5))
fig2.annular_wedge(x=0, y=1,  inner_radius=0.15, outer_radius=0.25, direction="anticlock",
                   start_angle=cumsum('Angle', include_zero=True), end_angle=cumsum('Angle'),
                   line_color="white", fill_color='Color', legend_field='Murdered', source=df_mur)
# 3 layout
df_team = pd.DataFrame(df.Team.value_counts()).reset_index().rename(columns={'index': 'Team', 'Team': 'Value'})
df_team['Angle'] = df_team['Value']/df_team['Value'].sum() * 2*pi
df_team['Color'] = ['#3182bd', '#6baed6']
fig3 = figure(height=300, width=300, title="Ratio of Crewmates vs Imposter",  
              toolbar_location=None, tools="hover", tooltips="@Team: @Value", x_range=(-.5, .5))
fig3.annular_wedge(x=0, y=1,  inner_radius=0.15, outer_radius=0.25, direction="anticlock",
                   start_angle=cumsum('Angle', include_zero=True), end_angle=cumsum('Angle'),
                   line_color="white", fill_color='Color', legend_field='Team', source=df_team)
# Styling
for fig in [fig1, fig2, fig3]:
        fig.grid.grid_line_color = None
for fig in [fig2, fig3]:
        fig.axis.visible=False
        fig.axis.axis_label=None
layout = grid([
                [fig1],
                [fig2, fig3]
       ])
show(layout)
Dashboard in Bokeh

End Notes

In this article, we saw what Bokeh is and how to work with different charts, from simple to advanced. We also saw how to arrange charts in a layout.

References:

Image 1: https://bokeh.org/branding/

Image 2: https://www.datacamp.com/community/blog/bokeh-cheat-sheet-python

 

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
