Sameer Mahajan — December 21, 2021

This article was published as a part of the Data Science Blogathon

Introduction

When data is collected, it needs to be interpreted and analyzed to provide insight into it. This insight can be about patterns, trends, or relationships between variables. Data interpretation is the process of reviewing data through well-defined methods, which help assign meaning to the data and arrive at a relevant conclusion. Analysis is the process of ordering, categorizing, and summarizing data to answer research questions. It should be done quickly and effectively, and the results need to stand out. Data plots serve exactly this purpose, and as the amount of data grows, they become ever more important. However, there are many types of plots used in data visualization, and it is often tricky to choose the best one for your business or data. Each type has strengths and weaknesses that make it better than the others in some situations.

This article provides a comprehensive list of data plots and their subtypes, and discusses which one is right for a given problem.

Several packages can be used to draw these plots; two of the most widely used are plotly and seaborn. This article looks at code that draws each plot in plotly and in seaborn / matplotlib, and shows the resulting visuals for understanding. The code used in this article and the corresponding generated plots are posted on GitHub at: https://github.com/sameermahajan/MLWorkshop/tree/master/13.%20Visualization

These data plot types for visualization are sometimes called graphs or charts depending on the context.

Bar Graph

A bar graph is a graph that presents categorical data with rectangle-shaped bars. The heights or lengths of these bars are proportional to the values that they represent. The bars can be vertical or horizontal. A vertical bar graph is sometimes called a column graph.

The following is an illustration of a bar graph showing the population of Canada by year.

Bar Graph

The following code draws it in plotly:

import plotly.express as px
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()

Here is the equivalent code in seaborn:

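import plotly.express as px
import seaborn as sns
sns.set_theme(style="whitegrid")
data_canada = px.data.gapminder().query("country == 'Canada'")  # same data as the plotly example
ax = sns.barplot(x="year", y="pop", data=data_canada)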

This is how it looks:

Bar Graph (seaborn)
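Since bars can also be horizontal, here is a minimal matplotlib sketch using `barh`; the fruit data is made up purely for illustration. In plotly, `px.bar` accepts `orientation="h"` for the same purpose.

```python
import matplotlib.pyplot as plt

# Hypothetical fruit quantities, used only for illustration
categories = ["Apples", "Bananas", "Cherries"]
values = [12, 7, 4]

fig, ax = plt.subplots()
# barh draws horizontal bars; each bar's length equals its value
bars = ax.barh(categories, values)
ax.set_xlabel("quantity")
plt.show()
```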

The following are types of bar graphs:

Grouped Bar Graph

Grouped bar graphs are used when the datasets have subgroups that need to be visualized on the graph. The subgroups are differentiated by distinct colours. Here is an illustration of such a graph:

Grouped Bar Graph

Here is a code snippet on how to do it in plotly:

import plotly.express as px
df = px.data.tips()
fig = px.bar(df, x="sex", y="total_bill", color="time", barmode="group")  # side-by-side bars per subgroup
fig.show()

Here is a code snippet on how to do it in seaborn:

import seaborn as sb
df = sb.load_dataset('tips')
df = df.groupby(['size', 'sex']).agg(mean_total_bill=("total_bill", 'mean'))
df = df.reset_index()
sb.barplot(x="size", y="mean_total_bill", hue="sex", data=df)

Stacked Bar Graph

Stacked bar graphs also show dataset subgroups, but the bars of the subgroups are stacked on top of each other instead of being placed side by side. Here is an illustration:

Stacked Bar Graph

Here is a code snippet on how to do it in plotly:

import plotly.express as px
df = px.data.tips()
fig = px.bar(df, x="sex", y="total_bill", color='time')
fig.show()

Seaborn code snippet:

import pandas
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams["figure.figsize"] = [7.00, 3.50]
plt.rcParams["figure.autolayout"] = True
df = pandas.DataFrame(dict(
   number=[2, 5, 1, 6, 3],
   count=[56, 21, 34, 36, 12],
   select=[29, 13, 17, 21, 8]
))
# seaborn has no stacked-bar option: draw the larger "count" bars first,
# then overlay the smaller "select" bars on top of them, starting from zero
bar_plot1 = sns.barplot(x='number', y='count', data=df, label="count", color="red")
bar_plot2 = sns.barplot(x='number', y='select', data=df, label="select", color="green")
plt.legend(ncol=2, loc="upper right", frameon=True)
plt.show()
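The seaborn snippet above overlays two bar plots rather than truly stacking them: the green `select` bars are drawn over the red `count` bars from zero, so the result looks stacked only because `select` never exceeds `count`. A minimal sketch of true stacking uses the `bottom` argument of matplotlib's `bar`, splitting each count into `select` and the remainder (treating `count` as the total that contains `select` is an assumption about the data).

```python
import matplotlib.pyplot as plt
import numpy as np

# Same numbers as the seaborn example above
numbers = ["2", "5", "1", "6", "3"]
count = np.array([56, 21, 34, 36, 12])
select = np.array([29, 13, 17, 21, 8])
rest = count - select  # the part of each count that is not "select"

fig, ax = plt.subplots()
ax.bar(numbers, select, label="select", color="green")
# bottom= starts each red segment where the green one ends, so the bars stack
ax.bar(numbers, rest, bottom=select, label="rest of count", color="red")
ax.legend()
plt.show()
```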

Segmented Bar Graph

In this type of stacked bar graph, each segment shows its value as a percentage of the bar's total, so every bar adds up to 100%. Here is an illustration:

Segmented Bar Graph
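One way to draw a segmented bar graph is to convert each row to percentages with pandas and then plot a stacked bar chart. This is a sketch with hypothetical counts; in plotly, setting the layout's `barnorm` to `"percent"` on a stacked bar figure performs a similar normalization.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts per category, split into two subgroups
counts = pd.DataFrame(
    {"select": [29, 13, 17, 21, 8], "rest": [27, 8, 17, 15, 4]},
    index=["2", "5", "1", "6", "3"],
)
# Divide each row by its own total so every bar adds up to 100%
percent = counts.div(counts.sum(axis=1), axis=0) * 100
ax = percent.plot(kind="bar", stacked=True)
ax.set_ylabel("percent of total")
plt.show()
```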

 

Line Graph

A line graph displays a sequence of data points as markers, typically ordered by their x-axis value, and joins them with straight line segments. It is used to visualize a trend in data over intervals of time.

The following line graph illustrates Canadian life expectancy by year.

Line Graph

Here is how to do it in plotly:

import plotly.express as px
df = px.data.gapminder().query("country=='Canada'")
fig = px.line(df, x="year", y="lifeExp", title='Life expectancy in Canada')
fig.show()

Here is how to do it in seaborn:

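import plotly.express as px
import seaborn as sns
df = px.data.gapminder().query("country=='Canada'")  # same data as the plotly example
sns.lineplot(data=df, x="year", y="lifeExp")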

Here are types of line graphs:

Simple Line Graph

A simple line graph plots only one line on the graph. One of the axes defines the independent variable. The other axis contains a variable that depends on it.

Multiple Line Graph

Multiple line graphs contain more than one line. They represent multiple variables in a dataset. This type of graph can be used to study more than one variable over the same period.

It can be drawn in plotly as:

import plotly.express as px
df = px.data.gapminder().query("continent == 'Oceania'")
fig = px.line(df, x='year', y='lifeExp', color='country', symbol="country")
fig.show()

Here is the illustration:

Multiple Line Graph

In seaborn as:

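import plotly.express as px
import seaborn as sns
df = px.data.gapminder().query("continent == 'Oceania'")  # same data as the plotly example
sns.lineplot(data=df, x='year', y='lifeExp', hue='country')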

Here is the illustration:

Multiple Line Graph 2

 

Compound Line Graph

A compound line graph is an extension of the simple line graph. It is used when a larger dataset is made up of different groups of data. The groups are stacked on top of one another, and the area under each line is shaded down to the line below it (or to the x-axis for the lowest group).

Here is an illustration:

Compound Line Graph
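The stacking in a compound line graph can be sketched by hand with matplotlib: each group's line is drawn at the running total of the groups below it, and `fill_between` shades the band down to the previous line. The product-group data is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2015, 2021)
# Hypothetical sales of three product groups that make up a larger total
groups = {"A": np.array([3, 4, 4, 5, 6, 7]),
          "B": np.array([2, 2, 3, 3, 4, 4]),
          "C": np.array([1, 2, 2, 2, 3, 3])}

fig, ax = plt.subplots()
lower = np.zeros(len(years))
for name, values in groups.items():
    upper = lower + values  # each group's line sits on top of the previous one
    ax.plot(years, upper, label=name)
    ax.fill_between(years, lower, upper, alpha=0.4)  # shade down to the line below
    lower = upper
ax.legend()
plt.show()
```

After the loop, the top line equals the total of all groups in each year.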

 

Pie Chart

A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical proportion. The arc length of each slice, and therefore its central angle and area, is proportional to the quantity it represents. It is named after a sliced pie.
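As a worked example of that proportionality, a slice's central angle is its share of the total multiplied by 360 degrees. Using the slice values from the matplotlib example below:

```python
# Slice values from the matplotlib pie example below
data = [15, 25, 25, 30, 5]
total = sum(data)
# Each slice's central angle is its share of the total times 360 degrees
angles = [360 * value / total for value in data]
print(angles)  # [54.0, 90.0, 90.0, 108.0, 18.0]
```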

Here is how to do it in plotly:

import plotly.express as px
df = px.data.gapminder().query("year == 2007").query("continent == 'Europe'")
df.loc[df['pop'] < 2.e6, 'country'] = 'Other countries' # Represent only large countries
fig = px.pie(df, values='pop', names='country', title='Population of European continent')
fig.show()

And here is how it looks:

Pie Chart

Seaborn doesn’t have a built-in function to create pie charts, but the following matplotlib syntax can be used to create a pie chart with a seaborn color palette:

import matplotlib.pyplot as plt
import seaborn as sns

data = [15, 25, 25, 30, 5]
labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4', 'Group 5']

colors = sns.color_palette('pastel')[0:5]

plt.pie(data, labels = labels, colors = colors, autopct='%.0f%%')
plt.show()

This is how it looks:

Piechart

These are types of pie charts:

Simple Pie Chart

This is the basic type of pie chart. It is often called just a pie chart.

Exploded Pie Chart

In an exploded pie chart, one or more slices are pulled away (termed exploded) from the rest of the chart. It is used to emphasize a particular element in the data set.

This is a way to do it in plotly:

import plotly.graph_objects as go

labels = ['Oxygen','Hydrogen','Carbon_Dioxide','Nitrogen']
values = [4500, 2500, 1053, 500]

# pull is given as a fraction of the pie radius
fig = go.Figure(data=[go.Pie(labels=labels, values=values, pull=[0, 0, 0.2, 0])])
fig.show()

And this is how it looks:

Exploded Pie Chart

With seaborn, the explode parameter of matplotlib's pie function can be used:

import matplotlib.pyplot as plt
import seaborn as sns

data = [15, 25, 25, 30, 5]
labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4', 'Group 5']

colors = sns.color_palette('pastel')[0:5]

plt.pie(data, labels = labels, colors = colors, autopct='%.0f%%', explode = [0, 0, 0, 0.2, 0])
plt.show()

Donut Chart

A donut chart is a pie chart with a hole in the centre. The hole makes it look like a donut, from which it derives its name.

The way to do it in plotly is:

import plotly.graph_objects as go

labels = ['Oxygen','Hydrogen','Carbon_Dioxide','Nitrogen']
values = [4500, 2500, 1053, 500]

# Use `hole` to create a donut-like pie chart
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.show()

And this is how it looks:

Donut Chart

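With seaborn / matplotlib, one way is to draw a white circle over the centre of a pie:

import numpy as np
import matplotlib.pyplot as plt
data = np.random.randint(20, 100, 6)  # six random slice values
plt.pie(data)
# Cover the centre with a white circle of radius 0.7 (the pie itself has radius 1)
circle = plt.Circle((0, 0), 0.7, color='white')
p = plt.gcf()
p.gca().add_artist(circle)
plt.show()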

Pie of Pie

A pie of pie chart generates an entirely new pie chart detailing a small sector of the existing pie chart. It can be used to reduce clutter and emphasize a particular group of elements.

Here is an illustration:

Pie of Pie
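A pie of pie can be approximated in matplotlib with two side-by-side pies, where the small "Other" slice of the first pie is broken down in the second. This is a sketch with hypothetical data; it omits the connector lines that spreadsheet tools usually draw between the two pies.

```python
import matplotlib.pyplot as plt

# Hypothetical data: three large groups plus an "Other" group made of small items
main_labels = ["A", "B", "C", "Other"]
other_labels = ["D", "E", "F"]
other_values = [4, 3, 2]
main_values = [40, 30, 21, sum(other_values)]  # "Other" slice = total of the detail pie

fig, (ax_main, ax_detail) = plt.subplots(1, 2, figsize=(9, 4))
ax_main.pie(main_values, labels=main_labels, autopct="%.0f%%",
            explode=[0, 0, 0, 0.1])  # pull out the slice that gets detailed
ax_main.set_title("All groups")
ax_detail.pie(other_values, labels=other_labels, autopct="%.0f%%")
ax_detail.set_title("Breakdown of 'Other'")
plt.show()
```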

 

Bar of Pie

This is similar to the pie of pie, except that a bar chart is what is generated.

Here is an illustration:

Bar of Pie
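A comparable matplotlib sketch for bar of pie, again with hypothetical data and without connector lines, stacks the parts of the "Other" slice in a single bar whose total height equals that slice:

```python
import matplotlib.pyplot as plt

# Hypothetical data: three large groups plus an "Other" group made of small items
main_labels = ["A", "B", "C", "Other"]
other_labels = ["D", "E", "F"]
other_values = [4, 3, 2]
main_values = [40, 30, 21, sum(other_values)]

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(9, 4))
ax_pie.pie(main_values, labels=main_labels, autopct="%.0f%%")
# Stack the parts of "Other" in one bar so its height equals the "Other" slice
bottom = 0
for label, value in zip(other_labels, other_values):
    ax_bar.bar("Other", value, bottom=bottom, label=label)
    bottom += value
ax_bar.legend()
plt.show()
```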

 

3D Pie Chart

This is a pie chart that is represented in a 3-dimensional space. Here is an illustration:

3D Pie Chart

Matplotlib has no true 3D pie chart, but setting the shadow attribute to True gives a 3D-like look in seaborn / matplotlib:

import matplotlib.pyplot as plt
labels = ['Python', 'C++', 'Ruby', 'Java']
sizes = [215, 130, 245, 210]
# Plot
plt.pie(sizes, labels=labels, 
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()

Histogram

A histogram is an approximate representation of the distribution of numerical data. The data is divided into non-overlapping intervals called bins or buckets. Over each bin, a rectangle is drawn whose height is proportional to the number of data points in that bin. Histograms give a feel for the density of the underlying data's distribution.
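To make binning concrete, here is a small worked example with NumPy's `np.histogram` on made-up data: it splits the range into equal-width bins and counts the points in each, and those counts are the bar heights a histogram draws.

```python
import numpy as np

# Hypothetical data points
data = np.array([1.2, 2.5, 2.7, 3.1, 3.3, 3.9, 4.4, 6.0])
# Split the range [0, 8] into four equal-width, non-overlapping bins
counts, edges = np.histogram(data, bins=4, range=(0, 8))
print(edges)   # [0. 2. 4. 6. 8.]
print(counts)  # [1 5 1 1]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both; that is why 6.0 lands in the final bin.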

Here is a visual:

Histogram

Plotly code:

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()

Seaborn code:

import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm")

Histograms are classified by the shape of the distribution they show:

Normal Distribution

This histogram is bell-shaped and roughly symmetric around a single central peak.

Bimodal Distribution

This histogram has two peaks, each resembling a normal distribution. It typically results from combining two groups or variables in one dataset.

Visualization:

Bimodal Distribution

Plotly code:

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", color="sex", marginal="rug",
                   hover_data=df.columns)
fig.show()

Seaborn:

import seaborn as sns
iris = sns.load_dataset("iris")
sns.kdeplot(data=iris)

Skewed Distribution

This is an asymmetric histogram with an off-centre peak that lies towards the beginning or end of the graph, with a long tail on the other side. A histogram is called right-skewed or left-skewed according to the side its long tail extends to; for example, a right-skewed histogram has its peak towards the left.

Random Distribution

This histogram does not have a regular pattern and has several peaks, so it can also be called a multimodal distribution.

Edge Peak Distribution

This distribution is similar to that of a normal distribution, except for a large peak at one of its ends.

Comb Distribution

As the name suggests, the comb distribution looks like a comb: the rectangle-shaped bars are alternately tall and short.
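To see some of these shapes, a minimal sketch can draw histograms of synthetic samples with NumPy and matplotlib: a normal sample, a bimodal sample built by combining two normal groups, and a right-skewed sample drawn from an exponential distribution. The parameters are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the shapes are reproducible
samples = {
    "Normal": rng.normal(loc=50, scale=10, size=2000),
    # Two normal groups combined -> two peaks
    "Bimodal": np.concatenate([rng.normal(35, 5, 1000), rng.normal(65, 5, 1000)]),
    # Exponential data has a long tail to the right -> right-skewed
    "Right-skewed": rng.exponential(scale=10, size=2000),
}

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (name, values) in zip(axes, samples.items()):
    ax.hist(values, bins=30)
    ax.set_title(name)
plt.show()
```

In the right-skewed sample, the long tail pulls the mean above the median.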

Area Chart

An area chart is a line chart in which the area between the line and the axis is filled in. The area is proportional to the amount it represents.

These are types of area charts:

Simple Area Chart

In this chart, the coloured areas of the different series are drawn over one another, so they can overlap.
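A minimal matplotlib sketch of overlapping areas fills each series from the x-axis with a semi-transparent colour so both stay visible. The numbers are borrowed from the team example further down.

```python
import matplotlib.pyplot as plt

period = [1, 2, 3, 4, 5, 6, 7, 8]
team_a = [20, 12, 15, 14, 19, 23, 25, 29]
team_b = [5, 7, 7, 9, 12, 9, 9, 4]

fig, ax = plt.subplots()
# Each area is filled from the x-axis (y=0) up to its own line, so the areas overlap
ax.fill_between(period, team_a, alpha=0.4, label="team A")
ax.fill_between(period, team_b, alpha=0.4, label="team B")
ax.legend()
plt.show()
```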

Stacked Area Chart

In this chart, the coloured segments are stacked on top of one another. Thus they do not intersect.

100% Stacked Area Chart

In this chart, the area occupied by each group of data is shown as a percentage of the total at each point, so the vertical axis always totals 100%.
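A 100% stacked area chart can be sketched by dividing each period's values by that period's total before calling matplotlib's `stackplot`. The numbers are the same as in the team example further down.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same numbers as the team example further down
period = np.arange(1, 9)
teams = np.array([[20, 12, 15, 14, 19, 23, 25, 29],   # team A
                  [5, 7, 7, 9, 12, 9, 9, 4],          # team B
                  [11, 8, 10, 6, 6, 5, 9, 12]])       # team C
# Divide each period's values by that period's total so every column sums to 100
shares = teams / teams.sum(axis=0) * 100
plt.stackplot(period, shares, labels=["team A", "team B", "team C"])
plt.ylim(0, 100)
plt.legend(loc="upper left")
plt.show()
```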

3-D Area Chart

This area chart is drawn in a 3-dimensional space.

We will look at visual representation and code for the most common type below.

Visual:

3D Area Chart

Plotly:

import plotly.express as px
df = px.data.gapminder()
fig = px.area(df, x="year", y="pop", color="continent",
              line_group="country")
fig.show()

Seaborn:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()  # apply seaborn styling to the matplotlib plot

df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
                   'team_A': [20, 12, 15, 14, 19, 23, 25, 29],
                   'team_B': [5, 7, 7, 9, 12, 9, 9, 4],
                   'team_C': [11, 8, 10, 6, 6, 5, 9, 12]})

plt.stackplot(df.period, df.team_A, df.team_B, df.team_C,
              labels=['team_A', 'team_B', 'team_C'])
plt.legend(loc='upper left')
plt.show()

Dot Graph

A dot graph consists of data points plotted as dots on a graph.

There are two types of these:

The Wilkinson Dot Graph

In this dot graph, dots with the same or similar values are stacked (locally displaced) so that they do not overlap.
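For discrete values, the stacking idea can be sketched in matplotlib by counting each value and drawing that many dots in a column. This is a simplified, hypothetical version: Wilkinson's original algorithm also bins continuous data before stacking, which is not done here.

```python
import matplotlib.pyplot as plt
from collections import Counter

# Hypothetical discrete data, e.g. number of goals per match
values = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1, 2, 0]
counts = Counter(values)

fig, ax = plt.subplots()
for value, n in counts.items():
    # Stack the dots for each value vertically instead of drawing them on top of each other
    ax.scatter([value] * n, range(1, n + 1), s=200)
ax.set_yticks([])
ax.set_xlabel("value")
plt.show()
```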

Cleveland Dot Graph

This is a scatterplot-like chart that displays data vertically in a single dimension.

Plotly code:

import plotly.express as px
df = px.data.medals_long()

fig = px.scatter(df, y="nation", x="count", color="medal", symbol="medal")
fig.update_traces(marker_size=10)
fig.show()

Visual:

Dot Graph

Seaborn:

import seaborn as sns
sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.stripplot(x="day", y="total_bill", data=tips)

Visual:

Dot Graph 2

 

Scatter Plot

A scatter plot uses Cartesian coordinates to display the values of two variables for a set of data as a collection of points. A point's position on the horizontal axis gives the value of one variable, and its position on the vertical axis gives the value of the other. A scatter plot can be used when one variable can be controlled and the other depends on it, and also when both continuous variables are independent.

Visual:

Scatter Plot

Plotly code:

import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.show()

Seaborn code:

import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip")

Scatter plots are grouped into different types according to the correlation between the plotted variables. These correlation types are listed below:

Positive Correlation

In these types of plots, an increase in the independent variable indicates an increase in the variable that depends on it. A scatter plot can have a high or low positive correlation.

Negative Correlation

In these types of plots, an increase in the independent variable indicates a decrease in the variable that depends on it. A scatter plot can have a high or low negative correlation.

No Correlation

Two variables visualized on a scatter plot show no correlation when the points form no clear upward or downward trend.
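The strength and direction of these relationships can be measured with the Pearson correlation coefficient r, which ranges from -1 to 1. A small NumPy sketch on synthetic data (the slopes and noise levels are arbitrary) produces one case of each type:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
noise = rng.normal(0, 1, 200)
y_pos = 2 * x + noise           # rises with x -> positive correlation
y_neg = -2 * x + noise          # falls as x rises -> negative correlation
y_none = rng.normal(0, 1, 200)  # unrelated to x -> roughly no correlation

r_values = {}
for name, y in [("positive", y_pos), ("negative", y_neg), ("none", y_none)]:
    r_values[name] = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    print(f"{name}: r = {r_values[name]:.2f}")
```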

Bubble Chart

A bubble chart displays three attributes of data, represented by the x location, the y location, and the size of the bubble.

Visualization:

Bubble Chart

Plotly code:

import plotly.express as px
df = px.data.gapminder()

fig = px.scatter(df.query("year==2007"), x="gdpPercap", y="lifeExp",
                 size="pop", color="continent",
                 hover_name="country", log_x=True, size_max=60)
fig.show()

Seaborn code:

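import matplotlib.pyplot as plt
import seaborn as sns
from gapminder import gapminder  # data set from the gapminder package (pip install gapminder)

data = gapminder.loc[gapminder.year == 2007]

# sizes sets the range of bubble sizes that populations are mapped to
b = sns.scatterplot(data=data, x="gdpPercap", y="lifeExp", size="pop", legend=False, sizes=(20, 2000))
b.set(xscale="log")  # log scale for GDP per capita, as in the plotly version
plt.show()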

Bubble charts are further classified by the number of variables in the dataset, the type of data visualized, and the number of dimensions.

Simple Bubble Chart

This is the basic bubble chart described above.

Labelled Bubble Chart

The bubbles in this chart are labelled so that different groups of data are easy to identify.
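A labelled bubble chart can be sketched in matplotlib by drawing each bubble with `scatter` and writing its label at the centre with `annotate`. The points and sizes below are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data: (label, x, y, bubble size)
points = [("A", 1, 3, 300), ("B", 2, 5, 800), ("C", 4, 2, 1500)]

fig, ax = plt.subplots()
for label, x, y, size in points:
    ax.scatter(x, y, s=size, alpha=0.5)
    # Write the label at the bubble's centre
    ax.annotate(label, (x, y), ha="center", va="center")
plt.show()
```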

Multivariable Bubble Chart

This chart has four dataset variables. The fourth variable is distinguished with a different colour.

Map Bubble Chart

It is used to illustrate data on a map.

3D Bubble Chart

This is a bubble chart designed in a 3-dimensional space. The bubbles here are spherical.

Radar Chart

A radar chart displays multivariate data as a two-dimensional chart of three or more quantitative variables. Each variable is shown on its own axis, and all axes start from the same centre point.

Visualization:

Radar Chart

 

 

Plotly code:

import plotly.express as px
import pandas as pd
df = pd.DataFrame(dict(
    r=[1, 5, 2, 2, 3],
    theta=['processing cost','mechanical properties','chemical stability',
           'thermal stability', 'device integration']))
fig = px.line_polar(df, r='r', theta='theta', line_close=True)
fig.show()

Seaborn code:

import numpy as np
import matplotlib.pyplot as plt
stats = np.array([1, 5, 2, 2, 3])
labels = ['processing cost', 'mechanical properties', 'chemical stability',
          'thermal stability', 'device integration']
angles = np.linspace(0, 2*np.pi, len(labels), endpoint=False)
# Repeat the first point at the end so the polygon closes
closed_stats = np.append(stats, stats[0])
closed_angles = np.append(angles, angles[0])
fig = plt.figure()
ax = fig.add_subplot(111, polar=True)
ax.plot(closed_angles, closed_stats, 'o-', linewidth=2)
ax.fill(closed_angles, closed_stats, alpha=0.25)
ax.set_thetagrids(angles * 180/np.pi, labels)  # one labelled axis per variable
ax.set_title("Radar Chart")
ax.grid(True)
plt.show()

These are types of radar charts:

Simple Radar Chart

This is the basic type of radar chart. It consists of several radii drawn from the centre point.

Radar Chart with Markers

In these, each data point on the spider graph (another name for a radar chart) is marked.

Filled Radar Chart

In the filled radar charts, the space between the lines and the centre of the spider web is coloured.

Pictogram Graph

A pictogram graph uses icons to give a more engaging overall view of small sets of discrete data. The icons represent the subject or category of the underlying data; for example, population data would use icons of people. Each icon can represent one or many (e.g. a million) units. Categories are compared with one another side by side, in either columns or rows of icons.

Here is an illustration:

Pictogram Graph

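In plotly, a marker symbol can be used with a graph_objs Scatter trace. In matplotlib, an icons attribute can be passed to the figure method through an add-on figure class (for example, the Waffle class of the pywaffle package). The complete code listing is provided on GitHub.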
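The core idea of a pictogram, one icon per fixed number of units, can be sketched with plain matplotlib. This is a hypothetical simplification: circles stand in for person icons, and the populations and the 10-million-per-icon scale are made up.

```python
import matplotlib.pyplot as plt

# Hypothetical populations in millions; each icon stands for 10 million people
populations = {"Country A": 60, "Country B": 40, "Country C": 20}
units_per_icon = 10

fig, ax = plt.subplots()
icon_counts = {}
for row, (name, value) in enumerate(populations.items()):
    n_icons = round(value / units_per_icon)
    icon_counts[name] = n_icons
    # Circles stand in for person icons: one row of repeated markers per category
    ax.scatter(range(n_icons), [row] * n_icons, s=400, marker="o")
ax.set_yticks(range(len(populations)))
ax.set_yticklabels(list(populations))
ax.set_xticks([])
plt.show()
```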

Spline Chart

A spline chart is a line chart that connects the data points of a series with a smooth fitted curve instead of straight segments. The curve gives a rough approximation of the values between the data points.

Visual illustration:

Spline Chart

In plotly, this is achieved in a line plot by setting line_shape to 'spline'. In matplotlib, SciPy interpolation together with NumPy linspace can be used. Again, the complete code listing is provided on GitHub.
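A minimal sketch of the SciPy-plus-linspace approach fits a cubic spline through the points with `scipy.interpolate.make_interp_spline` and evaluates it on a dense grid from `np.linspace`. The data points are hypothetical, and this is not necessarily the exact code in the GitHub listing.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline

# Hypothetical data points
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([3, 7, 4, 8, 5, 9])

spline = make_interp_spline(x, y, k=3)          # cubic spline through every point
x_smooth = np.linspace(x.min(), x.max(), 300)   # dense x values for a smooth curve

fig, ax = plt.subplots()
ax.plot(x_smooth, spline(x_smooth), label="spline")
ax.plot(x, y, "o", label="data points")
ax.legend()
plt.show()
```

Because it is an interpolating spline, the curve passes exactly through every original point.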

Box Plot

A box plot is a good way of looking at how data is distributed. As the name suggests, it has a box. One end of the box is at the 25th percentile of the data, the value below which 25% of the data points lie; this mark is termed ‘Q1’ (representing the first quarter of the data). The other end of the box is at the 75th percentile, Q3, defined similarly. The median of the data is marked by a line inside the box. The difference between Q3 and Q1 (Q3 - Q1) is the IQR (interquartile range). Two additional lines, called whiskers, are drawn at the last data points on either side that still lie within the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR. Data points outside the whiskers are called ‘outliers’, as they deviate significantly from the rest of the data points.
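The quantities in this description can be computed directly with NumPy, here on the data from the plotly example below. Note that `np.percentile` uses linear interpolation by default; plotting libraries may compute quartiles with a slightly different method, so their boxes can differ a little.

```python
import numpy as np

data = np.array([-40, 1, 2, 5, 10, 13, 15, 16, 17, 40])
q1, median, q3 = np.percentile(data, [25, 50, 75])  # linear interpolation (NumPy default)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whiskers = (int(inside.min()), int(inside.max()))  # last data points inside the fences
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, median, q3, iqr)  # 2.75 11.5 15.75 13.0
print(whiskers)             # (1, 17)
print(outliers)             # [-40  40]
```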

Plotly code:

import numpy as np 
import plotly.express as px
data = np.array([-40,1,2,5,10,13,15,16,17,40])
fig = px.box(data, points="all")
fig.show()

Visualization:

Box Plot

Seaborn code:

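import numpy as np
import seaborn as sns
data = np.array([-40, 1, 2, 5, 10, 13, 15, 16, 17, 40])  # same data as the plotly example
sns.set_style('darkgrid')
ax = sns.boxplot(y=data)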

Visualization:

Box Plot 2

A box plot summarizes the overall distribution of data compactly, even for large datasets.

Cheat Sheet

Here is a cheat sheet of methods and attributes in plotly and seaborn for generating these plots.

Cheat Sheet

Conclusion

We looked at a variety of plots and when to use each of them, along with plotly and seaborn code for generating them and visualizations of the results. A reference cheat sheet lists the methods and attributes to use in plotly and seaborn for generating these plots.

You are now equipped with the tools, techniques, and tips to choose the right data plot type for your visualization. Try them out and have fun!


The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

About the Author

Sameer Mahajan


