Gunjan Agarwal — October 17, 2021

This article was published as a part of the Data Science Blogathon

Introduction to Matplotlib

If you are interested in data analytics or data visualization, this is a good place to get started. Data visualization is the process of translating numbers, text, or large data sets into graphs such as histograms, maps, bar plots, and pie charts. To build visualizations we need tools, and Matplotlib is one of the most widely used Python libraries for the job. In this article, I will explain how to draw different types of graphs and charts so that you can present your data clearly.

In this article we will discuss the following:

  1. Installation
  2. Important Types of Plots

Let’s start with a small introduction to Matplotlib. Matplotlib is the basic plotting library of the Python programming language. It can create many kinds of visualizations, including line plots, scatter plots, histograms, bar charts, pie charts, and box plots. The library also supports 3-dimensional plotting.

 

Installation of Matplotlib

Let’s see how to set up Matplotlib in Google Colab. Colab notebooks are similar to Jupyter notebooks, except that they run in the cloud. Colab is also connected to Google Drive, which makes it easy to access your notebooks at any time and from any system. You can install Matplotlib with the pip command (the leading ! runs it as a shell command from a notebook cell).

!pip install matplotlib
Using pip to install Matplotlib
Source: Local

To verify the installation, run the following code:

import matplotlib 
print(matplotlib.__version__)
Printing the version of Matplotlib
Source: Local

 

Important types of plots in Matplotlib

Now that you know what Matplotlib is and how to install it, let’s discuss the different kinds of plots you can draw to analyze your data or present your findings.

Sub Plots-

subplots() is a Matplotlib function used to display multiple plots in one figure. It takes arguments such as the number of rows and columns, and sharex and sharey to share axes between the plots.

Code:

import numpy as np
import matplotlib.pyplot as plt

# Sample data: 100 evenly spaced points between 0 and 10
x1 = np.linspace(0, 10, 100)

# Create a 2x2 grid of plots; figsize sets the figure size in inches
fig, ax = plt.subplots(2, 2, figsize=(10, 6))

# Plot one curve on each axes
ax[0][0].plot(x1, np.sin(x1), 'g')    # row=0, col=0
ax[0][1].plot(x1, np.cos(x1), 'y')    # row=0, col=1
ax[1][0].plot(x1, np.sin(x1), 'b')    # row=1, col=0
ax[1][1].plot(x1, np.cos(x1), 'red')  # row=1, col=1

# Adjust the spacing so labels do not overlap, then show the plots
plt.tight_layout()
plt.show()
Subplots in matplotlib
Source: Local
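
The sharex and sharey arguments mentioned for subplots() can be sketched roughly as follows. This is a minimal illustration rather than a definitive recipe; the sample data and the variable names (x, axes) are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data (assumed for illustration): 100 points between 0 and 10.
x = np.linspace(0, 10, 100)

# Two rows, one column; sharex=True links the x-axis limits of both panels.
fig, axes = plt.subplots(2, 1, sharex=True, figsize=(8, 5))

axes[0].plot(x, np.sin(x), color="green")
axes[0].set_title("sin(x)")

axes[1].plot(x, np.cos(x), color="red")
axes[1].set_title("cos(x)")

# Changing the limits on one panel now changes them on the other as well.
axes[1].set_xlim(2, 8)

fig.tight_layout()
```

In a notebook the figure is rendered at the end of the cell; in a script, call plt.show() to display it.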

Now let’s check different categories of plots that Matplotlib provides.

  • Line plot
  • Histogram
  • Bar Chart
  • Scatter plot
  • Pie charts
  • Boxplot

Most of the time we work with pyplot, the plotting interface of Matplotlib. We can import it like this:

import matplotlib.pyplot

To save typing, it is conventionally imported under the alias plt:

import matplotlib.pyplot as plt

Line Plots-

A line plot shows how y changes with x by connecting the data points with line segments.

The plot() function in Matplotlib’s pyplot module draws y against x as lines and/or markers. Its signature is plot(*args, scalex=True, scaley=True, data=None, **kwargs), where the positional arguments are typically x, y, and an optional format string.

x, y are the coordinates along the horizontal and vertical axes. The x values are optional and default to range(len(y)).

scalex, scaley control whether the x-axis and y-axis limits are autoscaled to the data. Both default to True.

**kwargs sets line properties such as label, linewidth, marker, and color.

Code:

import numpy as np
import matplotlib.pyplot as plt

# Create an array of 100 evenly spaced numbers between 0 and 10
# np.linspace(start, stop, num)
x1 = np.linspace(0, 10, 100)

# Line plots: a solid orange sine curve and a dashed blue cosine curve
plt.plot(x1, np.sin(x1), '-', color='orange')
plt.plot(x1, np.cos(x1), '--', color='b')

# Name the x and y axes
plt.xlabel('x label')
plt.ylabel('y label')

# Give the plot a title
plt.title("Title")
plt.show()
Lineplot in matplotlib
Source: Local
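
To see the optional x values and the **kwargs of plot() in action, here is a small sketch. The sample values and the label text are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.array([1, 4, 9, 16, 25])  # assumed sample values

fig, ax = plt.subplots()

# Only y is passed, so the x values default to range(len(y)): 0, 1, 2, 3, 4.
(line,) = ax.plot(y, label="squares", linewidth=2, marker="o", color="purple")

ax.set_xlabel("index")
ax.set_ylabel("value")
ax.legend()  # the legend entry comes from the label given above
```

The same keyword arguments work with plt.plot(); the axes-based form is used here only so the returned line object is easy to inspect.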

 

Histogram-

A histogram is the most common graph for displaying frequency distributions. To build one, the range of the data is divided into a series of intervals (bins), and the number of values falling into each bin is counted. We can use the plt.hist() function to plot histograms. It takes arguments such as the data, bins, and color:

x: the input data, an array or a sequence of arrays

bins: an integer giving the number of bins wanted in the graph (a sequence of bin edges is also accepted)

range: the lower and upper range of the bins

density: optional boolean. If True, the histogram is normalized so that its total area is 1

histtype: optional parameter selecting the type of histogram: 'bar', 'barstacked', 'step', or 'stepfilled'. The default is 'bar'

Code:

import numpy as np
import matplotlib.pyplot as plt
# Draw 250 random samples from a normal distribution (mean 170, std 10)
x = np.random.normal(170, 10, 250)
# Plot the histogram
plt.hist(x)
plt.show()
Histogram in matplotlib
Source: Local
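
The bins, range, density, and histtype parameters described above could be combined as in the following sketch. The fixed random seed and the chosen range of 140 to 200 are assumptions made so the example is reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fixed seed so the sketch is reproducible (an assumption, not in the article).
rng = np.random.default_rng(0)
heights = rng.normal(170, 10, 250)

fig, ax = plt.subplots()

# 12 equal-width bins restricted to 140-200, normalized so the area is 1,
# drawn as a filled outline instead of separate bars.
counts, edges, patches = ax.hist(
    heights, bins=12, range=(140, 200), density=True,
    histtype="stepfilled", color="skyblue",
)
ax.set_xlabel("value")
ax.set_ylabel("density")
```

hist() returns the bin heights and the bin edges along with the drawn patches, which is handy when you need the numbers as well as the picture.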

 

Bar Plot-

A bar plot shows the relationship between a categorical variable and a numeric variable. One axis represents the categories and the other represents the value or count for each category. Bar charts can be drawn vertically with plt.bar() or horizontally with plt.barh(). The vertical form has the following signature:

plt.bar(x, height, width=0.8, bottom=None, align='center')

x: the x-coordinates of the bars

height: the height of the bars

width: the width of the bars. Its default value is 0.8

bottom: optional. The y-coordinate of the base of each bar. When left as None, the bars start at 0

align: 'center' or 'edge'. The default is 'center'

Code:

import matplotlib.pyplot as plt
# Define the bar heights
data = [5., 25., 50., 20.]
plt.bar(range(len(data)), data, color='c')
plt.show()
Barplot in Matplotlib
Source: Local
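
As a rough sketch of the bottom parameter and of horizontal bars, the example below stacks a second series on top of the first and then draws the first series with barh(). The category names and the second series are made up for illustration.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]  # assumed category names
first = [5, 25, 50, 20]
second = [10, 5, 15, 30]           # assumed second series

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Vertical bars: the second series is stacked by starting it at bottom=first.
ax_v.bar(categories, first, width=0.6, color="c", label="first")
ax_v.bar(categories, second, width=0.6, bottom=first, color="m", label="second")
ax_v.legend()

# Horizontal bars: barh() puts the categories on the y-axis and
# takes the bar lengths as its second argument.
bars_h = ax_h.barh(categories, first, color="c")

fig.tight_layout()
```

Stacking with bottom is one common design; placing bars side by side by shifting the x positions is an alternative when the individual values matter more than the totals.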

 

Scatter Plot-

A scatter plot shows the relationship between two numeric variables by drawing one dot per observation.

The scatter() function in the Matplotlib library is used to draw it.

Code:

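import numpy as np
import matplotlib.pyplot as plt
# Create the x and y coordinates
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
# The label gives plt.legend() an entry to display
plt.scatter(x, y, label='data points')
plt.legend()
plt.show()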
Scatter Plot in Matplotlib
Source: Local
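
scatter() can also encode a third quantity through the color or size of the dots. The sketch below reuses the data from the example above and colors each dot by its y value; the colormap, marker size, and label text are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

fig, ax = plt.subplots()

# c=y colors each dot by its y value; s sets the marker area in points squared.
points = ax.scatter(x, y, c=y, s=60, cmap="viridis", label="observations")

# The colorbar explains which color corresponds to which value.
fig.colorbar(points, ax=ax, label="y value")
ax.legend()
```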

 

Pie Chart-

A pie chart (or circular chart) shows each category as a share of the whole, so it is used to compare individual categories with the total. pie() takes parameters such as:

x: an array of wedge sizes

labels: a list of strings giving the name of each slice in the pie chart

autopct: used to label the wedges with their percentage values. The labels are placed inside the wedges. It takes a format string such as '%1.2f%%'

Code:

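import matplotlib.pyplot as plt
# Define the figure size
plt.figure(figsize=(7, 7))
# Wedge sizes; pie() normalizes them, so they need not sum to 100
x = [25, 30, 45, 10]
# Labels of the pie chart
labels = ['A', 'B', 'C', 'D']
plt.pie(x, labels=labels)
plt.show()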
Pie Chart in matplotlib
Source: Local
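
The autopct parameter is not used in the example above, so here is a minimal sketch of it with the same data. The one-decimal format is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

sizes = [25, 30, 45, 10]  # sums to 110; pie() converts each value to a share
labels = ["A", "B", "C", "D"]

fig, ax = plt.subplots(figsize=(6, 6))

# autopct is a format string applied to each wedge's percentage;
# "%%" prints a literal percent sign.
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct="%1.1f%%")
```

When autopct is given, pie() also returns the text objects placed inside the wedges, so their font size or color can be adjusted afterwards.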

 

Box Plot-

A box plot summarizes the distribution of a numeric dataset. The summary consists of the minimum, first quartile, median, third quartile, and maximum. The box spans the first to the third quartile, with the median drawn as a line inside it. By default, Matplotlib draws the whiskers out to the farthest data points within 1.5 times the interquartile range of the box and marks anything beyond them as individual outlier points. In a vertical box plot, the y-axis shows the data values and each position on the x-axis corresponds to one dataset.

Parameters used in box plots are as follows:

x: the input data, an array or a sequence of arrays

vert: boolean. True draws vertical boxes and False draws horizontal ones. The default is True

widths: optional. A scalar or array that sets the width of the boxes

patch_artist: if True, the boxes are drawn as patches that can be filled with color. The default is False

labels: an array of strings used to label each dataset

Code:

import numpy as np
import matplotlib.pyplot as plt
# Create random values with NumPy
values = np.random.normal(100, 20, 300)
# Create the plot with Matplotlib's boxplot() function
plt.boxplot(values, patch_artist=True, vert=True)
plt.show()
Boxplot in matplotlib
Source: Local
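
To connect the box to the summary statistics it represents, the sketch below computes the quartiles with NumPy and compares the median with the line that boxplot() draws. The fixed seed, the box width, and the fill color are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # fixed seed, assumed for reproducibility
values = rng.normal(100, 20, 300)

# Quartiles computed directly with NumPy.
q1, median, q3 = np.percentile(values, [25, 50, 75])

fig, ax = plt.subplots()
box = ax.boxplot(values, patch_artist=True, widths=0.4)
box["boxes"][0].set_facecolor("lightblue")

# The median line drawn by boxplot() sits at the same value NumPy reports.
drawn_median = box["medians"][0].get_ydata()[0]
print(f"Q1={q1:.1f}, median={median:.1f}, Q3={q3:.1f}, drawn median={drawn_median:.1f}")
```

boxplot() returns a dictionary of the drawn artists (boxes, medians, whiskers, and so on), which is what makes this kind of check and styling possible.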

Area Chart-

An area chart (or area plot) visualizes quantitative data graphically. It is based on the line plot, with the region under the line filled in. The fill_between() function is used to plot the area chart.

Parameters:

x, y: the x and y coordinates of the plot, given as arrays of length n.

interpolate: optional boolean, relevant only when the where argument is used and two curves cross. If True, the filled region is extended to the precise point of intersection.

**kwargs: alpha, color, facecolor, edgecolor, linewidth.

Code:

import numpy as np
import matplotlib.pyplot as plt

y = [2, 7, 14, 17, 20, 27, 30, 38, 25, 18, 6, 1]
# Plot the line for the given data
plt.plot(np.arange(12), y, color="blue", alpha=0.6, linewidth=2)
# Decorate the plot with axis labels
plt.xlabel('Month', size=12)
plt.ylabel('Turnover(Cr.)', size=12)
# Make the y-axis start at zero
plt.ylim(bottom=0)
plt.show()
Area Chart
Source: Local

Fill the area under the line with fill_between() to turn it into an area chart. Add this call before plt.show():

plt.fill_between(np.arange(12), y, color="teal", alpha=0.4)
Filling the area in a line plot by using fill_between()
Source: Local
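
The interpolate parameter only matters when filling between two curves with a where condition. The following sketch shades the regions above and below a threshold in different colors; the constant threshold of 15 and the color choices are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(12)
turnover = np.array([2, 7, 14, 17, 20, 27, 30, 38, 25, 18, 6, 1])
threshold = np.full(12, 15)  # assumed constant reference level

fig, ax = plt.subplots()
ax.plot(x, turnover, color="blue", label="turnover")
ax.plot(x, threshold, color="black", label="threshold")

# Shade the two regions separately; interpolate=True extends each region
# to the exact crossing point between two neighboring samples.
ax.fill_between(x, turnover, threshold, where=turnover >= threshold,
                interpolate=True, color="teal", alpha=0.4, label="above")
ax.fill_between(x, turnover, threshold, where=turnover < threshold,
                interpolate=True, color="salmon", alpha=0.4, label="below")
ax.legend()
```

Without interpolate=True, the shaded regions would stop at the nearest data points and leave small gaps around each crossing.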

Word Cloud-

A word cloud is a visual representation of text data. It is usually made of single words, and the importance of each word is shown by its font size or color. Matplotlib does not draw word clouds itself. The WordCloud class from the separate wordcloud package creates the image, and Matplotlib displays it.

WordCloud() takes various arguments, such as:

width: the width of the canvas. The default is 400

height: the height of the canvas. The default is 200

max_words: the maximum number of words allowed. Its default value is 200.

background_color: the background color of the word cloud image. The default color is black.

Once the WordCloud object is created, call its generate() method with the text data to build the word cloud.

Code:

# Import the libraries
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Set the figure size
plt.figure(figsize=(10, 15))
# Dummy text
text = '''Nulla laoreet bibendum purus, vitae sollicitudin sapien facilisis at.
          Donec erat diam, faucibus pulvinar eleifend vitae, vulputate quis ipsum.
          Maecenas luctus odio turpis, nec dignissim dolor aliquet id.
          Mauris eu semper risus, ut volutpat mi. Vivamus ut pellentesque sapien.
          Etiam fringilla tincidunt lectus sed interdum. Etiam vel dignissim erat.
          Curabitur placerat massa nisl, quis tristique ante mattis vitae.
          Ut volutpat augue non semper finibus. Nullam commodo dolor sit amet purus auctor mattis.
          Ut id nulla quis purus tempus porttitor. Ut venenatis sollicitudin est eget gravida.
          Duis imperdiet ut nisl cursus ultrices. Maecenas dapibus eu odio id hendrerit.
          Quisque eu velit hendrerit, commodo magna euismod, luctus nunc.
          Proin vel augue cursus, placerat urna aliquet, consequat nisl.
          Duis vulputate turpis a faucibus porta. Etiam blandit tortor vitae dui vestibulum viverra.
          Phasellus at porta elit. Duis vel ligula consectetur, pulvinar nisl vel, lobortis ex.'''
wordcloud = WordCloud(margin=0, colormap='BuPu').generate(text)
# imshow() in the pyplot module displays data as an image
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.margins(x=0, y=0)
plt.show()
Word Cloud
Source: Local

3-D Graphs-

Now that you have seen some simple graphs, it’s time to look at some more complex ones, i.e. 3-D graphs. Matplotlib was initially built for 2-dimensional graphs, and 3-D plotting was added later. Let’s see how you can plot a 3-D graph in Matplotlib.

Code:

from mpl_toolkits import mplot3d
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')

The above code is used to create the 3-Dimensional axes.

3-D Graph in Matplotlib
Source: Local

Many of the plots we have seen in 2-D can also be drawn in 3-D. For instance, let’s draw a line plot and a scatter plot in 3-D space.

ax = plt.axes(projection='3d')

# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');
3D Scatter plot
Source: Local

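Other types of graphs can be drawn in the same way. Matplotlib’s 3-D toolkit also provides plots for surfaces, such as contour and wireframe plots. The following code draws a wireframe of an example surface:

x = y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))  # example surface chosen for illustration
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');
Wireframe plot in Matplotlib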
Source: Local
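
For the contour plot mentioned above, a minimal sketch could look like this. The surface z = sin(sqrt(x^2 + y^2)), the grid size, and the number of levels are assumptions made for illustration, and the code assumes a Matplotlib version recent enough (3.2 or later) that the 3-D projection is available without importing mplot3d.

```python
import numpy as np
import matplotlib.pyplot as plt

# Example surface (an assumption for illustration): z = sin(sqrt(x^2 + y^2)).
x = np.linspace(-6, 6, 40)
y = np.linspace(-6, 6, 40)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Draw 30 contour levels of Z, each placed at its own height.
ax.contour(X, Y, Z, levels=30, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
ax.set_title("3-D contour")
```

np.meshgrid() turns the two 1-D coordinate arrays into the 2-D grids that the 3-D surface functions expect.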

To understand all the plot types mentioned above, you can refer to the following video: https://www.youtube.com/watch?v=yZTBMMdPOww

Conclusion

In this article, we discussed Matplotlib, the basic plotting library in Python, and covered the chart types most commonly used in statistical analysis. We also discussed how to draw multiple plots in one figure with the subplots() function.

We also saw how to resize a figure and how to decorate plots using various arguments. Now that you know the basics of plotting and charting, you can try plotting different datasets and mathematical functions.

Data professionals (including Data Analysts, Data Scientists, ML Engineers, and DL Engineers) all need to visualize data and present findings at some point, and Matplotlib is a solid option for doing so. Now that you know this library, you should feel a bit more confident using it in the industry.

Thanks for reading the article, please share the article if you found it interesting!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
