Siddharth M — Published On August 8, 2021 and Last Modified On April 26th, 2023

## Introduction

We all love exploring data. A large part of a data scientist’s work is representing data and interpreting or extracting important information from it, a process called exploratory data analysis. There are many different ways to represent data. One of the most important is the bar plot, which is widely used in many applications and presentations. This tutorial will teach us to implement and understand a bar plot in Python.

Learning Objectives

• In this tutorial, you will learn about Bar plots, their types, and their uses.
• You will also learn when to use bar plots and when to use histograms.
• Lastly, you will learn different methods to create bar plots in Python.

## What Are Bar Plots?

A bar graph is a graphical representation of data in which each category is drawn as a rectangular bar. The length or height of each bar represents the value for that category. In a bar chart, one axis represents the categories of a column in the dataset, and the other axis represents the values or counts associated with them. Bar charts can be plotted vertically or horizontally. A vertical bar chart is often called a column chart. When the bars are arranged from the highest to the lowest value, the chart is called a Pareto chart (which usually also includes a cumulative-percentage line).
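To make the vertical, horizontal, and sorted variants concrete, here is a minimal sketch. The drink names and counts are made up for illustration, and the sorted chart only shows the high-to-low ordering, not a full Pareto chart.

```python
import matplotlib.pyplot as plt

# Made-up category counts, purely for illustration
counts = {"Tea": 12, "Coffee": 30, "Juice": 7, "Water": 21}

# Sort categories from the highest to the lowest value
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
labels = [name for name, _ in ordered]
values = [value for _, value in ordered]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))
ax_v.bar(labels, values)      # vertical bars (column chart)
ax_v.set_title("Vertical, sorted high to low")
ax_h.barh(labels, values)     # horizontal bars
ax_h.invert_yaxis()           # keep the largest category at the top
ax_h.set_title("Horizontal")
plt.tight_layout()
plt.show()
```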

Source: medium.com

## Barplot vs Histograms: Which Plot to Use?

Histograms are used to represent the distribution of continuous data. They are a graphical representation of a set of continuous or discrete data frequency distributions. By plotting the data into bins or intervals, a histogram allows us to easily visualize the number of data points within each bin, giving us a sense of the distribution of the data. Some common uses of histograms include:

1. Exploring the distribution of a single variable: A histogram can help you understand the distribution of a single variable, such as height, weight, or income.
2. Comparing two or more groups: By plotting histograms for different groups, you can compare the distributions of two or more variables. This can help you identify differences and similarities between groups.
3. Detecting outliers: Histograms can help you detect outliers or data points significantly different from the rest of the data.
4. Estimating probability density: By normalizing a histogram, you can estimate the probability density function, which describes the probability of observing a data point within a given interval.

Summing up, histograms group the data into bins, which are intervals of values, and display the frequency of data points within each bin as a bar. The bars in a histogram are drawn such that they touch each other, creating a continuous representation of the data distribution. Since the bar plots have “bars,” this is the biggest source of confusion: which plot to use to plot the distribution of what type of data/variable?

A bar plot, on the other hand, is used to represent categorical data, which is data that can be divided into distinct categories. In a bar plot, each category is represented by a separate bar, and the height of the bar represents the frequency or count of data points in that category.

So, the choice between a histogram and a bar plot depends on the type of data you are working with. If you have continuous data, a histogram is an appropriate choice. If you have categorical data, a bar plot is an appropriate choice. Additionally, if you have ordinal data, which is data that can be ordered, such as star ratings or levels of education, you may choose to use a bar plot.
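The difference can be sketched side by side. The example below assumes synthetic data (random heights as the continuous variable, random star ratings as the ordinal one); none of these values come from a real dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Continuous variable: synthetic heights in cm (made-up data)
heights = rng.normal(loc=170, scale=10, size=500)

# Ordinal variable: synthetic star ratings from 1 to 5 (made-up data)
ratings = rng.integers(1, 6, size=500)
stars, counts = np.unique(ratings, return_counts=True)

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: values are grouped into 20 bins and the bars touch each other
ax_hist.hist(heights, bins=20, edgecolor="black")
ax_hist.set_title("Histogram (continuous data)")

# Bar plot: one separate bar per category
ax_bar.bar([str(s) for s in stars], counts)
ax_bar.set_title("Bar plot (ordinal categories)")

plt.tight_layout()
plt.show()
```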

## Common Use Cases for Bar Plots

Here are some common use cases for bar plots:

1. Comparing frequencies or counts: Bar plots can be used to compare the frequency or count of data points in different categories. For example, you could use a bar plot to compare the number of books sold in different genres.
2. Displaying proportions: Bar plots can be used to display proportions, such as the percentage of respondents who selected each option in a survey.
3. Visualizing changes over time: Bar plots can be used to display changes in categorical data over time, such as the number of sales in different months or the number of new customers in different years.
4. Comparing multiple variables: By grouping bar plots, you can compare multiple variables simultaneously. For example, you could compare the number of books sold by different authors and by different genres.
5. Displaying nominal data: Bar plots are a common way to display nominal data, which is data that has no inherent order or structure, such as hair color or preferred drink.
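As one illustration of these use cases, the sketch below displays proportions rather than raw counts. The survey answers are made up, and `value_counts(normalize=True)` is just one convenient way to get the shares.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up survey answers, for illustration only
answers = pd.Series(["Yes", "No", "Yes", "Maybe", "Yes", "No", "Yes", "Maybe"])

# normalize=True returns fractions instead of raw counts
shares = answers.value_counts(normalize=True) * 100

ax = shares.plot(kind="bar", figsize=(6, 4))
ax.set_ylabel("Share of respondents (%)")
ax.set_title("Survey responses")
plt.tight_layout()
plt.show()
```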

## How to Create a Bar Graph or Bar Plot in Python?

In Python, we use libraries to create bar plots. They are very useful for visualizing data and extracting meaningful information from datasets.

Here are some Python libraries we use to create a bar chart.

#### Creating a Bar Plot in Python Using Matplotlib

Matplotlib is a plotting library widely used for data exploration and visualization. It is simple and provides an API with functions similar to the ones used in MATLAB. The Matplotlib bar() function is the easiest way to create a bar chart. We import matplotlib.pyplot as plt and use:

`plt.bar(x, height, width, bottom, align)`

The code to create a bar plot in matplotlib:
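The listing for this step is not shown here. As a minimal sketch, assuming made-up sales figures (the `genres` and `books_sold` names are illustrative, not from the original), a basic bar chart might be drawn like this:

```python
import matplotlib.pyplot as plt

# Made-up example data: one bar per category
genres = ["Fiction", "Science", "History", "Poetry"]
books_sold = [120, 85, 60, 30]

plt.figure(figsize=(8, 5))
bars = plt.bar(genres, books_sold)   # x = categories, height = values
plt.title("Books sold by genre")
plt.xlabel("Genre")
plt.ylabel("Books sold")
plt.show()
```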

The bar width in bar charts can be controlled or specified using the “width” parameter in the bar() function of the Matplotlib library. The “width” parameter determines the width of each bar in the bar chart. For example, to set the bar width to 0.8, you can write the following code:
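A short sketch of setting the width explicitly, using made-up categories and values:

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]   # made-up categories
values = [5, 9, 3]             # made-up values

# width is given in x-axis data units; 0.8 is also Matplotlib's default
bars = plt.bar(categories, values, width=0.8)
plt.show()
```

Smaller values leave more space between neighboring bars, while a width of 1.0 makes adjacent bars touch.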

You can also use the np.arange() function or the np.linspace() function to create numpy arrays, which can be plotted. You can also use the plt.subplots() function to create multiple plots in the same Python figure.
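For instance, a rough sketch that combines these functions (the plotted values are arbitrary and only serve to show the calls):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)                 # bar positions 0, 1, 2, 3, 4
squares = x ** 2
evens = np.linspace(0, 8, 5)     # 5 evenly spaced values from 0 to 8

# Two plots side by side in the same figure
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(x, squares)
ax1.set_title("x squared")
ax2.bar(x, evens, color="tab:orange")
ax2.set_title("np.linspace(0, 8, 5)")
plt.tight_layout()
plt.show()
```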

Reference: https://matplotlib.org/

#### Creating a Bar Plot in Python Using Seaborn

Seaborn is also a visualization library based on matplotlib and is widely used for presenting data. We can import the library as sns and use the following syntax:

`seaborn.barplot(x=' ', y=' ',data=df)`

The code to create a bar chart in seaborn:

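```python
import seaborn as sns
import matplotlib.pyplot as plt

# Assumption: df is seaborn's sample "tips" dataset, which has these columns
# (load_dataset downloads it on first use)
df = sns.load_dataset('tips')
sns.barplot(x='time', y='total_bill', data=df)
plt.show()
```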

Source: seaborn.pydata.org

#### Creating a Bar Plot in Python Using Plotly

Plotly is a visualization library that supports interactive presentations, zooming into regions of a chart, and a wide range of design options. Its charts are easy to read because we can hover over them to see the underlying data. It is also used for higher-dimensional data and provides abstractions for data science and machine learning visualizations. We import plotly.express as px and use:

`px.bar(df, x=' ', y=' ')`

The following is code to create a bar chart in Plotly:

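```python
import plotly.express as px
df = px.data.tips()  # assumption: Plotly's bundled sample data; the original data is not shown
fig = px.bar(df, x='day', y='total_bill')  # the line creating fig was missing
fig.show()
```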

Source: plotly.com

## Types of Bar Plots in Python

#### Unstacked Bar Plots

Unstacked bar plots (also called grouped bar plots) are used to compare categories across several samples, such as different years. They can be used to deduce facts from the patterns we observe in the comparison. In the figure below, we can see the players’ ratings over the years in FIFA. Django and Gafur have both increased in rating every year. This shows us their progression, so a club can use it to decide whether to sign Django or Gafur.

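```python
import pandas as pd
import matplotlib.pyplot as plt  # needed for the plt.* calls below

plotdata = pd.DataFrame({
    "2018": [57, 67, 77, 83],
    "2019": [68, 73, 80, 79],
    "2020": [73, 78, 80, 85]},
    index=["Django", "Gafur", "Tommy", "Ronnie"])

# One group of bars per footballer, one bar per year
plotdata.plot(kind="bar", figsize=(15, 8))
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
plt.show()
```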

You can also use the Pandas read_csv() function to import data in a CSV file format into a Pandas dataframe for plotting.
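A minimal sketch of that workflow follows. To keep it self-contained, the CSV content is held in a string and read through `io.StringIO`; with a real file you would pass its path instead (the filename in the comment is hypothetical).

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for a file on disk: with a real file you would pass its path,
# e.g. pd.read_csv("ratings.csv") (hypothetical filename).
csv_text = """Footballer,2018,2019,2020
Django,57,68,73
Gafur,67,73,78
Tommy,77,80,80
Ronnie,83,79,85
"""

# index_col makes the footballer names the row labels (and the x-axis labels)
plotdata = pd.read_csv(io.StringIO(csv_text), index_col="Footballer")
plotdata.plot(kind="bar", figsize=(10, 6))
plt.title("FIFA ratings")
plt.ylabel("Ratings")
plt.show()
```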

#### Stacked Bar Plots

As the name suggests, stacked bar charts place the bars of each group on top of one another. Whereas the unstacked bar chart above compares the values within each group side by side, a stacked plot accumulates each individual’s values into a single bar. In pandas, this is easy to implement using the stacked keyword.

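```python
import pandas as pd
import matplotlib.pyplot as plt  # needed for the plt.* calls below

plotdata = pd.DataFrame({
    "2018": [57, 67, 77, 83],
    "2019": [68, 73, 80, 79],
    "2020": [73, 78, 80, 85]},
    index=["Django", "Gafur", "Tommy", "Ronnie"])

# stacked=True draws the yearly values on top of each other
plotdata.plot(kind='bar', stacked=True, figsize=(15, 8))
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
plt.show()
```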

## Analysis of Bar Plots in Python

Now, let us apply this syntax to a dataset and see how we can plot bar charts using different libraries. For this, we will use the Summer Olympics Medals 1976-2008 dataset and visualize it using bar graphs to perform univariate, bivariate, and multivariate analysis and interpret relevant information from it.

The Summer Olympics dataset from 1976 to 2008 is available here.
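The snippets in this section assume the data is already loaded into a DataFrame named `df`. As a rough stand-in for experimenting, the sketch below builds a tiny made-up table with the column names those snippets use (`Year`, `Sport`, `Discipline`, `Athlete`, `Gender`, `Country`, `Medal`). The rows are placeholders, not real Olympic results.

```python
import pandas as pd

# Tiny made-up stand-in for the medals table; NOT real Olympic results.
# With the real file you would load it with pd.read_csv(<path to the CSV>).
df = pd.DataFrame({
    "Year":       [1976, 1976, 1980, 1980, 1984, 1984],
    "Sport":      ["Aquatics", "Athletics", "Aquatics", "Rowing", "Athletics", "Aquatics"],
    "Discipline": ["Swimming", "Athletics", "Diving", "Rowing", "Athletics", "Swimming"],
    "Athlete":    ["Athlete A", "Athlete B", "Athlete A", "Athlete C", "Athlete B", "Athlete D"],
    "Gender":     ["Men", "Women", "Men", "Men", "Women", "Women"],
    "Country":    ["Country X", "Country Y", "Country X", "Country Z", "Country Y", "Country X"],
    "Medal":      ["Gold", "Silver", "Bronze", "Gold", "Gold", "Silver"],
})

# Each row is one awarded medal, so value_counts() counts medals per country
print(df["Country"].value_counts())
```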

#### Univariate Analysis

In exploratory data analysis, Univariate analysis refers to visualizing one variable. In our case, we want to visualize column data using a bar plot.

All-time medals of top 10 countries:

```python
# Each row is one medal, so value_counts() gives medals per country
top_10 = df['Country'].value_counts()[:10]
top_10.plot(kind='bar', figsize=(10, 8))
plt.title('All Time Medals of top 10 countries')
```

The graph shows the 10 countries that have won the most Olympic medals. The USA has dominated the Olympics over the years.

Medals won by the USA in Summer Olympics:

```python
# Keep only the rows for the United States, then count each medal type
indpie = df[df['Country']=='United States']['Medal'].value_counts()
indpie.plot(kind='bar', figsize=(10, 8))
```

We filter the country to the USA and visualize the medals won by the USA alone.

#### Bivariate Analysis

The bivariate analysis includes two variables or two columns from our dataset.

Total athletes’ contribution to Summer Olympics over time:

```python
plt.figure(figsize=(10, 5))
# Pass the column by keyword; recent seaborn versions no longer accept it positionally
sns.countplot(x=df['Year'])
plt.title('Total Athletes contribution in summer olympics over time')
plt.xlabel('Years')
plt.ylabel('No. of Athlete')
```

Over the years there has been an increase in the participation of athletes in the Olympics.

Top 10 athletes with the most awarded medals:

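```python
# Names of the 10 athletes with the most rows (medals), in descending order
athlete_order = df['Athlete'].value_counts().head(10).index
plt.figure(figsize=(9, 5))
# y= puts the categories on the vertical axis, giving horizontal bars
sns.countplot(data=df, y='Athlete', order=athlete_order)
plt.title('Top 10 Athletes with the most awarded Medals')
plt.xlabel('No. of awarded medals')
plt.ylabel('Athlete Name')
```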

This type of plot is called a horizontal bar chart. It shows the top 10 athletes, and we can see that Michael Phelps has won the most medals in the Olympics.

Sports with most awarded medals:

```python
plt.figure(figsize=(15, 5))
# Order the bars from the sport with the most medals to the one with the fewest
highest_sport = df['Sport'].value_counts().index
sns.countplot(data=df, x='Sport', order=highest_sport)
plt.xticks(rotation=75)
plt.title('Sports with most awarded Medals')
plt.xlabel('Sport')
plt.ylabel('No. of Medals')
```

Aquatics accounts for the largest number of medals in the Olympics. Note that the x-axis tick labels are rotated by 75 degrees so that the sport names do not overlap.

Type of medals won over the years:

```python
# Set the size before plotting so that it applies to this figure
plt.figure(figsize=(10, 10))
sns.countplot(x='Year', hue='Medal', data=df)
plt.title("Type of medals won over the years")
```

The graph is an unstacked barplot and shows medal grouping over each year.

Medals by gender:

`sns.countplot(x="Medal", hue="Gender", data=df)`

The gender bar plot tells us that men have won more medals than women, which suggests that more men have participated or that there have been more men’s events in the Olympics.

The gender ratio in Summer Olympics:

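```python
# Count medals per year and gender, with one column per gender
gender_group = df.groupby(['Year', 'Gender']).size().unstack()
# Divide each row by its total so that every bar sums to 1
gender_group.apply(lambda x: x/x.sum(), axis=1).plot(kind='barh', stacked=True, legend=False)
plt.legend(['Men', 'Women'], bbox_to_anchor=(1.0, 0.7))
plt.xlabel('Men / Women ratio')
```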

The data tells us the ratio of men to women in the Olympics over the years. Here, we can see that more games have started including the women’s category, which is a great sign.

Medals by gender in each discipline:

```python
# Set the size before plotting so that it applies to this figure
plt.figure(figsize=(10, 10))
sns.countplot(y='Discipline', hue='Gender', data=df)
plt.xticks(rotation=90)
plt.title('Medals by Gender in each Discipline')
plt.legend(loc='upper right')
```

This graph shows each gender’s participation in the specific discipline.

The scatter plot is another type of plot for bivariate visualization of numerical data, where the x-axis and y-axis represent the values of two different variables.

#### Multi-Variate Analysis

Multivariate analysis is used when we want to compare more than two variables at once. Usually, a box plot is a good representation, as shown here.

`sns.catplot(x="Medal", y="Year", hue="Gender",kind="box", data=df)`

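This graph shows, for each of the three medal types, how the years in which medals were won are distributed for men and for women. Note that a box plot displays this distribution of years, not the number of medals, so counts are better compared with the count plots above.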

## Bar Plots for Time Series Data

Bar plots can be used to plot time series data by using the x-axis to represent time and the y-axis to represent the values of the data points. In this case, each bar in the bar plot represents a single data point, with the height of the bar representing the value of the data point and the bar’s position along the x-axis representing the time at which the data point was recorded.

To plot time series data as a bar plot in Python, you can use the bar() function of the Matplotlib library. First, you need to convert the time series data into a suitable format, such as a list or NumPy array, that can be passed to the bar() function. Then, you can use the plt.xticks() function (or the “tick_label” parameter of bar()) to specify the x-axis labels, representing the time values in the time series data. If you want, you can also customize the y-axis using plt.yticks().

For example, to plot a time series data with time values in the format “YYYY-MM-DD” and data values as integers, you can write the following Python code:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Example time series data
time = ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04']
data = [100, 120, 130, 140]

# Create a pandas dataframe from the time series data
df = pd.DataFrame({'Time': time, 'Data': data})

# Plot the time series data as a bar plot
plt.bar(df['Time'], df['Data'], width=0.8)
plt.xticks(rotation=90)
plt.show()
```

The above code will create a bar plot with the time values along the x-axis and the data values along the y-axis. You can customize the appearance of the plot by adjusting various parameters, such as the width of the bars, the color of the bars, and the labels for the x and y axes.
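When the time values need to be aggregated first, one possible approach is to parse them with `pd.to_datetime` and resample before plotting. The sketch below assumes made-up daily values and sums them per month; the names `df_ts` and `monthly` are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up daily values, for illustration only
df_ts = pd.DataFrame({
    "Time": ["2020-01-30", "2020-01-31", "2020-02-01", "2020-02-02"],
    "Data": [100, 120, 130, 140],
})

# Parse the strings into real timestamps and sum the values per month
df_ts["Time"] = pd.to_datetime(df_ts["Time"])
monthly = df_ts.set_index("Time")["Data"].resample("MS").sum()

# Format the index as "YYYY-MM" so the tick labels stay short
labels = list(monthly.index.strftime("%Y-%m"))
plt.bar(labels, monthly.values, color="seagreen")
plt.xlabel("Month")
plt.ylabel("Total")
plt.show()
```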

## Conclusion

In this article, we have gone through different implementations of bar plots in Python and understood the different types of plots we use for exploratory data analysis using bar graphs. When creating a bar plot in Python, it is important to choose an appropriate type based on the data’s nature and the information you want to communicate.

Source: gitmind.com

Key Takeaways

• Bar plots are a type of data visualization used to represent data in the form of rectangular bars. The height of each bar represents the value of a data point, and the position of each bar along the category axis identifies its category.
• The Matplotlib library in Python is widely used to create bar plots. The bar() function in Matplotlib is used to create bar plots and accepts data in the form of lists, NumPy arrays, and pandas dataframes.
• There are several types of bar plots, including simple bar plots, grouped bar plots, stacked bar plots, horizontal bar plots, and error bar plots.
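Error bar plots are listed above but not demonstrated in this article. As a minimal sketch, assuming made-up repeated measurements for three groups, the `yerr` parameter of bar() can be used to draw them:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up repeated measurements for three groups
samples = {
    "Group A": [4.8, 5.1, 5.3, 4.9],
    "Group B": [6.2, 6.0, 6.5, 6.1],
    "Group C": [3.9, 4.2, 4.0, 4.3],
}

names = list(samples)
means = [np.mean(v) for v in samples.values()]
stds = [np.std(v, ddof=1) for v in samples.values()]   # sample standard deviation

# yerr draws a vertical error bar on each bar; capsize adds end caps
plt.bar(names, means, yerr=stds, capsize=5)
plt.ylabel("Mean value")
plt.show()
```

Here the error bars show one standard deviation around each mean; other choices, such as a standard error or a confidence interval, are equally possible.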


Q1. How do you plot a bar graph in Python?

A. We can plot a bar graph in Python using the Matplotlib library’s “bar()” function.

Q2. What are the different types of bar plots in Python?

A. Some of the most common types of bar plots in Python are:

1. Simple bar plot: A bar plot representing a single data set, where each bar represents a single data point.
2. Grouped bar plot: A bar plot representing multiple sets of data, where each group represents a separate data set.
3. Stacked bar plot: A bar plot representing multiple sets of data, where the height of each bar is the sum of the values for each data set.
4. Horizontal bar plot: A bar plot whose bars run horizontally, so the categories are on the vertical axis and the values are on the horizontal axis.
5. Error bar plot: A bar plot that includes error bars representing the data’s uncertainty.

Q3. Which Python objects can be used for data in a bar graph?

A. Commonly used Python objects for plotting a bar graph are lists, NumPy arrays, and pandas dataframes.

Q4. What is matplotlib.pyplot.bar?

A. matplotlib.pyplot.bar is a Python function that creates a vertical bar chart to display data. This function is part of the matplotlib library and is typically used to visualize categorical data. The bar function takes in parameters such as the x-coordinates of the bars, the height of the bars, and the width of the bars, and can be customized with various optional parameters such as colors, labels, and titles. Overall, matplotlib.pyplot.bar is a useful tool for creating clear and informative visualizations of data.