We all love exploring data. A major part of a data scientist's work is representing data and interpreting or extracting important information from it, which is called exploratory data analysis. There are many ways to represent data. One of the most important is the bar plot, which is widely used in applications and presentations. This tutorial explains what a bar plot is and how to implement one in Python, including how bar plots work in Matplotlib, so that it answers your questions about Python bar charts.
In this article, you will create a Python bar plot using Matplotlib, which is straightforward. With the plt.bar function, you can easily generate a Matplotlib bar chart to visualize your data effectively.
Learning Objectives
A bar graph is a graphical representation of data that uses rectangles to compare specific categories. The length or height of each bar is proportional to the value it represents. One axis represents the categories, while the other represents values or counts. Bar plots in Python can be drawn either vertically or horizontally. The vertical version is often termed a column chart. When the bars are ordered from the highest to the lowest count, the chart resembles a Pareto chart (which additionally shows a cumulative line), making the relative significance of the categories easy to see.
Histograms are a valuable tool for illustrating the distribution of continuous data. They are graphical depictions of frequency distributions for continuous datasets and for discrete numeric ones. By allocating the data into bins, or intervals, a histogram shows the count of data points within each bin, offering insight into distribution patterns. Common applications of histograms include understanding data variability, identifying outliers, and assessing the overall shape of a dataset.
Summing up, histograms group the data into bins, which are intervals of values, and display the frequency of data points within each bin as a bar. The bars in a histogram are drawn so that they touch each other, creating a continuous representation of the data distribution. Since both plots use bars, this is the biggest source of confusion: which plot should be used for which type of data or variable?
A bar plot, on the other hand, is used to represent categorical data, which is data that can be divided into distinct categories. In a bar plot, each category is represented by a separate bar, and the height of the bar represents the frequency or count of data points in that category.
So, the choice between a histogram and a bar plot in Python depends on the type of data you are working with. If you have continuous data, a histogram is an appropriate choice. If you have categorical data, a bar plot is an appropriate choice. Additionally, if you have ordinal data, which is data that can be ordered, such as star ratings or levels of education, you may choose to use a bar plot.
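The distinction above can be sketched side by side. This is a minimal illustration with made-up data: the simulated heights and the fruit counts are assumptions, not values from this article.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical continuous data: 200 simulated heights in cm
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=170, scale=10, size=200)

# Hypothetical categorical data: a count for each fruit
fruits = ["apple", "banana", "cherry"]
counts = [12, 7, 15]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: the values are grouped into 10 bins and the bars touch
hist_counts, bin_edges, _ = ax_hist.hist(heights, bins=10, edgecolor="black")
ax_hist.set_title("Histogram (continuous data)")

# Bar plot: one separate bar per category
bars = ax_bar.bar(fruits, counts)
ax_bar.set_title("Bar plot (categorical data)")
```

In a notebook the figure is displayed automatically; in a script, call plt.show() afterwards.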
Here are some common use cases for bar plots:
In Python, we use libraries to create bar plots. They are very useful for visualizing data and extracting meaningful information from datasets.
Code: https://colab.research.google.com/drive/1YkyaoUNNXZVw_MYgNK4Wx2XlM2Uwap2k?usp=sharing
Here are some Python libraries we use to create a bar chart.
Matplotlib is a plotting library widely used for data exploration and visualization. It is simple and provides an API with functions similar to those used in MATLAB. The Matplotlib bar() function is the easiest way to create a bar chart. We import matplotlib.pyplot as plt and use:
plt.bar(x, height, width, bottom, align)
The code to create a bar plot in matplotlib:
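A minimal sketch of such a bar plot could look like the following. The language names and user counts are hypothetical values chosen only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and values
languages = ["Python", "Java", "C++", "Go"]
users = [120, 95, 60, 40]

plt.figure(figsize=(8, 5))
# x gives the bar positions (here category labels), height gives the bar heights
bars = plt.bar(languages, users)
plt.title("Users per language (made-up data)")
plt.xlabel("Language")
plt.ylabel("Users")
```

plt.bar returns a container holding one rectangle per bar, which is handy if individual bars need to be recolored or labeled later. In a script, finish with plt.show().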
The bar width in a bar chart can be specified using the “width” parameter of Matplotlib's bar() function, which determines the width of each bar. For example, passing width=0.8 makes each bar 0.8 units wide along the x-axis.
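As a small sketch of the width parameter, the same made-up values are drawn twice below, once with width=0.8 (which is also Matplotlib's default) and once with narrower bars. The category names and values are assumptions for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data
categories = ["A", "B", "C"]
values = [3, 7, 5]

fig, (ax_wide, ax_narrow) = plt.subplots(1, 2, figsize=(8, 4))

# width=0.8 leaves a small gap between neighbouring bars
wide_bars = ax_wide.bar(categories, values, width=0.8)
ax_wide.set_title("width=0.8")

# width=0.4 produces thinner bars with larger gaps
narrow_bars = ax_narrow.bar(categories, values, width=0.4)
ax_narrow.set_title("width=0.4")
```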
You can also use the np.arange() function or the np.linspace() function to create numpy arrays, which can be plotted. You can also use the plt.subplots() function to create multiple plots in the same Python figure.
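For instance, the sketch below builds bar positions with np.arange() and np.linspace() and draws them on two axes created by plt.subplots(). The plotted values are arbitrary and serve only as an illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Integer positions 0, 1, 2, 3, 4 and their squares as heights
x = np.arange(5)
heights = x ** 2

# Five evenly spaced positions between 0 and 1
xs = np.linspace(0, 1, 5)

# One figure with two plots side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(x, heights)
ax1.set_title("Positions from np.arange")

# The positions are 0.25 apart, so the bars are made narrower than that
ax2.bar(xs, xs * 10, width=0.2)
ax2.set_title("Positions from np.linspace")
```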
Reference: https://matplotlib.org/
Seaborn is a visualization library built on Matplotlib and widely used for presenting data. We import the library as sns and use the following syntax:
sns.barplot(x=' ', y=' ', data=df)
The code to create a bar chart in seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
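# Load seaborn's built-in 'tips' example dataset (downloaded on first use)
df = sns.load_dataset('tips')
# Each bar shows the mean total_bill for one value of 'time'
sns.barplot(x='time', y='total_bill', data=df)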
plt.show()
Plotly is a visualization library that supports interactive presentations: we can zoom into regions of a chart and hover over it to read the data represented. It handles a wide range of chart designs, is also used for higher-dimensional data, and provides high-level abstractions for data science and machine learning visualizations. We import plotly.express as px and use:
px.bar(df, x=' ', y=' ')
The following is code to create a bar chart in Plotly:
import plotly.express as px
data_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.bar(data_canada, x='year', y='pop')
fig.show()
Unstacked bar plots are used to compare a particular category over time across different samples. They can be used to deduce facts from the pattern we observe through the comparison. In the figure below, we can see the players’ ratings over the years in FIFA. We can see that Django and Gafur have increased in ratings over the years. This shows us their progression, so a club can now decide if they want to sign Django or Gafur.
import matplotlib.pyplot as plt
import pandas as pd

# Ratings per player (rows) for each year (columns)
plotdata = pd.DataFrame({
    "2018": [57, 67, 77, 83],
    "2019": [68, 73, 80, 79],
    "2020": [73, 78, 80, 85]},
    index=["Django", "Gafur", "Tommy", "Ronnie"])
# One group of bars per player, one bar per year
plotdata.plot(kind="bar", figsize=(15, 8))
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
You can also use the Pandas read_csv() function to import data in a CSV file format into a Pandas dataframe for plotting.
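A sketch of that workflow follows. To keep it self-contained, the CSV content is held in a string and read through io.StringIO, which is an assumption made for illustration; with a real file you would pass its path to read_csv() instead. The two rows reuse ratings from the example above.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a CSV file on disk (illustration only)
csv_text = """Footballer,2018,2019,2020
Django,57,68,73
Gafur,67,73,78
"""

# index_col makes the player names the row labels, which become the x-axis
plotdata = pd.read_csv(io.StringIO(csv_text), index_col="Footballer")

ax = plotdata.plot(kind="bar", figsize=(8, 5))
ax.set_ylabel("Ratings")
```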
As the name suggests, stacked bar charts place the bars of each group on top of one another. Earlier we used an unstacked bar chart to compare the values within each group; a stacked plot lets us compare the totals for each individual. In pandas, this is easy to implement by passing stacked=True.
import matplotlib.pyplot as plt
import pandas as pd

# Ratings per player (rows) for each year (columns)
plotdata = pd.DataFrame({
    "2018": [57, 67, 77, 83],
    "2019": [68, 73, 80, 79],
    "2020": [73, 78, 80, 85]},
    index=["Django", "Gafur", "Tommy", "Ronnie"])
# stacked=True piles the yearly bars on top of each other for each player
plotdata.plot(kind="bar", stacked=True, figsize=(15, 8))
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
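For comparison, the same stacking can be sketched directly in Matplotlib with the bottom parameter of bar(), which sets where each bar starts. This is one possible approach rather than what pandas necessarily does internally; it reuses the ratings from the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

players = ["Django", "Gafur", "Tommy", "Ronnie"]
r2018 = np.array([57, 67, 77, 83])
r2019 = np.array([68, 73, 80, 79])
r2020 = np.array([73, 78, 80, 85])

fig, ax = plt.subplots(figsize=(8, 5))
# Each layer starts where the previous layers end
ax.bar(players, r2018, label="2018")
ax.bar(players, r2019, bottom=r2018, label="2019")
ax.bar(players, r2020, bottom=r2018 + r2019, label="2020")
ax.legend()

# The top of each stack equals the summed rating per player
totals = r2018 + r2019 + r2020
```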
Now, let us apply this syntax to a dataset and see how we can plot bar charts using different libraries. For this, we will use the Summer Olympics Medals 1976-2008 dataset and visualize it using bar graphs to perform univariate, bivariate, and multivariate analysis and interpret relevant information from it.
The Summer Olympics dataset from 1976 to 2008 is available here.
In exploratory data analysis, univariate analysis refers to visualizing a single variable. In our case, we want to visualize one column of the data using a bar plot.
All-time medals of top 10 countries:
top_10 = df['Country'].value_counts()[:10]
top_10.plot(kind='bar',figsize=(10,8))
plt.title('All Time Medals of top 10 countries')
The graph shows the top 10 countries by Olympic medals won. The USA has dominated the Olympics over the years.
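The counting step can be tried without the full dataset. The tiny DataFrame below is a made-up stand-in for the Olympics data (its rows are assumptions), used only to show how value_counts() orders the categories before plotting.

```python
import pandas as pd

# Hypothetical stand-in: one row per medal, with the winning country
df_demo = pd.DataFrame(
    {"Country": ["USA", "USA", "China", "USA", "Russia", "China"]}
)

# value_counts() sorts from most to least frequent, so slicing keeps the top entries
top_2 = df_demo["Country"].value_counts()[:2]

ax = top_2.plot(kind="bar", figsize=(6, 4))
ax.set_title("Medals of the top 2 countries (made-up data)")
```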
Medals won by the USA in Summer Olympics:
indpie = df[df['Country']=='United States']['Medal'].value_counts()
indpie.plot(kind='bar',figsize=(10,8))
We filter the country to the USA and visualize the medals won by the USA alone.
Bivariate analysis involves two variables, or two columns, from our dataset.
Total athletes’ contribution to Summer Olympics over time:
plt.figure(figsize=(10, 5))
# Pass the column by keyword; recent seaborn versions treat a positional argument as data
sns.countplot(x='Year', data=df)
plt.title('Total Athletes contribution in summer olympics over time')
plt.xlabel('Years')
plt.ylabel('No. of Athlete')
Over the years there has been an increase in the participation of athletes in the Olympics.
Top 10 athletes with the most awarded medals:
athlete_order = df['Athlete'].value_counts().head(10).index
plt.figure(figsize=(9, 5))
sns.countplot(data=df, y='Athlete', order=athlete_order)
plt.title('Top 10 Athletes with the most awarded Medals')
plt.xlabel('No. of awarded medals')
plt.ylabel('Athlete Name');
This type of plot is called a horizontal bar chart. It shows the top 10 athletes, and we can see that Michael has won the most medals in the Olympics.
Sports with most awarded medals:
plt.figure(figsize=(15, 5))
highest_sport = df['Sport'].value_counts().index
sns.countplot(data=df, x='Sport', order=highest_sport)
plt.xticks(rotation=75)
plt.title('Sports with most awarded Medals')
plt.xlabel('Sport')
plt.ylabel('No. of Medals')
Aquatics has contributed the largest number of medals in the Olympics. One thing to note in this graph is that we rotated the x-axis tick labels by 75 degrees so that the sport names do not overlap.
Type of medals won over the years:
# Set the figure size before plotting so that it applies to this chart
sns.set(rc={'figure.figsize':(10,10)})
sns.countplot(x='Year', hue='Medal', data=df)
plt.title("Type of medals won over the years")
The graph is a grouped (unstacked) bar plot and shows the count of each medal type for each year.
Medals by gender:
sns.countplot(x="Medal", hue="Gender", data=df)
The gender bar plot tells us that men have won more medals of every type, which suggests that men have participated more in the Olympics or that there have been more men's events.
The gender ratio in Summer Olympics:
gender_group = df.groupby(['Year', 'Gender']).size().unstack()
gender_group.apply(lambda x:x/x.sum(), axis=1).plot(kind='barh', stacked=True, legend=False)
plt.legend(['Men', 'Women'], bbox_to_anchor=(1.0, 0.7))
plt.xlabel('Men / Women ratio')
The data tells us the ratio of men to women in the Olympics over the years. Here, we can see that more games have started including the women’s category, which is a great sign.
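The normalization behind this chart can be sketched on a few made-up rows. The sample DataFrame below is hypothetical and only shows how dividing each year's counts by that year's total yields shares that sum to 1.

```python
import pandas as pd

# Hypothetical stand-in: one row per medal, with year and gender
demo = pd.DataFrame({
    "Year": [2004, 2004, 2004, 2008, 2008, 2008, 2008],
    "Gender": ["Men", "Men", "Women", "Men", "Women", "Women", "Women"],
})

# Count medals per (year, gender), with one column per gender
counts = demo.groupby(["Year", "Gender"]).size().unstack()

# Divide each row by its total to get shares between 0 and 1
ratios = counts.div(counts.sum(axis=1), axis=0)

# Stacked horizontal bars: every bar spans from 0 to 1
ax = ratios.plot(kind="barh", stacked=True)
ax.set_xlabel("Share of medals")
```

Using div() with axis=0 is equivalent to the row-wise apply() shown above and avoids a Python-level lambda.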
Medals by gender in each discipline:
# Set the figure size before plotting so that it applies to this chart
sns.set(rc={'figure.figsize':(10,10)})
sns.countplot(y='Discipline', hue='Gender', data=df)
plt.xticks(rotation=90)
plt.title('Medals by Gender in each Discipline')
plt.legend(loc=1)  # 1 is the code for 'upper right'
This graph shows each gender’s participation in the specific discipline.
The scatter plot is another type of plot for bivariate visualization of numerical data, where the x-axis and y-axis represent the values of two different variables.
Multivariate analysis is used when we want to compare more than two variables. A box plot can be a good representation, as shown here.
sns.catplot(x="Medal", y="Year", hue="Gender",kind="box", data=df)
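This graph shows, for each of the three medal types, how the years in which medals were won are distributed for men and for women. Note that a box plot compares the spread across years rather than the number of medals, so the counts themselves are better read from the gender bar plot above.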
Bar plots can be used to plot time series data by using the x-axis to represent time and the y-axis to represent the values of the data points. In this case, each bar in the bar plot represents a single data point, with the height of the bar representing the value of the data point and the bar’s position along the x-axis representing the time at which the data point was recorded.
To plot time series data as a bar plot in Python, you can use the bar() function of the Matplotlib library. First, you need to convert the time series data into a suitable format, such as a list or NumPy array, that can be passed to the bar() function. Then, you can use the plt.xticks() function to control the x-axis labels, which represent the time values in the time series data. If you want, you can also customize the y-axis using plt.yticks().
For example, to plot a time series data with time values in the format “YYYY-MM-DD” and data values as integers, you can write the following Python code:
import matplotlib.pyplot as plt
import pandas as pd
# Example time series data
time = ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04']
data = [100, 120, 130, 140]
# Create a pandas dataframe from the time series data
df = pd.DataFrame({'Time': time, 'Data': data})
# Plot the time series data as a bar plot
plt.bar(df['Time'], df['Data'], width=0.8)
plt.xticks(rotation=90)
plt.show()
The above code will create a bar plot with the time values along the x-axis and the data values along the y-axis. You can customize the appearance of the plot by adjusting various parameters, such as the width of the bars, the color of the bars, and the labels for the x and y axes.
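As a sketch of such customization, the variant below reuses the same four data points and changes the bar color, axis labels, and tick positions. The specific color and tick values are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

time = ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04']
data = [100, 120, 130, 140]
df_ts = pd.DataFrame({'Time': time, 'Data': data})

plt.figure(figsize=(8, 4))
# Narrower bars in a different color
bars = plt.bar(df_ts['Time'], df_ts['Data'], width=0.5, color='tab:orange')
plt.xlabel('Date')
plt.ylabel('Value')
# Rotate the date labels and choose explicit y-axis tick positions
plt.xticks(rotation=45)
plt.yticks([0, 50, 100, 150])
```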
In this article, we have gone through different implementations of bar plots in Python and looked at the different types of bar graphs used for exploratory data analysis. When creating a bar plot in Python, it is important to choose an appropriate type based on the nature of the data and the information you want to communicate.
Hope you like the article! In this guide, we explored how to create a Python bar plot using Matplotlib. With the plt.bar function, you can effortlessly generate a Matplotlib bar chart to visualize your data effectively.
Key Takeaways
A. We can draw a bar graph in Python using the Matplotlib library’s “bar()” function.
A. Some of the most common types of bar plots in Python are:
1. Simple bar plot: A bar plot representing a single data set, where each bar represents a single data point.
2. Grouped bar plot: A bar plot representing multiple sets of data, where each group represents a separate data set.
3. Stacked bar plot: A bar plot representing multiple sets of data, where the height of each bar is the sum of the values for each data set.
4. Horizontal bar plot: A bar plot drawn sideways, where the categories are placed on the vertical axis and the values are measured along the horizontal axis.
5. Error bar plot: A bar plot that includes error bars representing the data’s uncertainty.
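Two of the types listed above, the error bar plot and the horizontal bar plot, can be sketched with Matplotlib's yerr argument and barh() function. The group names, means, and error values below are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical measurements with their uncertainties
groups = ["A", "B", "C"]
means = [5, 7, 6]
errors = [0.5, 1.0, 0.8]

fig, (ax_err, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Error bar plot: yerr draws a vertical error bar on each bar
err_bars = ax_err.bar(groups, means, yerr=errors, capsize=4)
ax_err.set_title("Error bar plot")

# Horizontal bar plot: categories on the vertical axis, values along the horizontal axis
h_bars = ax_h.barh(groups, means)
ax_h.set_title("Horizontal bar plot")
```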
A. Commonly used Python objects for plotting a bar graph are lists, Numpy arrays, and Pandas dataframes.
A. matplotlib.pyplot.bar is a Python function that creates a vertical bar chart to display data. This function is part of the Matplotlib library and is typically used to visualize categorical data. The bar function takes parameters such as the x-coordinates of the bars, the height of the bars, and the width of the bars, and can be customized with optional parameters such as color, edgecolor, and label. Titles and axis labels are added separately, for example with plt.title() and plt.xlabel(). Overall, matplotlib.pyplot.bar is a useful tool for creating clear and informative visualizations of data.