avcontentteam — Published On September 18, 2017 and Last Modified On May 12th, 2023


Welcome to the world of Probability in Data Science! Let me start things off with an intuitive example. Imagine you are a data analyst, or someone building machine learning models, algorithms, or Python scripts, and you need to analyze a trend. However, you don’t have enough data, or some of it is missing. In this article, let’s see how probability distributions can help solve this problem.

Learning Objectives

  • In this tutorial, we will learn about common data types.
  • You will also learn about different types of distributions and the Probability Density function.
  • At last, you will learn about the relations between these distributions.

What is Probability Distribution?

A probability distribution is a mathematical function that defines the likelihood of different outcomes or values of a variable. This function is commonly represented by a graph or probability table, and it provides the probabilities of various possible results of an experiment or random phenomenon based on the sample space and the probabilities of events. Probability distributions are fundamental in probability theory and statistics for analyzing data and making predictions.

Let’s start with an example. Suppose you are a teacher at a university. After checking assignments for a week, you graded all the students. You gave these graded papers to a data entry guy in the university and told him to create a spreadsheet containing the grades of all the students. But the guy only stores the grades and not the corresponding students.

Student details

He made another blunder; he missed a few entries in a hurry, and we have no idea whose grades are missing. One way to find this out is by visualizing the grades and seeing if you can find a trend in the data.

Frequency distribution graph

The graph you plotted is called the frequency distribution of the data. You see that there is a smooth curve-like structure that defines our data, but do you notice an anomaly? We have an abnormally low frequency at a particular score range. So the best guess is that the missing grades fall in that range, since filling them in would remove the dent in the distribution.

This is how you try to solve a real-life problem using data analysis. Distribution is a must-know concept for any Data Scientist, student, or practitioner. It provides the basis for analytics and inferential statistics.

While the concept of probability or equal probability gives us mathematical calculations, distributions help us actually visualize what’s happening underneath.

In this article, I have covered some important types of probability distributions, which are explained in a lucid and comprehensive manner.

Note: This article assumes you have a basic knowledge of probability. If not, you can refer to this probability distribution or the following fundamentals of probability.

Probability is the systematic study of the outcomes of a random experiment. For example, when we toss a coin, there are two possible outcomes – heads or tails. For a fair coin, both outcomes are equally likely, so the probability of either heads or tails on a single flip is ½. This is an example of a symmetric distribution.

Common Data Types

Before we jump on to the explanation of distributions, let’s see what kind of data we can encounter. The data can be discrete or continuous.

Discrete Data, as the name suggests, can take only specified values. For example, when you roll a die, the possible outcomes are 1, 2, 3, 4, 5, or 6, not 1.5 or 2.45. (Discrete Probability Distribution)

Continuous Data can take any value within a given range. The range may be finite or infinite. For example, a girl’s weight or height, the length of the road. The weight of a girl can be any value – 54 kgs, 54.5 kgs, or 54.5436kgs. (Continuous Probability Distribution)

Now let us start with the types of distributions.

Types of Distributions

Here is a list of distribution types:

  • Bernoulli Distribution
  • Uniform Distribution
  • Binomial Distribution
  • Normal or Gaussian Distribution
  • Exponential Distribution
  • Poisson Distribution

Bernoulli Distribution

Let’s start with the easiest distribution, which is Bernoulli Distribution. It is actually easier to understand than it sounds!

All you cricket junkies out there! At the beginning of any cricket match, how do you decide who will bat or bowl? A toss! It all depends on whether you win or lose the toss, right? Let’s say if the toss results in a head, you win. Else, you lose. There’s no midway.

A Bernoulli distribution describes a single trial (a Bernoulli trial) with only two possible outcomes, namely 1 (success) and 0 (failure). So the random variable X with a Bernoulli distribution takes the value 1 with the probability of success, say p, and the value 0 with the probability of failure, say q or 1-p.

Here, the occurrence of a head denotes success, and the occurrence of a tail denotes failure.
Probability of getting a head = 0.5 = Probability of getting a tail since there are only two possible outcomes.

The probability mass function is given by: P(x) = p^x (1-p)^(1-x), where x ∈ {0, 1}.
It can also be written as

P(x) = 1-p, if x = 0
P(x) = p, if x = 1

The two outcomes need not be equally likely. Take a fight between Undertaker and me: he is pretty much certain to win. So, in this case, the probability of my success is 0.15, while the probability of my failure is 0.85.

Here, the probability of success (p) is not the same as the probability of failure. So, the chart below shows the Bernoulli Distribution of our fight.

Bernoulli Distribution bar graph

Here, the probability of success = 0.15, and the probability of failure = 0.85. The expected value is exactly what it sounds like. If I punch you, I may expect you to punch me back. Basically, the expected value of any distribution is the mean of the distribution. The expected value of a random variable X from a Bernoulli distribution is found as follows:

E(X) = 1*p + 0*(1-p) = p

The variance of a random variable from a Bernoulli distribution is:

V(X) = E(X²) – [E(X)]² = p – p² = p(1-p)

There are many examples of the Bernoulli distribution, such as whether it will rain tomorrow (rain denotes success and no rain denotes failure) or whether you win (success) or lose (failure) a game.
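
To make the mean and variance formulas concrete, here is a minimal Python sketch that evaluates the Bernoulli PMF for the fight example (p = 0.15) and recovers E(X) and V(X) by summing over the two outcomes. The function name `bernoulli_pmf` is chosen here for illustration and does not come from any library.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable; x must be 0 or 1."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

p = 0.15  # probability of success from the fight example

# Expected value and variance from the definition, summing over both outcomes
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean)      # matches p
print(variance)  # matches p * (1 - p), up to floating-point rounding
```

Computing the variance from the definition, rather than plugging into p(1-p), is a quick way to check that the closed-form result holds.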

Uniform Distribution

When you roll a fair die, the outcomes are 1 to 6. These outcomes are all equally likely, which is the basis of a uniform distribution. Unlike the Bernoulli distribution, all n possible outcomes of a uniform distribution are equally likely.

For a continuous variable, X is said to be uniformly distributed on the interval [a, b] if its density function is:

f(x) = 1/(b-a), for -∞ < a ≤ x ≤ b < ∞
f(x) = 0, otherwise

The graph of a uniform distribution curve looks like

Uniform distribution curve | probability distribution

You can see that the shape of the uniform distribution curve is rectangular, which is why the uniform distribution is also called the rectangular distribution.

For a uniform distribution, a and b are the parameters: a is the smallest value X can take, and b is the largest.

The number of bouquets sold daily at a flower shop is uniformly distributed, with a maximum of 40 and a minimum of 10.

Let’s try calculating the probability that the daily sales will fall between 15 and 30.

The probability that daily sales will fall between 15 and 30 is (30-15)*(1/(40-10)) = 0.5

Similarly, the probability that daily sales are greater than 20 is (40-20)*(1/(40-10)) = 0.667

The mean and variance of X following a uniform distribution are:

Mean -> E(X) = (a+b)/2

Variance -> V(X) =  (b-a)²/12

The standard uniform density has parameters a = 0 and b = 1, so the PDF for standard uniform density is given by:

f(x) = 1, for 0 ≤ x ≤ 1
f(x) = 0, otherwise
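
The flower shop calculation can be reproduced in a few lines of Python. This is only a sketch: it treats daily sales as a continuous uniform variable on [10, 40], as the example above does, and the helper name `uniform_prob` is made up for illustration.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniformly distributed on [a, b]."""
    # Clip the query interval to the support [a, b]
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

a, b = 10, 40  # minimum and maximum daily sales

p_15_30 = uniform_prob(15, 30, a, b)  # (30 - 15) / (40 - 10)
p_gt_20 = uniform_prob(20, b, a, b)   # (40 - 20) / (40 - 10)

mean = (a + b) / 2            # E(X)
variance = (b - a) ** 2 / 12  # V(X)

print(p_15_30, round(p_gt_20, 3), mean, variance)
```

Because the density is flat, every probability is simply the width of the interval divided by the width of the support.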

Binomial Distribution

Let’s get back to cricket. Suppose you won the toss today, indicating a successful event. You toss again, but you lose this time. Winning the toss today does not mean that you will win the toss tomorrow. Let’s assign a random variable, say X, to the number of times you won the toss. What can be the possible value of X? It can be any whole number from 0 up to the number of times you tossed the coin.

There are only two possible outcomes: head denoting success and tail denoting failure. Therefore, the probability of getting a head is p = 0.5, and the probability of failure can be easily computed as: q = 1 – p = 0.5.

A distribution where only two outcomes are possible, such as success or failure, gain or loss, win or lose and where the probability of success and failure is the same for all the trials is called a Binomial Distribution.

The outcomes need not be equally likely. Remember the example of a fight between Undertaker and me? So, if the probability of success in an experiment is 0.2, then the probability of failure can be easily computed as q = 1 – 0.2 = 0.8.

Each trial is independent since the outcome of the previous toss doesn’t determine or affect the outcome of the current toss. An experiment with only two possible outcomes repeated n number of times is called binomial. The parameters of a binomial distribution are n and p, where n is the total number of trials and p is the probability of success in each trial.

Based on the above explanation, the properties of a Binomial Distribution are:

  1. Each trial is independent.
  2. There are only two possible outcomes in a trial – success or failure.
  3. A total number of n identical trials are conducted.
  4. The probability of success and failure is the same for all trials. (Trials are identical.)

The mathematical representation of binomial distribution is given by:

P(x) = n! / ((n-x)! x!) * p^x * q^(n-x)

where x = 0, 1, ..., n is the number of successes.

A binomial distribution graph where the probability of success does not equal the probability of failure looks like this.

Binomial distribution bar graph

Now, when the probability of success = probability of failure, in such a situation, the graph of binomial distribution looks like

Binomial distribution graph

The mean and variance of a binomial distribution are given by:

Mean -> µ = n*p

Variance -> Var(X) = n*p*q
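
As a rough check of these formulas, the sketch below builds the full binomial PMF for ten fair coin tosses and computes the mean and variance by direct summation. The values n = 10 and p = 0.5 are illustrative, and `binomial_pmf` is a name introduced here, not a library function.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5  # ten fair coin tosses (illustrative values)

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                                # should be 1
mean = sum(x * q for x, q in enumerate(pmf))                    # should equal n * p
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # should equal n * p * (1 - p)

print(total, mean, variance)
```

Changing p to something like 0.2 makes the bars skew to one side, which is the asymmetric shape described for unequal success and failure probabilities.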

Normal Distribution or Gaussian Distribution

The normal distribution describes the behavior of a great many situations in the universe (That is why it’s called a “normal” distribution. I guess!). The sum of a large number of (small) independent random variables often turns out to be approximately normally distributed, which contributes to its widespread application. A normal distribution has the following characteristics:

  1. The mean, median, and mode of the distribution coincide.
  2. The curve of the distribution is bell-shaped and symmetrical about the line x=μ.
  3. The total area under the curve is 1.
  4. Exactly half of the values are to the left of the center, and the other half to the right.

A normal distribution is quite different from a binomial distribution: the former is continuous, and the latter is discrete. However, as the number of trials approaches infinity, their shapes become quite similar.

The PDF of a random variable X, following a normal distribution, is given by:

f(x) = 1/(σ√(2π)) * e^(-(1/2)((x-µ)/σ)²), for -∞ < x < ∞

The mean and variance of a random variable X, which is said to be normally distributed, are given by:

Mean -> E(X) = µ

Variance -> Var(X) = σ^2

Here, µ (mean) and σ (standard deviation) are the parameters.
The graph of a random variable X ~ N (µ, σ) is shown below.

Mean and standard deviation curves

A standard normal distribution is defined as a distribution with a mean of 0 and a standard deviation of 1. For such a case, the PDF becomes:

f(x) = 1/√(2π) * e^(-x²/2), for -∞ < x < ∞
Standard normal distribution curve | probability distribution
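
Here is a small Python sketch of the normal PDF written out from the formula and compared against `statistics.NormalDist` from the standard library (Python 3.8+). The mean of 70 and standard deviation of 10 are made-up values for illustration, and `normal_pdf` is a name introduced here.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma  # standardized value
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

mu, sigma = 70, 10  # illustrative mean and standard deviation

# The hand-written formula should agree with the standard library
hand = normal_pdf(80, mu, sigma)
lib = NormalDist(mu, sigma).pdf(80)

# Standardizing: x = 80 lies one standard deviation above the mean
z = (80 - mu) / sigma

# Area under the standard normal curve within one standard deviation of 0
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(hand, lib, z, round(within_one_sd, 4))
```

Standardizing with (x - µ)/σ is what lets a single standard normal table, or a single `NormalDist()` object, answer questions about any normal distribution.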

Poisson Distribution

Suppose you work at a call center; approximately how many calls do you get in a day? It can be any number. The total number of calls at a call center in a day can be modeled by a Poisson distribution. Some more examples are:

  1. The number of emergency calls recorded at a hospital in a day.
  2. The number of thefts reported in an area in a day.
  3. The number of customers arriving at a salon in an hour.
  4. The number of suicides reported in a particular city.
  5. The number of printing errors on each page of the book.

You can now think of many examples following the same course. Poisson Distribution is applicable in situations where events occur at random points of time and space wherein our interest lies only in the number of occurrences of the event.

A distribution is called a Poisson distribution when the following assumptions are valid:

1. Any successful event should not influence the outcome of another successful event.
2. The probability of success in an interval depends only on the length of the interval, so it must be the same for any two intervals of equal length.
3. The probability of success in an interval approaches zero as the interval becomes smaller.

Now, if a random process satisfies the above assumptions, the number of events it produces follows a Poisson distribution. Some notations used in Poisson distribution are:

  • λ is the rate at which an event occurs,
  • t is the length of a time interval,
  • And X is the number of events in that time interval.

Here, X is called a Poisson Random Variable, and the probability distribution of X is called Poisson distribution.

Let µ denote the mean number of events in an interval of length t. Then, µ = λ*t.

The PMF of X following a Poisson distribution is given by:

P(X = x) = e^(-µ) * µ^x / x!, for x = 0, 1, 2, ...

The mean µ is the parameter of this distribution. As defined above, µ is λ times the length of the interval. The graph of a Poisson distribution is shown below:

Poisson distribution graph | probability distribution

The graph shown below illustrates the shift in the curve due to the increase in the mean.

Poisson distribution graph

Notice that as the mean increases, the curve shifts to the right.

The mean and variance of X following a Poisson distribution:

Mean -> E(X) = µ
Variance -> Var(X) = µ
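
The claim that the mean and the variance are both equal to µ can be checked numerically. The sketch below assumes a hypothetical call center receiving λ = 3 calls per hour, observed for t = 2 hours; `poisson_pmf` is a helper name introduced for this example.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

lam, t = 3, 2  # assumed: 3 calls per hour, observed for 2 hours
mu = lam * t   # mean number of calls in the interval

# Sum far enough into the tail that the remaining probability is negligible
xs = range(100)
total = sum(poisson_pmf(x, mu) for x in xs)
mean = sum(x * poisson_pmf(x, mu) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)

print(total, mean, variance)
```

The support of a Poisson variable is unbounded, so the sums are truncated at 100 terms, which is far beyond where the probabilities for µ = 6 become negligible.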

Exponential Distribution

Let’s consider the call center example one more time. What about the interval of time between the calls? Here, the exponential distribution comes to our rescue. Exponential distribution models the interval of time between the calls.

Other examples are:

1. Length of time between metro arrivals
2. Length of time between arrivals at a gas station
3. The life of an air conditioner

The exponential distribution is widely used for survival analysis. From the expected life of a machine to the expected life of a human, exponential distribution successfully delivers the result.

A random variable X is said to have an exponential distribution if its PDF is:

f(x) = λe^(-λx), for x ≥ 0
f(x) = 0, otherwise

where the parameter λ > 0 is also called the rate.

For survival analysis, λ is called the failure rate of a device at any time t, given that it has survived up to t.

Mean and Variance of a random variable X following an exponential distribution:

Mean -> E(X) = 1/λ

Variance -> Var(X) = (1/λ)²

Also, the greater the rate, the faster the curve drops, and the lower the rate, the flatter the curve. This is explained better with the graph shown below.

To ease the computation, there are some formulas given below.

P{X ≤ x} = 1 – e^(-λx) corresponds to the area under the density curve to the left of x.

P{X > x} = e^(-λx) corresponds to the area under the density curve to the right of x.

P{x1 < X ≤ x2} = e^(-λx1) – e^(-λx2) corresponds to the area under the density curve between x1 and x2.
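
These three formulas all follow from the cumulative distribution function, so a single helper is enough to compute them. The sketch below assumes a rate of λ = 0.5 calls per minute, a made-up value, and the name `exp_cdf` is introduced here for illustration.

```python
from math import exp

def exp_cdf(x, lam):
    """P(X <= x) for an exponential variable with rate lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

lam = 0.5  # assumed rate: on average one call every 2 minutes

p_within_1 = exp_cdf(1, lam)                   # P(X <= 1) = 1 - e^(-0.5)
p_beyond_3 = 1 - exp_cdf(3, lam)               # P(X > 3)  = e^(-1.5)
p_between = exp_cdf(3, lam) - exp_cdf(1, lam)  # P(1 < X <= 3) = e^(-0.5) - e^(-1.5)

mean = 1 / lam             # expected waiting time
variance = (1 / lam) ** 2  # variance of the waiting time

print(p_within_1, p_beyond_3, p_between, mean, variance)
```

The three probabilities add up to 1 because the intervals up to 1, between 1 and 3, and beyond 3 cover the whole positive axis.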

Distribution Function in Probability

In Probability, the probability density function of a continuous random variable is a function whose value at any given sample (or point) in the dataset or sample space can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample. PDF is the probability per unit length. In other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0, the value of the PDF at two different samples can be used to infer how much more likely it is that the random variable would be close to one sample compared to the other sample.

Relations Between the Distributions

Relation Between Bernoulli and Binomial Distribution

  • Bernoulli Distribution is a special case of Binomial Distribution with a single trial.
  • There are only two possible outcomes of a Bernoulli and Binomial distribution, namely success and failure.
  • A Binomial random variable counts the successes in n independent Bernoulli trials that all share the same probability of success.

Relation Between Poisson and Binomial Distribution

Poisson Distribution is a limiting case of binomial distribution under the following conditions:

  • The number of trials is indefinitely large or n → ∞.
  • The probability of success for each trial is the same and indefinitely small or p →0.
  • np = λ, is finite.
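
The limiting behaviour can be seen numerically. In this sketch, the mean np is held fixed at λ = 2 (an illustrative value) while n grows, and the binomial probability of exactly 3 successes is compared with the Poisson probability.

```python
from math import comb, exp, factorial

lam = 2  # fixed mean n * p (illustrative)
x = 3    # number of successes to compare at

poisson = exp(-lam) * lam ** x / factorial(x)

errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n  # p shrinks as n grows so that n * p stays equal to lam
    binom = comb(n, x) * p ** x * (1 - p) ** (n - x)
    errors.append(abs(binom - poisson))
    print(n, binom, poisson)
```

Each printed row should show the binomial value moving closer to the Poisson value as n increases and p decreases.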

Relation Between Normal and Binomial Distribution & Normal and Poisson Distribution

A normal distribution is another limiting form of binomial distribution under the following conditions:

  • The number of trials is indefinitely large, n → ∞.
  • Neither p nor q is indefinitely small.

The normal distribution is also a limiting case of Poisson distribution with the parameter λ →∞.
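
As a hedged illustration of the binomial-to-normal limit, the sketch below compares an exact binomial probability with its normal approximation. The values n = 100 and p = 0.4 are arbitrary, and the extra 0.5 is the usual continuity correction applied when a continuous curve approximates a discrete count.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4                # many trials, p not close to 0 or 1 (illustrative)
mu = n * p                     # binomial mean
sigma = sqrt(n * p * (1 - p))  # binomial standard deviation

# Exact binomial probability P(X <= 45)
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(46))

# Normal approximation with a continuity correction of 0.5
approx = NormalDist(mu, sigma).cdf(45.5)

print(exact, approx)
```

The two numbers should be close, and the gap generally narrows further as n grows.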

Relation Between Exponential and Poisson Distribution

If the times between random events follow an exponential distribution with rate λ, then the total number of events in a time period of length t follows the Poisson distribution with parameter λt.
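
This relation can be checked with a small simulation. The sketch below draws exponential gaps between events with `random.expovariate`, counts how many events land in a window of length t, and repeats this many times. The rate λ = 4, the window t = 1, and the number of runs are arbitrary choices, and `count_events` is a name made up for this example.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

lam, t = 4.0, 1.0  # assumed rate and window length
runs = 20000

def count_events(lam, t):
    """Count arrivals in [0, t] when gaps between arrivals are exponential(lam)."""
    clock, count = 0.0, 0
    while True:
        clock += random.expovariate(lam)  # gap until the next event
        if clock > t:
            return count
        count += 1

counts = [count_events(lam, t) for _ in range(runs)]
mean = sum(counts) / runs
variance = sum((c - mean) ** 2 for c in counts) / runs

print(mean, variance)  # both should be close to lam * t
```

A sample mean and a sample variance that are both near λt is what a Poisson distribution with parameter λt would produce.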

Test Your Knowledge

You have come this far. Now, are you able to answer the following questions? Let me know in the comments below!

1. The formula to calculate standard normal random variable is:

a. (x+µ) / σ
b. (x-µ) / σ
c. (x-σ) / µ

2. In Bernoulli Distribution, the formula for calculating standard deviation is given by:

a. p (1 – p)
b. SQRT(p(p – 1))
c. SQRT(p(1 – p))

3. For a normal distribution, an increase in the mean will:

a. shift the curve to the left
b. shift the curve to the right
c. flatten the curve

4. The lifetime of a battery is exponentially distributed with λ = 0.05 per hour. The probability for a battery to last between 10 and 15 hours is:



Probability distributions are prevalent in many sectors, including insurance, physics, engineering, computer science, and even social science; students of psychology and medicine also use them widely. They are easy to apply and widely used. This article highlighted and explained the application of six important distributions observed in daily life. Now you will be able to identify, relate, and differentiate among these distributions.

For a more in-depth write-up of these distributions, you can refer to this resource.

Key Takeaways

  • Probability is commonly used by data scientists to model situations where experiments, independent events conducted during similar circumstances, yield different results, such as throwing dice or a coin.
  • Discrete random variables and continuous random variables are two types of quantitative variables. Discrete variables represent counts, for example, the number of objects in a collection, whereas continuous variables represent measurable amounts, for example, water volume or weight.
  • Normal distribution, chi-square distribution, binomial distribution, poisson distribution, and uniform distribution are some of the many different classifications of probability distributions.

Frequently Asked Questions

Q1. What distribution is the most commonly used in data science?

A. Gaussian distribution (normal distribution) is famous for its bell-like shape, and it’s one of the most commonly used distributions in data science or for Hypothesis Testing.

Q2. What are the 6 common probability distributions every data science professional should know?

A. The 6 common probability distributions are Bernoulli, Uniform, Binomial, Normal, Poisson, and Exponential Distribution.

Q3. What is the difference between a discrete and continuous distribution?

A. A discrete distribution is one in which the data can only take on certain values, and a continuous distribution is one in which data can take on any value within a specified range.

21 thoughts on "Understanding Probability Distributions | Definition & Types (Updated 2023)"

kapi says: September 18, 2017 at 9:15 pm
Hi Nicely written article, I would suggest some edits, In some cases, the explanation is unclear with the parameters not discussed at all, example in the Uniform distribution what are a and b, how was the formula derived etc. I have noticed with some other articles of analytics vidhya, where some key points are missing Just my honest opinion
Aswani says: September 18, 2017 at 9:51 pm
Nicely written explanation! Very accessible.
paul younes says: September 18, 2017 at 10:48 pm
Your graphs are not showing up in any browser (IE, Chrome, or Firefox).
Yoris says: September 19, 2017 at 12:49 am
I am very fond that I found this article. I currently am struggling to learn probability distribution and its application in College, but thanks to this article which I found it quite easy to understand and digest than college textbooks. Love it!
Kunal Jain says: September 19, 2017 at 10:17 am
Thanks Kapi for the feedback. Do let me know, other places where you saw points missing. You can always drop me a mail on kunal at the rate analyticsvidhya dot com
Bogdan Manea says: September 19, 2017 at 2:42 pm
Nice concision, all are there. I just finish the Inferential statistic course and your idea to condense all this types here was so useful. Thanks.
Manik says: September 20, 2017 at 7:52 pm
Thanks so much for clarifying the confusions that I had with distribution types for many years.
Manik says: September 20, 2017 at 7:56 pm
Wonderful explanation of distribution types and its concept.
Dalon says: October 02, 2017 at 9:02 pm
Thank you, that is a good article
Khindau says: October 03, 2017 at 3:41 am
To this point : Both Bernoulli and Binomial Distributions have independent trails. Bernouli distribution is with single trial, so point about trials being independent doesn't make sense.
Dheeraj says: October 03, 2017 at 7:38 am
Very well written . Simple easy to understand and pointed. A little more explanation on each formula would have been great. Saving it for further references though.. Thanks for this..
Viviane says: November 01, 2017 at 11:18 am
ERROR: The variance of a random variable from a binomial distribution is: V(X) = E(X²) – [E(X)]² = p – p² = p(1-p) I believe it should say: The variance of a random variable from a BERNOULLI distribution is: V(X) = E(X²) – [E(X)]² = p – p² = p(1-p)
Khursheed Ahmad Ganaie says: November 04, 2017 at 12:36 am
Sir I hve no words that I should say aftr reading these distribution's in a simple way ........I need these articles daily so that India wll b forward in Statistical sciences ......#DREam of our Father of statistics Mahalanobis. ...it should be taught in every corner of India .. .. Thnku .....
Tony says: January 24, 2018 at 3:28 pm
Hi, first of all thank you for the article. Nice work, but I think that in section of Binomial Distribution in second paragraph "Therefore, probability of getting a head = 0.5 and the probability of failure can be easily computed as: q = p – 1 = 0.5." should be q = 1 - p ; since value of probability is always from . However, it was nice to repeat those concepts . Thanks
Adnan Aziz says: April 15, 2018 at 7:45 pm
thank you so much ..really helped me and easy to understand and differentiate
Aishwarya Singh says: April 16, 2018 at 10:07 am
Hi Adnan, Glad you found this useful!
Mukesh Azad says: April 17, 2018 at 6:23 pm
Thank you so much for the valuable information..
Pradeep Brahma says: May 23, 2018 at 9:03 pm
How can we infer a Beta Distribution ?
Aishwarya Singh says: May 24, 2018 at 7:06 pm
Hi Tony, Thank you for pointing it out. We have updates the same.
Aishwarya Singh says: May 24, 2018 at 7:10 pm
Hi Viviane, Thank you for pointing it out. We have updates the same.
Pulkit Sharma says: June 21, 2018 at 4:02 pm
Hi Pradeep, Beta distribution is a family of continuous probability distributions defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β. It can be used for determining the central tendency, i.e. mean, median or mode, measuring the statistical dispersion, skewness, kurtosis etc. These are some of the inferences that can be obtained from a Beta Distribution.
