Harika Bonthu — July 30, 2021
Beginner Probability Structured Data

This article was published as a part of the Data Science Blogathon

Probability Distribution Function Image
Source

Introduction:

In this article, we will be learning different types of Probability distribution functions.

Table of Contents:

  • What is a distribution function?
  • Data types
  • PDF, PMF, and CDF
  • Types of distribution functions

What is a distribution function?

In statistical terms, a distribution function is a mathematical expression that describes the probability of different possible outcomes for an experiment.

It is denoted as Variable ~ Type (Characteristics)

Let us say we are running an experiment of tossing a fair coin. The possible events are Heads, Tails. And for instance, if we use X to denote the events, the probability distribution of X would take the value 0.5 for X=heads, and 0.5 for X=tails.

Data Types

At a higher level, we have Qualitative and Quantitative data. And in Quantitative data, we have Continuous and Discrete data types.

Continuous data is measured and can take any number of values in a given finite or infinite range. It can be represented in decimal format. And the random variable that holds continuous values is called the Continuous random variable.

Examples: A person’s height, Time, distance, etc.

Discrete data is counted and can take only a limited number of values. It makes no sense when written in decimal format. And the random variable that holds discrete data is called the Discrete random variable.

Example: The number of students in a class, number of workers in a company, etc.

Types of distribution functions:

Based on the types of data we deal with, we have two types of distribution functions.

For discrete data, we have discrete distributions; and for continuous data, we have continuous distributions.

Discrete distributions 

Continuous distributions

Uniform distribution Normal distribution
Binomial distribution Standard Normal distribution
Bernoulli distribution Student’s T distribution
Poisson distribution Chi-squared  distribution

Before deep-diving into the types of distributions, it is important to revise the fundamental concepts like Probability Density Function (PDF), Probability Mass Function (PMF), and Cumulative Density Function (CDF).

Probability Density Function (PDF): 

It is a statistical term that describes the probability distribution of a continuous random variable. The probability associate with a single value is always Zero. Below is the formula for PDF.

probability distribution function

The formula for PDF. Source

Probability Mass Function (PMF)

It is a statistical term that describes the probability distribution of a discrete random variable.

probability mass function
The formula for PMF. Source 

Cumulative Distribution Function (CDF)

It is another method to describe the distribution of a random variable (either continuous or discrete).

cumulative distribution function
The formula for CDF. Source

Discrete distributions

Let us start with the easiest one – Uniform distribution.

Discrete Uniform distribution (U)

It is denoted as X ~ U (a, b). And is read as X is a discrete random variable that follows uniform distribution ranging from a to b.

Uniform distribution is when all the possible events are equally likely. For example, consider an experiment of rolling a dice. We have six possible events X = {1, 2, 3, 4, 5, 6} each having a probability of P(X) = 1/6.

The PMF graph of the above experiment is:

The formula for PMF, CDF of Uniform distribution function are:

fromula of pmf, cmf

The Mean and Variance of Uniform distribution are:

Mean = (a+b)/2

Variance = (n2-1)/12

Binomial distribution (B):

It is denoted as X ~ B(n, p). And is read as X is a discrete random variable that follows Binomial distribution with parameters n, p.
Where n is the no. of trials, and p is the success probability for each trial.

Binomial distribution is a discrete probability distribution of the number of successes in ‘n’ independent experiments sequence. The two outcomes of a Binomial trial could be Success/Failure, Pass/Fail/, Win/Lose, etc.

Generally, the outcome success is denoted as 1, and the probability associated with it is p.

And Failure is denoted as 0, and the probability associate with it is q = 1-p.

The formula for PMF, CDF of Binomial distribution are:

binomial distribution pMF and CMF

Source

K in the above formula is the number of successes.

The mean and variance of a binomial distribution are given as:

Mean = np

Variance = npq

Now consider, we ran a Binomial experiment 10 times, and the probability of success = 0.25. Below are how PMF, CDF looks.

 

Binomial distribution | Probability Distribution Function

Image by Author

PMF, CDF for a Binomial experiment with Probability of success = Probability of failure look like below.

Probability Distribution Function |Binomial

Bernoulli distribution (Bern): 

It is denoted as X ~ Bern(p). And is read as X is a discrete random variable that follows Bernoulli distribution with parameter p.
Where p is the probability of the success.

Bernoulli can be represented as a Binomial experiment with a single trial.

X ~ Bern(p) —-> X ~ B(1, p)

The formula for PMF, CDF of Bernoulli distribution is:

The Mean and Variance of Bernoulli distribution are given as:

Mean = p

Variance = p(1-p) = pq

 

Example: Consider an example of tossing a fair. The two possible outcomes are Heads, Tails. The probability (p) associated with each of them is 1/2.

If we take an unfair coin, the probability associated with each of them need not be 1/2. Heads can have a probability of p = 0.8, then the probability of tail q = 1-p = 1-0.8 = 0.2

example | Probability Distribution Function

Bernoulli’s event suggests which outcome can be expected for a single trial. Whereas, a Binomial event suggests the no. of times a specific outcome can be expected. 

Poisson Distribution (Po):

It is denoted as X ~ Po(λ). And is read as X is a discrete random variable that follows Poisson Distribution with parameter λ.
Where λ is the expected rate of occurrences.

Poisson Distribution is a discrete probability distribution function that expresses the probability of a given number of events occurring in a fixed time interval.

Examples: 

  • The number of diners at a restaurant on a given day.
  • Calls per hour at a call centre.

The formula for PMF, CDF of poison distribution are:

The Mean and Variance of Poisson distribution are given as:

Mean = Variance = λ

A Poisson distribution with λ = 5 look like below

example poison distribution

Continuous Distributions

Normal or Gaussian Distribution (N)

It is denoted as X ~ N (μ, σ2). And is read as X is a continuous random variable that follows a Normal distribution with parameters μ, σ2.
Where μ is the mean, and σis the variance. Mean, Variance together talks about shape statistics.

A normal distribution is a continuous distribution that describes the probability of a continuous random variable that takes real values.

Examples: Heights of people, exam scores of students, IQ Scores, etc follows Normal distribution.

Properties of Normal distribution:

  • The random variable takes values from -∞ to +∞
  • The probability associate with any single value is Zero.
  • looks like a bell curve and is symmetric about x=μ. 50% of data lies on the left-hand side and 50% of the data lies on the right-hand side.
  • The area under the curve (AUC) = 1
  • All the measures of central tendency coincide i.e., mean = median = mode

A normal distribution with different means, standard deviations look like below:

normal distribution | Probability Distribution Function

Source

Normal distribution follows the 68-95-99.7 rule. This rule is also known as the empirical rule. According to it, 68% of data lies in the first standard deviation range, 95% of data lies in the second standard deviation range, and 99.7% of data lies in the third standard deviation range.

The formula for PDF, CDF of the normal distribution are:

The Mean and Variance of a Normal distribution are given as:

Mean = μ

Variance = σ2

Let’s assume we have a height distribution with mean = 25, standard deviation = 2.48. Below is how the graph looks like.

example normal disribution

Standard Normal Distribution or SND:

It is denoted as Z ~ N(0, 1). And is read as X is a continuous random variable that follows  Normal distribution with mean 0 and variance 1.

It is a transformation of Normal distribution in such a way that Mean = 0, and standard deviation 1.

Transformation is a way in which we alter every element of distribution to get a new distribution with similar characteristics.

All the properties of a Normal distribution will be satisfied by a Standard Normal distribution.

SND | Probability Distribution Function

Source

And in addition, there exists a table that summarizes the most commonly used values of a CDF of s Standard Normal Distribution. This table is known as a Z-score table.

The formula for standardisation is Z = (X-μ)/σ

The formula for PDF, CDF of Standard Normal distribution are given as:

 

The Mean and Variance of Standard Normal distribution are:

Mean = 0

Variance = 1

Normal distribution with mean 0 and variance 1 (SND) looks like below:

normal distribution with mean 0 and sd 1

Student’s T distribution or t-distribution (t)

It is denoted as X ~ t(k). And is read as X is a continuous random variable that follows Student’s T distribution with parameter k.
where k is the degrees of freedom. If the sample size is n, then k = n-1.

Student’s T distribution is a small sample size approximation of a normal distribution. As the degrees of freedom increase, t distribution tends to become Standard Normal distribution.

T distribution

Source

The formula for PDF, CDF of t-distribution are:

(v in above formulae is degrees of freedom)

t-distribution can be used in Hypothesis testing (to test if there is any significant difference between two sample means), calculating confidence intervals with population standard deviation is unknown.

Like standard normal distribution, t-distribution also has a table of its own. This table is known as the t-table.

The mean and variance of Student’s T distribution are:

Mean = 0

variance = k/(k-2)

A student’s t-distribution with degrees of freedom = 25 looks like below:

example tdistribution | Probability Distribution Function

Chi-Square distribution

It is denoted as X~χ2(k). And is read as X is a continuous random variable that follows Chi-Square distribution with k degrees of freedom.

It is used in Hypothesis testing, computing confidence intervals, and for the goodness of fit.

It is a transformation of t-distribution. Finding the t-distribution to the power of 2 gives Chi-Square distribution and finding the square root of Chi-Square of distribution gives us t-distribution.

Chi-Square distribution has a chi-square table.

The formula for PDF, CDF of Chi-square distribution are:

chi square distribution

Source

The mean and variance of Chi-square distribution are:

Mean = k

Variance = 2k

A Chi-square distribution with degrees of freedom = 5 looks like below.

example chi sqaured

End Notes:

Thank you for reading till the conclusion. By the end of this article, we are familiar with different Probability distributions that are frequently used in Statistics.

I hope this article is informative. Feel free to share it with your study buddies.

References:

Plot distributions online

Other Blog Posts by me

Feel free to check out my other blog posts from my Analytics Vidhya Profile.

You can find me on LinkedIn, Twitter in case you would want to connect. I would be glad to connect with you.

For immediate exchange of thoughts, please write to me at [email protected].

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

About the Author

Our Top Authors

  • Analytics Vidhya
  • Guest Blog
  • Tavish Srivastava
  • Aishwarya Singh
  • Aniruddha Bhandari
  • Abhishek Sharma
  • Aarshay Jain

Download Analytics Vidhya App for the Latest blog/Article

Leave a Reply Your email address will not be published. Required fields are marked *