Exploring The Different Types Of Probability Distribution Function!
This article was published as a part of the Data Science Blogathon
Introduction:
In this article, we will learn about the different types of probability distribution functions.
Table of Contents:
- What is a distribution function?
- Data types
- PDF, PMF, and CDF
- Types of distribution functions
What is a distribution function?
In statistical terms, a distribution function is a mathematical expression that describes the probability of different possible outcomes for an experiment.
It is denoted as Variable ~ Type (Characteristics)
Suppose we toss a fair coin. The possible outcomes are Heads and Tails. If we use X to denote the outcome, the probability distribution of X takes the value 0.5 for X = heads and 0.5 for X = tails.
Data Types
At a high level, data is either Qualitative or Quantitative. Quantitative data is in turn either Continuous or Discrete.
Continuous data is measured and can take any value in a given finite or infinite range. It can be written in decimal format. A random variable that holds continuous values is called a Continuous random variable.
Examples: A person’s height, time, distance, etc.
Discrete data is counted and can take only a countable set of distinct values, typically whole numbers. Fractional values make no sense for it. A random variable that holds discrete data is called a Discrete random variable.
Examples: The number of students in a class, the number of workers in a company, etc.
Types of distribution functions:
Based on the types of data we deal with, we have two types of distribution functions.
For discrete data, we have discrete distributions; and for continuous data, we have continuous distributions.
Discrete distributions | Continuous distributions |
Uniform distribution | Normal distribution |
Binomial distribution | Standard Normal distribution |
Bernoulli distribution | Student’s T distribution |
Poisson distribution | Chi-squared distribution |
Before diving into the types of distributions, it is important to revise three fundamental concepts: the Probability Density Function (PDF), the Probability Mass Function (PMF), and the Cumulative Distribution Function (CDF).
Probability Density Function (PDF):
It describes the probability distribution of a continuous random variable. The probability associated with any single value is always zero, so probabilities are obtained as areas under the PDF f(x) over an interval:
P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx
Probability Mass Function (PMF)
It describes the probability distribution of a discrete random variable. The PMF gives the probability P(X = x) of each possible value x, and these probabilities sum to 1.
Cumulative Distribution Function (CDF)
It is another way to describe the distribution of a random variable (either continuous or discrete). The CDF gives the probability that the variable is at most x: F(x) = P(X ≤ x).
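As a minimal sketch of how a PMF and a CDF relate for a discrete variable, the snippet below accumulates the PMF of the fair coin from earlier. Encoding tails as 0 and heads as 1, and the helper name `cdf_from_pmf`, are assumptions made for illustration only.

```python
from itertools import accumulate

# PMF of a fair coin; tails = 0 and heads = 1 is an illustrative encoding.
pmf = {0: 0.5, 1: 0.5}


def cdf_from_pmf(pmf):
    """Return {x: P(X <= x)} by accumulating the PMF over the sorted outcomes."""
    outcomes = sorted(pmf)
    running_totals = accumulate(pmf[x] for x in outcomes)
    return dict(zip(outcomes, running_totals))


cdf = cdf_from_pmf(pmf)
print(cdf)  # {0: 0.5, 1: 1.0}
```

The last CDF value is always 1, because the PMF values sum to 1.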
Discrete distributions
Let us start with the easiest one – Uniform distribution.
Discrete Uniform distribution (U)
It is denoted as X ~ U(a, b), read as: X is a discrete random variable that follows a uniform distribution ranging from a to b.
In a uniform distribution, all possible events are equally likely. For example, consider the experiment of rolling a die. We have six possible outcomes X = {1, 2, 3, 4, 5, 6}, each with probability P(X = x) = 1/6.
The PMF graph of the above experiment is:
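The PMF and CDF of the discrete Uniform distribution, where n = b - a + 1 is the number of possible values, are:
PMF: P(X = k) = 1/n, for k = a, a+1, ..., b
CDF: F(k) = (⌊k⌋ - a + 1)/n, for a ≤ k ≤ b
The mean and variance of the discrete Uniform distribution are:
Mean = (a+b)/2
Variance = (n^{2}-1)/12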
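The mean and variance formulas can be checked directly for the die example. This is a small sketch using exact fractions; it assumes n means the number of possible values, n = b - a + 1.

```python
from fractions import Fraction

a, b = 1, 6                 # a fair six-sided die
n = b - a + 1               # number of equally likely outcomes (assumed meaning of n)
outcomes = range(a, b + 1)
p = Fraction(1, n)          # PMF: every outcome has probability 1/n

# Mean and variance computed from their definitions.
mean = sum(k * p for k in outcomes)
variance = sum((k - mean) ** 2 * p for k in outcomes)

print(mean, variance)       # 7/2 35/12
```

Both results match the closed forms (a+b)/2 = 7/2 and (n^{2}-1)/12 = 35/12.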
Binomial distribution (B):
It is denoted as X ~ B(n, p), read as: X is a discrete random variable that follows a Binomial distribution with parameters n and p,
where n is the number of trials and p is the probability of success in each trial.
The Binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent trials. The two outcomes of a Binomial trial could be Success/Failure, Pass/Fail, Win/Lose, etc.
Generally, the outcome Success is denoted as 1, and the probability associated with it is p.
Failure is denoted as 0, and the probability associated with it is q = 1-p.
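The PMF and CDF of the Binomial distribution are:
PMF: P(X = k) = C(n, k) p^{k} q^{n-k}, for k = 0, 1, ..., n
CDF: F(k) = P(X ≤ k) = Σ_{i=0}^{⌊k⌋} C(n, i) p^{i} q^{n-i}
Here k is the number of successes, and C(n, k) = n!/(k!(n-k)!) is the number of ways to choose which k of the n trials succeed.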
The mean and variance of a binomial distribution are given as:
Mean = np
Variance = npq
Now consider a Binomial experiment with n = 10 trials and probability of success p = 0.25. Its PMF and CDF look like this:
The PMF and CDF for a Binomial experiment with probability of success = probability of failure (p = q = 0.5) look like this:
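For readers who want numbers rather than plots, here is a minimal sketch of the Binomial PMF and CDF for n = 10 and p = 0.25, using only the standard library. The function names are illustrative, not from any particular package.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): sum of the PMF from 0 up to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n, p = 10, 0.25
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the PMF, to compare with np and npq.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(round(mean, 6), round(variance, 6))  # 2.5 1.875
```

The values agree with Mean = np = 2.5 and Variance = npq = 1.875.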
Bernoulli distribution (Bern):
It is denoted as X ~ Bern(p), read as: X is a discrete random variable that follows a Bernoulli distribution with parameter p,
where p is the probability of success.
Bernoulli can be represented as a Binomial experiment with a single trial.
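X ~ Bern(p) → X ~ B(1, p)
The PMF and CDF of the Bernoulli distribution are:
PMF: P(X = 1) = p, P(X = 0) = q = 1-p
CDF: F(x) = 0 for x < 0, F(x) = q for 0 ≤ x < 1, F(x) = 1 for x ≥ 1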
The Mean and Variance of Bernoulli distribution are given as:
Mean = p
Variance = p(1-p) = pq
Example: Consider tossing a fair coin. The two possible outcomes are Heads and Tails. The probability (p) associated with each of them is 1/2.
If we take an unfair coin, the probability associated with each outcome need not be 1/2. If Heads has a probability of p = 0.8, then the probability of Tails is q = 1-p = 1-0.8 = 0.2.
A Bernoulli event describes which outcome can be expected in a single trial, whereas a Binomial event describes the number of times a specific outcome can be expected over n trials.
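The relationship between the two distributions can be sketched in a few lines: a Bernoulli PMF with parameter p should coincide with a Binomial PMF with n = 1. The function names below are illustrative, and p = 0.8 is the unfair coin from the example.

```python
from math import comb


def bernoulli_pmf(x, p):
    """P(X = x) for a single trial, where x is 1 (success) or 0 (failure)."""
    return p if x == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p = 0.8  # unfair coin: P(heads) = 0.8

# Bern(p) and B(1, p) assign the same probability to each outcome.
matches = all(
    abs(bernoulli_pmf(x, p) - binomial_pmf(x, 1, p)) < 1e-12 for x in (0, 1)
)

mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(matches, round(mean, 6), round(variance, 6))
```

The mean and variance come out as p = 0.8 and pq = 0.16, up to floating-point rounding.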
Poisson Distribution (P_{o}):
It is denoted as X ~ P_{o}(λ), read as: X is a discrete random variable that follows a Poisson distribution with parameter λ,
where λ is the expected rate of occurrences.
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed time interval, when events occur independently at a constant average rate.
Examples:
- The number of diners at a restaurant on a given day.
- Calls per hour at a call centre.
The PMF and CDF of the Poisson distribution are:
PMF: P(X = k) = λ^{k} e^{-λ} / k!, for k = 0, 1, 2, ...
CDF: F(k) = e^{-λ} Σ_{i=0}^{⌊k⌋} λ^{i} / i!
The Mean and Variance of Poisson distribution are given as:
Mean = Variance = λ
A Poisson distribution with λ = 5 looks like this:
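A rough numerical sketch for λ = 5 follows. Because the Poisson support is infinite, the sums are truncated at k = 59; that cutoff is an assumption chosen so that the neglected tail is negligible for this λ.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 5
ks = range(60)  # truncated support; the tail beyond 59 is negligible for lam = 5
pmf = [poisson_pmf(k, lam) for k in ks]

total = sum(pmf)
mean = sum(k * pk for k, pk in zip(ks, pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in zip(ks, pmf))

print(round(total, 6), round(mean, 6), round(variance, 6))
```

The computed mean and variance both come out at λ, which is the defining property Mean = Variance = λ.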
Continuous Distributions
Normal or Gaussian Distribution (N)
It is denoted as X ~ N(μ, σ^{2}), read as: X is a continuous random variable that follows a Normal distribution with parameters μ and σ^{2},
where μ is the mean and σ^{2} is the variance. Together they determine the curve: the mean sets its center and the variance sets its spread.
A normal distribution is a continuous distribution for a random variable that can take any real value.
Examples: Heights of people, exam scores of students, IQ scores, etc. approximately follow a Normal distribution.
Properties of Normal distribution:
- The random variable takes values from -∞ to +∞
- The probability associated with any single value is zero.
- The curve is bell-shaped and symmetric about x=μ. 50% of the data lies on the left-hand side and 50% of the data lies on the right-hand side.
- The area under the curve (AUC) = 1
- All the measures of central tendency coincide i.e., mean = median = mode
Normal distributions with different means and standard deviations look like this:
The Normal distribution follows the 68-95-99.7 rule, also known as the empirical rule. According to it, about 68% of the data lies within one standard deviation of the mean (μ ± σ), about 95% within two standard deviations (μ ± 2σ), and about 99.7% within three standard deviations (μ ± 3σ).
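The 68-95-99.7 rule can be reproduced from the Normal CDF. The sketch below uses `statistics.NormalDist` from Python's standard library; the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The same probabilities hold for any mean and standard deviation, since they depend only on the distance from the mean measured in standard deviations.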
The PDF and CDF of the Normal distribution are:
PDF: f(x) = (1 / (σ√(2π))) e^{-(x-μ)^{2} / (2σ^{2})}
CDF: F(x) = (1/2) [1 + erf((x-μ) / (σ√2))]
The Mean and Variance of a Normal distribution are given as:
Mean = μ
Variance = σ^{2}
Let’s assume we have a height distribution with mean = 25 and standard deviation = 2.48. Its graph looks like this:
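As a hedged sketch, the Normal PDF and CDF can be written out by hand and compared against the standard library's `statistics.NormalDist` for this example (mean 25, standard deviation 2.48). The function names and the three sample points are chosen only for illustration.

```python
from math import erf, exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written from the textbook formula."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for N(mu, sigma^2), expressed through the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


mu, sigma = 25, 2.48
reference = NormalDist(mu, sigma)

for x in (20, 25, 27.48):
    # Hand-written values next to the library values; they should agree.
    print(x, normal_pdf(x, mu, sigma), reference.pdf(x))
    print(x, normal_cdf(x, mu, sigma), reference.cdf(x))
```

At x = μ the CDF is exactly 0.5, which reflects the symmetry of the curve about the mean.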
Standard Normal Distribution or SND:
It is denoted as Z ~ N(0, 1), read as: Z is a continuous random variable that follows a Normal distribution with mean 0 and variance 1.
It is a Normal distribution transformed so that the mean is 0 and the standard deviation is 1.
A transformation alters every element of a distribution in the same way to produce a new distribution with similar characteristics.
All the properties of a Normal distribution will be satisfied by a Standard Normal distribution.
In addition, there is a table that summarizes the most commonly used values of the CDF of the Standard Normal distribution. This table is known as the Z-score table (or Z-table).
The formula for standardisation is Z = (X-μ)/σ
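To illustrate standardisation, the sketch below converts a value from the earlier height example (mean 25, standard deviation 2.48) into a z-score and looks up its probability in the Standard Normal CDF. The value x = 27.48 is an arbitrary choice, one standard deviation above the mean.

```python
from statistics import NormalDist

mu, sigma = 25, 2.48
x = 27.48                      # illustrative value, one standard deviation above the mean

z = (x - mu) / sigma           # standardisation: Z = (X - mu) / sigma

# The probability is the same whether computed on the original or the standardised scale.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0, 1).cdf(z)

print(round(z, 4), round(p_original, 4), round(p_standard, 4))
```

This is why a single Z-score table is enough for every Normal distribution: any value can be standardised first and then looked up.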
The PDF and CDF of the Standard Normal distribution are:
PDF: φ(z) = (1 / √(2π)) e^{-z^{2}/2}
CDF: Φ(z) = (1/2) [1 + erf(z / √2)]
The Mean and Variance of Standard Normal distribution are:
Mean = 0
Variance = 1
Normal distribution with mean 0 and variance 1 (SND) looks like below:
Student’s T distribution or t-distribution (t)
It is denoted as X ~ t(k), read as: X is a continuous random variable that follows Student’s T distribution with parameter k,
where k is the degrees of freedom. If the sample size is n, then k = n-1.
Student’s T distribution is used in place of the normal distribution when the sample size is small and the population standard deviation has to be estimated from the sample. It has heavier tails than the normal distribution. As the degrees of freedom increase, the t-distribution approaches the Standard Normal distribution.
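The PDF of the t-distribution is:
PDF: f(t) = [Γ((k+1)/2) / (√(kπ) Γ(k/2))] (1 + t^{2}/k)^{-(k+1)/2}
The CDF has no simple closed form and is usually read from a table or computed numerically.
(k in the above formula is the degrees of freedom, often written as ν.)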
The t-distribution can be used in Hypothesis testing (to test whether there is a significant difference between two sample means) and in calculating confidence intervals when the population standard deviation is unknown.
Like standard normal distribution, t-distribution also has a table of its own. This table is known as the t-table.
The mean and variance of Student’s T distribution are:
Mean = 0 (for k > 1)
Variance = k/(k-2) (for k > 2)
A student’s t-distribution with degrees of freedom = 25 looks like below:
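A small sketch can show how the t-distribution approaches the Standard Normal as the degrees of freedom grow. It evaluates the textbook t PDF at 0 for a few illustrative values of k and compares the result with the Standard Normal density at 0. `lgamma` is used instead of `gamma` only to avoid overflow for large k.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x, k):
    """Density of Student's t-distribution with k degrees of freedom at x."""
    log_coeff = lgamma((k + 1) / 2) - lgamma(k / 2)
    return exp(log_coeff) / sqrt(k * pi) * (1 + x ** 2 / k) ** (-(k + 1) / 2)


normal_peak = NormalDist(0, 1).pdf(0)

# Gap between the normal peak and the t peak for increasing degrees of freedom.
gaps = [normal_peak - t_pdf(0, k) for k in (1, 5, 25, 1000)]
for k, gap in zip((1, 5, 25, 1000), gaps):
    print(k, round(t_pdf(0, k), 4), round(gap, 4))
```

The gap shrinks as k increases, which matches the statement that the t-distribution tends towards the Standard Normal distribution.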
Chi-Square distribution
It is denoted as X~χ^{2}(k). And is read as X is a continuous random variable that follows Chi-Square distribution with k degrees of freedom.
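It is used in Hypothesis testing, computing confidence intervals, and goodness-of-fit tests.
It is a transformation of the Standard Normal distribution: the sum of the squares of k independent standard normal variables follows a Chi-Square distribution with k degrees of freedom. It is also related to the t-distribution, which is defined as a standard normal variable divided by the square root of an independent Chi-Square variable over its degrees of freedom.
Like the other distributions above, the Chi-Square distribution has a table of its own, known as the chi-square table.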
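The PDF and CDF of the Chi-square distribution are:
PDF: f(x) = x^{k/2-1} e^{-x/2} / (2^{k/2} Γ(k/2)), for x > 0
CDF: F(x) = γ(k/2, x/2) / Γ(k/2), where γ is the lower incomplete gamma function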
The mean and variance of Chi-square distribution are:
Mean = k
Variance = 2k
A Chi-square distribution with degrees of freedom = 5 looks like below.
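The mean and variance can be checked numerically for k = 5. This is a rough sketch that integrates the textbook Chi-square PDF with a midpoint rule; the upper limit of 100 and the number of steps are assumptions chosen so that the truncation and discretisation errors are small.

```python
from math import exp, gamma


def chi2_pdf(x, k):
    """Density of the Chi-square distribution with k degrees of freedom at x."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))


k = 5
upper, n_steps = 100.0, 100_000          # integration range [0, 100], assumed wide enough
step = upper / n_steps
xs = [(i + 0.5) * step for i in range(n_steps)]   # midpoints of each sub-interval
weights = [chi2_pdf(x, k) * step for x in xs]     # probability mass of each slice

area = sum(weights)
mean = sum(x * w for x, w in zip(xs, weights))
variance = sum((x - mean) ** 2 * w for x, w in zip(xs, weights))

print(round(area, 3), round(mean, 3), round(variance, 3))
```

Up to numerical error, the area is 1, the mean is k = 5, and the variance is 2k = 10.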
End Notes:
Thank you for reading to the end. By now, you should be familiar with the different probability distributions that are frequently used in Statistics.
I hope this article is informative. Feel free to share it with your study buddies.