Discrete Probability Distributions

Priyanka Last Updated : 27 May, 2024


Probability distributions are fundamental tools in statistics and data science, providing a way to model and understand the uncertainty in various phenomena. Discrete probability distributions, such as the binomial and Poisson distributions, deal with outcomes that can be counted and are often used to analyze discrete random variables. These distributions are essential for predicting and understanding outcomes ranging from the number of successes in a fixed number of trials to the number of customers visiting a store in a day. Let’s explore the fascinating world of discrete probability distributions and their applications in modeling and predicting random events.

Learning Outcomes

Understand the concept of expected value in the context of discrete probability distributions, where it represents the long-term average value of outcomes based on their probabilities.
Learn how the sum of probabilities of all possible outcomes in a discrete probability distribution always equals 1, ensuring that one of the outcomes must occur.
Recognize the fair coin as an example of a Bernoulli trial in probability theory, where the probabilities of heads and tails are both 0.5.
Understand the concept of a fair die, which has 6 equally probable outcomes (1 through 6) in a uniform distribution.
Gain insight into uniform distribution, which describes a set of outcomes where each has an equal probability of occurring, such as rolling a fair die.

This article was published as a part of the Data Science Blogathon.

What Are Discrete Probability Distributions?
Understanding Discrete Probability Distributions with an Example
Types of Probability Distributions
- 1. Discrete Probability Distributions
- 2. Continuous Probability Distributions
Terminologies
- Cumulative Distribution Function
- Discrete Probability Distributions
Types of Discrete Probability Distributions
Discrete vs Continuous Distribution
Conclusion
Frequently Asked Questions

What Are Discrete Probability Distributions?

Discrete probability distributions represent the likelihood of different outcomes in a discrete set, such as the results of rolling a die or the number of successes in a fixed number of trials. Each outcome is associated with a probability, and when graphed, these probabilities create a distribution. Common examples include the binomial distribution for repeated binary events and the Poisson distribution for counts of events in a fixed interval. Such distributions are essential in statistics and probability theory for modeling and analyzing discrete random variables.

A discrete probability distribution assigns a probability to each value in a countable set of outcomes, such as 1, 2, 3, true or false, or success or failure. Investors, for example, use discrete probability distributions to estimate how likely a particular investment outcome is.

Understanding Discrete Probability Distributions with an Example

Let X be a random variable that has more than one possible outcome. If we repeat the experiment many times, the fraction of repetitions in which each outcome occurs approaches the probability of that outcome. Plot the outcomes on the x-axis and their probabilities on the y-axis. The resulting plot is called the probability distribution (PD) of X, and the height of the graph at a given outcome is the probability of that outcome.
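The idea of repeating an experiment and recording how often each outcome occurs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the experiment is tossing a fair coin twice and counting heads, and the function name, the seed, and the number of repetitions are hypothetical choices, not part of the original article.

```python
import random
from collections import Counter


def empirical_distribution(num_experiments, seed=0):
    """Estimate the distribution of the number of heads in two fair coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter()
    for _ in range(num_experiments):
        heads = sum(rng.random() < 0.5 for _ in range(2))
        counts[heads] += 1
    # Convert raw counts to relative frequencies
    return {outcome: counts[outcome] / num_experiments for outcome in sorted(counts)}


# Exact probabilities for comparison: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4
theoretical = {0: 0.25, 1: 0.5, 2: 0.25}
estimated = empirical_distribution(100_000)

for outcome, prob in theoretical.items():
    print(outcome, prob, round(estimated[outcome], 3))
```

With enough repetitions the estimated frequencies should sit close to the exact probabilities, which is what the plotted distribution represents.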

Types of Probability Distributions

There are two types of distributions based on the type of data generated by the experiments.

1. Discrete Probability Distributions

These distributions model the probabilities of random variables that can have discrete values as outcomes. For example, the possible values for the random variable X that represents the number of heads that can occur when a coin is tossed twice are the set {0, 1, 2} and not any value from 0 to 2 like 0.1 or 1.6.

Examples: Bernoulli, Binomial, Negative Binomial, Hypergeometric, etc.

2. Continuous Probability Distributions

These distributions model the probabilities of random variables that can take any value within a range. For example, the random variable X that represents the weights of citizens in a town can take any value, such as 34.5 or 47.7.

Examples: Normal, Student’s T, Chi-square, Exponential, etc.


Terminologies

Each PD provides us extra information on the behavior of the data involved. Each PD is given by a probability function that generalizes the probabilities of the outcomes.

Using this function, we can estimate the probability of a particular outcome (discrete) or the chance that the outcome lies within a particular range of values (continuous). The function is called a Probability Mass Function (PMF) for discrete distributions and a Probability Density Function (PDF) for continuous distributions. The PMF values summed over all possible outcomes, and the PDF integrated over its entire domain, always equal one.
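As a small check of the property that probabilities total one, the sketch below builds the PMF of a fair six-sided die (the example named in the learning outcomes) and also computes its expected value. The variable names are illustrative, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# PMF of a fair die: each face 1..6 has probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all outcomes must add up to one
total_probability = sum(die_pmf.values())

# Expected value: each outcome weighted by its probability
expected_value = sum(face * prob for face, prob in die_pmf.items())

print(total_probability)  # 1
print(expected_value)     # 7/2
```

The expected value 7/2 = 3.5 is not itself a possible outcome; it is the long-run average over many rolls.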

Cumulative Distribution Function

For a discrete random variable, the PMF gives the probability of a particular outcome, whereas the Cumulative Distribution Function (CDF) gives the probability of seeing an outcome less than or equal to a particular value of the random variable. CDFs are used to check how the probability has added up to a certain point. For example, if P(X = 5) is the probability that the number of heads in a series of coin flips is exactly 5, then P(X <= 5) denotes the cumulative probability of obtaining anywhere from 0 to 5 heads.

Cumulative distribution functions are also used to calculate p-values as a part of performing hypothesis testing.
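To make the relationship between the two functions concrete, here is a minimal sketch that builds a CDF as the running sum of a PMF. It assumes 10 flips of a fair coin, a number the text above does not specify, and uses the standard binomial formula for the number of heads.

```python
from itertools import accumulate
from math import comb

n, p = 10, 0.5  # assumed: 10 flips of a fair coin

# PMF: probability of exactly k heads, for k = 0..n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# CDF: running total of the PMF, so cdf[k] = P(X <= k)
cdf = list(accumulate(pmf))

print(pmf[5])  # P(X = 5)  -> 0.24609375
print(cdf[5])  # P(X <= 5) -> 0.623046875
```

The last CDF value is 1, since every possible outcome has been accumulated by that point.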

Discrete Probability Distributions

There are many discrete probability distributions, each suited to different scenarios. This post covers the ones listed below; of these, the Binomial and Poisson distributions are the most widely discussed.

Bernoulli Distribution
Binomial Distribution
Hypergeometric Distribution
Negative Binomial Distribution
Geometric Distribution
Poisson Distribution
Multinomial Distribution

Types of Discrete Probability Distributions

Here is the list of types of discrete probability distributions, explained with examples.

Bernoulli Distribution

This distribution is generated when we perform an experiment once and it has only two possible outcomes – success and failure. Trials of this type are called Bernoulli trials, and they form the basis for many of the distributions discussed below. Let p be the probability of success, so that 1 – p is the probability of failure.

The PMF is given as

$$P(X = x) = p^{x} (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$

where x = 1 denotes success and x = 0 denotes failure.

One example of this would be flipping a coin once. Here p is the probability of getting a head and 1 – p is the probability of getting a tail. Note that success and failure are subjective and are defined by us depending on the context.
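A minimal sketch of the Bernoulli PMF for the coin example, assuming the usual encoding of success as 1 and failure as 0; the function name is illustrative.

```python
def bernoulli_pmf(x, p):
    """Probability of outcome x (1 = success, 0 = failure) with success probability p."""
    if x not in (0, 1):
        return 0.0  # a Bernoulli trial has no other outcomes
    return p**x * (1 - p) ** (1 - x)


# Fair coin, with "head" defined as success
p_head = bernoulli_pmf(1, 0.5)
p_tail = bernoulli_pmf(0, 0.5)
print(p_head, p_tail)  # 0.5 0.5
```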

Binomial Distribution

This distribution arises when a Bernoulli trial, with only two possible outcomes, is repeated a fixed number of times. Let p denote the probability that a trial is a success, which implies 1 – p is the probability that it is a failure. If we perform n independent trials, each with the same p, the number of successes follows the Binomial distribution.

The most common example given for Binomial distribution is that of flipping a coin n number of times and calculating the probabilities of getting a particular number of heads. More real-world examples include the number of successful sales calls for a company or whether a drug works for a disease or not.

The PMF is given as,

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

where p is the probability of success, n is the number of trials and x is the number of times we obtain a success.
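The standard binomial PMF can be sketched directly from these definitions. The sales-call numbers below (20 calls, a 10% success rate) are hypothetical values chosen only for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: 20 sales calls, each succeeding with probability 0.1
n_calls, p_success = 20, 0.1
distribution = [binomial_pmf(x, n_calls, p_success) for x in range(n_calls + 1)]

print(round(distribution[2], 4))  # probability of exactly 2 successful calls
print(round(sum(distribution), 6))  # all outcomes together
```

Summing the PMF over x = 0, ..., n covers every possible number of successes, so the total should come out as 1 up to rounding.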

Hypergeometric Distribution

Consider drawing marbles from a box that contains marbles of different colors. Drawing a red marble is a success and drawing any other color is a failure. Each time a marble is drawn it is not returned to the box, so the probability of drawing a red marble changes from one trial to the next. The hypergeometric distribution models the probability of x successes over n trials where each trial is conducted without replacement. This is unlike the binomial distribution, where the probability remains constant through the trials.

The PMF is given as,

$$P(X = x) = \frac{\binom{k}{x} \binom{N - k}{n - x}}{\binom{N}{n}}$$

where k is the number of possible successes in the population (for example, the number of red marbles in the box), x is the desired number of successes, N is the size of the population and n is the number of trials.
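A short sketch of the standard hypergeometric PMF, using the same symbols as above. The box contents (20 marbles, 5 of them red) and the 4 draws are hypothetical numbers picked for illustration.

```python
from math import comb


def hypergeometric_pmf(x, N, k, n):
    """Probability of x successes in n draws without replacement
    from a population of N items containing k success items."""
    if x < 0 or x > n:
        return 0.0  # cannot observe fewer than 0 or more than n successes
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Hypothetical box: 20 marbles, 5 red; draw 4 without replacement
N_marbles, k_red, n_draws = 20, 5, 4
distribution = [hypergeometric_pmf(x, N_marbles, k_red, n_draws) for x in range(n_draws + 1)]

for x, prob in enumerate(distribution):
    print(x, round(prob, 4))
```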

Negative Binomial Distribution

Sometimes we want to check how many Bernoulli trials we need to make in order to get a particular outcome a given number of times. The desired outcome is specified in advance and we continue the experiment until it is achieved. Let us consider the example of rolling a die. Our desired outcome, defined as a success, is rolling a 4, and we want to roll until we have seen it three times. The negative binomial distribution models the number of failures (rolls showing any number other than 4) that occur before we see the third success.

The PMF is given as,

$$P(X = k) = \binom{k + r - 1}{k} p^{r} (1 - p)^{k}$$

p indicates success probability, k is failures observed, and r specifies desired successes.

Like in Binomial distribution, the probability through the trials remains constant and each trial is independent of the other.
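The die example can be worked through with a small sketch of the negative binomial PMF in its "failures before the r-th success" form, which matches the description above. The function name and the choice of k = 10 are illustrative.

```python
from math import comb


def negative_binomial_pmf(k, r, p):
    """Probability of exactly k failures before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


# Rolling a fair die, success = rolling a 4, stop at the third success
p_four, r_successes = 1 / 6, 3

# Probability of exactly 10 non-4 rolls before the third 4
prob_10_failures = negative_binomial_pmf(10, r_successes, p_four)
print(round(prob_10_failures, 4))

# Probabilities over all k should total (almost exactly) one
total = sum(negative_binomial_pmf(k, r_successes, p_four) for k in range(500))
print(round(total, 6))
```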

Geometric Distribution

This is a special case of the negative binomial distribution where the desired number of successes is 1. It measures the number of failures we get before the first success. Using the same example given in the previous section, we would like to know the number of failures we see before we get the first 4 on rolling the die.

The PMF is given as,

$$P(X = k) = (1 - p)^{k} p$$

where p is the probability of success and k is the number of failures. Here, r = 1.
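A minimal sketch of the geometric PMF, counting failures before the first success, applied to the die example. The truncated sum used to approximate the mean is an illustrative shortcut, not an exact calculation.

```python
def geometric_pmf(k, p):
    """Probability of exactly k failures before the first success."""
    return (1 - p) ** k * p


p_four = 1 / 6  # fair die, success = rolling a 4

# Probability that the first 4 appears on the very first roll (zero failures)
print(round(geometric_pmf(0, p_four), 4))

# Approximate the average number of failures by summing k * P(X = k);
# the exact mean is (1 - p) / p, which is 5 for p = 1/6
mean_failures = sum(k * geometric_pmf(k, p_four) for k in range(1000))
print(round(mean_failures, 4))
```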

Poisson Distribution

This distribution describes the number of events that occur in a fixed interval of time or space, when the events happen at a constant average rate. An example might make this clear. Consider the number of calls received by a customer care center per hour. We can estimate the average number of calls per hour, but we cannot determine the exact number of calls or the exact time at which each call arrives. Each occurrence of an event is independent of the other occurrences.

The PMF is given as,

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

where λ is the average number of times the event occurs in the given interval, x is the number of occurrences whose probability we want, and e is Euler’s number.
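The call-center example can be sketched with the standard Poisson PMF. The average of 4 calls per hour is a hypothetical value chosen only for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x events when the average count per interval is lam."""
    return lam**x * exp(-lam) / factorial(x)


avg_calls_per_hour = 4  # hypothetical average rate

# Probability of receiving exactly 0, 1, ..., 8 calls in an hour
for calls in range(9):
    print(calls, round(poisson_pmf(calls, avg_calls_per_hour), 4))

# The probabilities over all counts total (almost exactly) one
total = sum(poisson_pmf(x, avg_calls_per_hour) for x in range(60))
```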

Multinomial Distribution

Most of the distributions above are built from trials with only two possible outcomes – success and failure. The multinomial distribution, however, describes trials with more than two possible outcomes. With a single trial it reduces to the categorical distribution, in which each possible outcome is treated as a separate category. Consider the scenario of playing a game n times, where each game can end in one of several results. The multinomial distribution gives the probability of seeing a particular combination of counts for those results across the n games.

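The PMF is given as,

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k}$$

where n is the number of trials, p_1, …, p_k denote the probabilities of the k possible outcomes, and x_1, …, x_k are the number of times each outcome occurs, with x_1 + … + x_k = n.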
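A short sketch of the standard multinomial PMF follows. The game results (win, draw, loss) and their probabilities are hypothetical values introduced only for illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given count for each outcome,
    where probs[i] is the probability of outcome i on a single trial."""
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_k!), kept as an exact integer
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))


# Hypothetical game: P(win) = 0.5, P(draw) = 0.3, P(loss) = 0.2
probs = (0.5, 0.3, 0.2)

# Probability of 3 wins, 1 draw and 1 loss in 5 games
prob_3_1_1 = multinomial_pmf((3, 1, 1), probs)
print(round(prob_3_1_1, 4))  # 0.15
```

With only two outcomes the same function reproduces the binomial PMF, which is one way to sanity-check it.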

Discrete vs Continuous Distribution

Discrete Distribution	Continuous Distribution
Random variable can only take on a finite or countable number of values	Random variable can take on any value within a certain range or interval
Probability mass function assigns a probability to each possible outcome	Probability density function assigns probabilities to ranges or intervals of values
Examples: binomial distribution, Poisson distribution, geometric distribution	Examples: normal distribution, exponential distribution, beta distribution
Used to model events with discrete outcomes, such as number of successes in a fixed number of trials	Used to model events with continuous outcomes, such as the height of individuals in a population
Probability of a single outcome can be non-zero	Probability of any single value is zero
Cumulative distribution function is stepwise	Cumulative distribution function is continuous


Conclusion

Discrete probability distributions are essential tools for modeling and predicting outcomes of random events with discrete results, such as the number of customers visiting a store or the outcomes of coin tosses and dice rolls. This article has explored several key types of discrete probability distributions, including Bernoulli, Binomial, Hypergeometric, Negative Binomial, Geometric, Poisson, and Multinomial distributions, each suited for different scenarios in probability theory.

This article discusses concepts such as expected value, the sum of probabilities, fair coin experiments, fair dice outcomes, and uniform distribution to provide a comprehensive understanding. Mastering these foundational concepts is crucial as they enable informed decisions and predictions based on statistical data. This enhances data-driven decision-making capabilities across various fields, ensuring accurate analysis and interpretation of outcomes.

Frequently Asked Questions

Q1. What are discrete and continuous distributions?

A. Discrete distributions are probability distributions where a random variable can only take on finite or countable values. Continuous distributions allow the random variable to take on any value within a certain range.

Q2. What is the difference between discrete variables and continuous random variables?

A. Discrete random variables take countable values, often integers, such as the number of heads in a series of coin flips or the number of times an event occurs. Continuous random variables, like height, weight, or temperature, can take any numerical value within a range or interval, with probabilities defined over intervals rather than single values.

Q3. How is a histogram used in understanding a probability distribution function?

A. A histogram is a visual representation of a dataset that can be used to estimate the probability distribution of a discrete or continuous random variable. It shows the frequency of data points within consecutive intervals, giving a picture of the shape of the data’s distribution, its spread, and its central tendency.

Q4. What parameters define a binomial probability distribution, and how is the standard deviation calculated?

A. A binomial probability distribution is defined by two parameters: the number of trials n and the probability of success p on each trial. It assumes that the trials are independent and that only two outcomes are possible on each trial. The expected number of successes is np, and the standard deviation, which measures how much the actual number of successes varies around that expected value, is the square root of np(1 – p).

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Priyanka

I am a former software engineer with 6 years of work experience. I am pursuing my Master’s in Data Science at TU Dortmund. I write about my areas of interest regularly on LinkedIn and Medium. Follow me for more technical content.
