Understand The Moment Generating Functions!

Naman Last Updated : 09 Nov, 2023

This article was published as a part of the Data Science Blogathon

Introduction

The probability distribution of a random variable can be expressed in many ways: Probability Density Functions (PDFs) (or, for discrete variables, Probability Mass Functions, PMFs), Cumulative Distribution Functions (CDFs), joint probability distributions (for the relationship between several random variables), etc. The graphs of these functions can be described qualitatively and quantitatively.

Qualitative descriptions involve describing the characteristics of the graph without using numerical features. E.g., saying that the graph is broad, noisy, smooth, etc.

Quantitative descriptions, on the other hand, use numerical characteristics, many of which are moments or are built from moments. Examples include the expected value, variance, skewness, kurtosis, covariance, and correlation. Other common summaries, such as the median and the mode, are numerical characteristics too, although they are not moments.

The massive realm of statistical sciences includes the detailed study of all of these characteristics of distributions. However, most statistical studies rely mainly on two of them: the expected value and the variance. Since the variance of a random variable can be obtained from the expected values of powers of that variable, Var(X) = E(X²) − [E(X)]², we’ll focus on the expected value.

The expected value (or the mean) of a random variable is its weighted average. Mathematically, it is shown as:

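$$E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx$$

where f(x) is the PDF of X. For a discrete random variable, the integral is replaced by a sum over the PMF.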

This is often referred to as the first moment of a random variable X. In the same way, we can define the 2nd moment of a random variable, E(X²):

$$E(X^2) = \int_{-\infty}^{\infty} x^2 \, f(x) \, dx$$

We follow a similar process to define the 3rd, 4th, 5th, …, and eventually, the nth moment of a random variable, E(Xⁿ), as follows:

$$E(X^n) = \int_{-\infty}^{\infty} x^n \, f(x) \, dx$$

Now suppose we are given a normal distribution, and we are required to find its 1st moment. Well, isn’t it simple? Just integrate x times the PDF over the entire range of the distribution. What about the 2nd moment? Integrate x² times the PDF over the entire range of the distribution. And so on.

That’s a very tiring process, especially since every moment needs its own integration (or, for discrete variables, summation).
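To make the direct approach concrete, here is a minimal Python sketch that approximates the first two moments of a standard normal distribution by numerical integration. The function names, the midpoint rule, the integration range [-10, 10], and the step count are illustrative assumptions, not something the article prescribes.

```python
import math

def standard_normal_pdf(x):
    """PDF of a normal distribution with mean 0 and variance 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def raw_moment(pdf, n, lower, upper, steps=20_000):
    """Approximate E(X^n) = integral of x^n * pdf(x) dx using the midpoint rule."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width  # midpoint of the i-th slice
        total += (x ** n) * pdf(x)
    return total * width

# One separate integral is needed for every moment we want.
first_moment = raw_moment(standard_normal_pdf, 1, -10.0, 10.0)
second_moment = raw_moment(standard_normal_pdf, 2, -10.0, 10.0)
print(first_moment, second_moment)  # should be close to 0 and 1
```

Each additional moment costs another full pass over the integration grid, which is the repetition the next section tries to avoid.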

Is there an alternative method that can help us simplify things? Yes, this is where Moment Generating Functions (MGFs) step in. Unlike the traditional method, which relies on computing a separate lengthy integral for every moment, the method of MGFs eases the workload by relying on differentiation.

Besides, they have various other advantages, giving them a central role in statistical formulations. What are MGFs and how do they work? Let’s see!

What are Moment Generating Functions (MGFs)?

Think of moment generating functions as an alternative representation of the distribution of a random variable. Like PDFs and CDFs, if two random variables have the same MGF, then their distributions are the same. Mathematically, the MGF of a random variable X is defined as follows:

$$M_X(t) = E(e^{tX})$$

A random variable X is said to have an MGF if:
1) The expectation M_X(t) = E(e^{tX}) exists for X.
2) M_X(t) has a finite value for all t belonging to some interval [-a, a] around 0, where a is a positive real number.

This may seem too much to digest at once. But we’ll understand it piece by piece.

First, what’s the t? Think of t as a constant whose value has got nothing to do with X. It allows the MGF to secretly encode the values of so many moments.

Second, how do we get the moments from the MGF? Just differentiate the MGF with respect to t, and let t=0! If you differentiate the MGF with respect to t once and substitute t=0, you’ll get the 1st moment, i.e., E(X). If you differentiate it once more, and now substitute t=0, you get the 2nd moment, i.e., E(X²). Likewise, if you differentiate the function n times and substitute t=0, you get the nth moment, i.e., E(Xⁿ).

The key point here is that the substitution t=0 must be done only at the end, i.e., after differentiating the MGF the required number of times. Mathematically,

$$E(X^n) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}$$
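As a rough numerical illustration of "differentiate, then set t=0", the sketch below approximates the first two derivatives of an MGF at t=0 with central finite differences. It assumes the standard closed form λ/(λ − t) for the MGF of an exponential distribution with rate λ; the helper names, the choice λ = 2, and the step size h are hypothetical choices made for this example.

```python
def mgf_exponential(t, lam=2.0):
    """Closed-form MGF of an exponential distribution with rate lam (valid for t < lam)."""
    return lam / (lam - t)

def first_derivative_at_zero(mgf, h=1e-4):
    """Central-difference approximation of M'(0), which estimates E(X)."""
    return (mgf(h) - mgf(-h)) / (2 * h)

def second_derivative_at_zero(mgf, h=1e-4):
    """Central-difference approximation of M''(0), which estimates E(X^2)."""
    return (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)

m1 = first_derivative_at_zero(mgf_exponential)
m2 = second_derivative_at_zero(mgf_exponential)
print(m1, m2)  # for lam = 2, E(X) = 1/lam = 0.5 and E(X^2) = 2/lam^2 = 0.5
```

Note that the MGF is evaluated at small nonzero values of t on both sides of zero; the derivative is taken first and only then read off at t=0, mirroring the order of operations stressed above.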

Third, how does this work? We’ll understand the derivation of the above equation very soon. But before that, we need to understand the meaning of E(e^{tX}). It’s just equal to the expression obtained by replacing the X in the first equation of this article with e^{tX}:

$$E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} \, f(x) \, dx$$

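Now that we know the basics, we’ll prove that differentiating the MGF n times and substituting t=0 gives E(Xⁿ). We’ll use the Maclaurin series for e^x, with x replaced by tX:

$$e^{x} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \quad \Longrightarrow \quad e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots$$

We’ll now apply expectation on both sides and use the linearity of expectation:

$$E(e^{tX}) = 1 + t \, E(X) + \frac{t^2}{2!} E(X^2) + \frac{t^3}{3!} E(X^3) + \cdots$$

Differentiating both sides of the equation with respect to t will give us:

$$\frac{d}{dt} E(e^{tX}) = E(X) + t \, E(X^2) + \frac{t^2}{2!} E(X^3) + \cdots$$

At t=0, all the terms apart from E(X) vanish, giving:

$$\left. \frac{d}{dt} E(e^{tX}) \right|_{t=0} = E(X)$$

This shows that the first derivative of the MGF at t=0 gives the 1st moment of X. To prove that MGFs work for any nth moment in general, we differentiate E(e^{tX}) n times:

$$\frac{d^n}{dt^n} E(e^{tX}) = E(X^n) + t \, E(X^{n+1}) + \frac{t^2}{2!} E(X^{n+2}) + \cdots$$

When we substitute t=0, all the terms that still contain a power of t vanish, giving:

$$\left. \frac{d^n}{dt^n} E(e^{tX}) \right|_{t=0} = E(X^n)$$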

Some of us may have found it really hard to keep up with this derivation. Don’t worry! It’s enough if we just understand the essence i.e., how to use MGFs to find the different moments for different distributions. If you are really keen on understanding the derivation, read it once more and remember that:

1. The terms with powers of X lower than n (i.e., E(X¹), E(X²), …, E(Xⁿ⁻¹)) are multiplied by powers of t lower than n, so they disappear when the series for E(e^{tX}) is differentiated n times.

2. The terms with powers of X higher than n (i.e., E(Xⁿ⁺¹), E(Xⁿ⁺²), …) are still multiplied by a positive power of t after n differentiations, so they disappear when we substitute t=0.

Thus, we’ll be left with E(Xⁿ), which proves our initial equation. Now we’ll understand a few basic properties of MGFs.
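The series view can be checked numerically. The sketch below assumes a Bernoulli variable with success probability p, for which every moment E(Xᵏ) with k ≥ 1 equals p (because X only takes the values 0 and 1), and compares a truncated series built from those moments with the directly computed E(e^{tX}) = (1 − p) + p·eᵗ. The function name, the values of p and t, and the truncation at 20 terms are illustrative assumptions.

```python
import math

def mgf_from_moments(moments, t):
    """Truncated series: sum over k of t^k * E(X^k) / k!, where moments[k] = E(X^k)."""
    return sum((t ** k) * m / math.factorial(k) for k, m in enumerate(moments))

p = 0.3
t = 0.5

# E(X^0) = 1, and E(X^k) = p for every k >= 1 because X is either 0 or 1.
moments = [1.0] + [p] * 20

series_value = mgf_from_moments(moments, t)
direct_value = (1 - p) + p * math.exp(t)  # E(e^{tX}) computed straight from the PMF
print(series_value, direct_value)  # the two values should agree closely
```

Seen this way, the MGF is a single function whose series coefficients store all the moments at once.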

Basic Properties of Moment Generating Functions

A) Moment Generating Functions when a random variable undergoes a linear transformation:

Let X be a random variable whose MGF is known to be M_X(t). Suppose we have to find the MGF of a random variable Y, which is a linear transformation of X, i.e., Y = αX + β. Then,

$$M_Y(t) = E(e^{tY}) = E(e^{t(\alpha X + \beta)}) = E(e^{t \alpha X} \, e^{t \beta})$$

Since e^{tβ} is constant, we can take it out of the expectation, giving us the following equation:

$$M_Y(t) = e^{t \beta} \, E(e^{(t \alpha) X})$$

Finally, we once again remember that t is simply a constant whose value does not depend on X. Thus, we can treat the entire product ‘tα’ as the new constant, giving us:

$$M_Y(t) = e^{\beta t} \, M_X(\alpha t)$$
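The linear-transformation property, M_Y(t) = e^{βt}·M_X(αt) for Y = αX + β, can be verified on a small discrete example. The sketch below assumes X is a fair six-sided die; the helper `mgf_discrete` and the values of α, β, and t are hypothetical choices for illustration.

```python
import math

def mgf_discrete(pmf, t):
    """MGF of a discrete variable given as {value: probability}: sum of prob * e^(t * value)."""
    return sum(prob * math.exp(t * value) for value, prob in pmf.items())

die = {face: 1 / 6 for face in range(1, 7)}  # X: fair six-sided die
alpha, beta, t = 2.0, -3.0, 0.4

# Build the PMF of Y = alpha * X + beta and compute its MGF directly.
transformed = {alpha * x + beta: prob for x, prob in die.items()}
direct = mgf_discrete(transformed, t)

# Compute the same quantity through the property e^(beta * t) * M_X(alpha * t).
via_property = math.exp(beta * t) * mgf_discrete(die, alpha * t)
print(direct, via_property)  # the two values should match up to rounding
```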

B) Moment Generating Functions of a linear combination of several independent random variables:

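Let X₁, X₂, …, X_n be independent random variables whose MGFs are known to be M_{X₁}(t), M_{X₂}(t), …, M_{X_n}(t). Suppose we have to find the MGF of a random variable Y, which is a linear combination of X₁, X₂, …, X_n, i.e., Y = α₁X₁ + α₂X₂ + … + α_nX_n + β. Following the same procedure as above, we get:

$$M_Y(t) = E(e^{t(\alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_n X_n + \beta)}) = e^{t \beta} \, E(e^{t \alpha_1 X_1} \, e^{t \alpha_2 X_2} \cdots e^{t \alpha_n X_n})$$

By the property of independence, the expectation of the product is the product of the expectations, so we can separate the various terms:

$$M_Y(t) = e^{\beta t} \prod_{i=1}^{n} E(e^{t \alpha_i X_i}) = e^{\beta t} \prod_{i=1}^{n} M_{X_i}(\alpha_i t)$$

More specifically, if Y is the sum of independent random variables, then the MGF of Y is the product of the MGFs of those random variables, i.e., if Y = X₁ + X₂ + … + X_n, then

$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)$$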
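A small exact check of the product rule for sums of independent variables: the sketch below builds the PMF of the sum of a fair die and a fair coin (both assumed here purely for illustration) and compares the MGF of that sum with the product of the individual MGFs. The helper names are hypothetical.

```python
import math
from itertools import product

def mgf_discrete(pmf, t):
    """MGF of a discrete variable given as {value: probability}."""
    return sum(prob * math.exp(t * value) for value, prob in pmf.items())

def pmf_of_sum(pmf_a, pmf_b):
    """PMF of A + B for independent A and B (joint probability = product of marginals)."""
    result = {}
    for (a, prob_a), (b, prob_b) in product(pmf_a.items(), pmf_b.items()):
        result[a + b] = result.get(a + b, 0.0) + prob_a * prob_b
    return result

die = {face: 1 / 6 for face in range(1, 7)}
coin = {0: 0.5, 1: 0.5}
t = 0.3

sum_pmf = pmf_of_sum(die, coin)
mgf_of_sum = mgf_discrete(sum_pmf, t)
product_of_mgfs = mgf_discrete(die, t) * mgf_discrete(coin, t)
print(mgf_of_sum, product_of_mgfs)  # the two values should match up to rounding
```

The independence assumption enters in `pmf_of_sum`, where joint probabilities are formed by multiplying the marginals; without it the product rule is not guaranteed to hold.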

C) Case when the Moment Generating Functions of two random variables are equal:

If X and Y are two random variables having the same MGF, then their CDFs are also the same, i.e., their distributions are the same. Mathematically,

$$M_X(t) = M_Y(t) \ \text{for all } t \text{ in an interval around } 0 \quad \Longrightarrow \quad F_X(x) = F_Y(x) \ \text{for all } x$$

MGFs of some Special Distributions

Here, we’ll calculate the MGF and use it to derive the first moment of certain special distributions: Bernoulli, binomial, exponential, and normal.

A) Bernoulli Distribution

The Bernoulli distribution is a discrete distribution having two possible outcomes: 1 (success) with probability p, and 0 (failure) with probability (1-p). The PMF of a Bernoulli distribution is defined as:

$$P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$

[Figure: plot of a Bernoulli distribution with parameter p]

We’ll now derive its MGF as follows:

$$M_X(t) = E(e^{tX}) = e^{t \cdot 0} (1 - p) + e^{t \cdot 1} p = 1 - p + p e^{t}$$

Calculating the first moment:

$$\frac{d}{dt} M_X(t) = p e^{t}$$

At t=0,

$$E(X) = p e^{0} = p$$

Thus, we have used MGF to obtain an expression for the first moment of a Bernoulli distribution.

B) Binomial Distribution

A binomial random variable counts the number of successes in a sequence of n independent Bernoulli trials, with the probability of success p remaining constant for all the trials. In other words, the sum of n i.i.d. (independent and identically distributed) Bernoulli random variables follows the binomial distribution:

$$Y = X_1 + X_2 + \cdots + X_n, \qquad P(Y = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n$$

[Figure: plot of a binomial distribution with parameters p and n]

This time, instead of using the PMF, we’ll use a shortcut: a property of MGFs. Recall that the MGF of the sum of several independent random variables is equal to the product of their MGFs:

$$M_{X_1 + X_2 + \cdots + X_n}(t) = \prod_{i=1}^{n} M_{X_i}(t)$$

We’ll use this property here. Let Y be the random variable having a binomial distribution, and let X₁, X₂, …, X_n be the random variables having a Bernoulli distribution. We’ll derive the MGF of Y as follows (using the fact that the Xs have identical distributions, and consequently the same MGF, 1 − p + pe^t):

$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = (1 - p + p e^{t})^{n}$$

Calculating the first moment:

$$\frac{d}{dt} M_Y(t) = n (1 - p + p e^{t})^{n - 1} \, p e^{t}$$

At t=0,

$$E(Y) = n (1 - p + p)^{n - 1} \, p = np$$

Thus, we have used MGF to obtain an expression for the first moment of Binomial distribution.
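The binomial result can be cross-checked in two ways: by computing E(e^{tY}) directly from the binomial PMF and comparing it with the product form (1 − p + pe^t)ⁿ, and by differentiating that closed form numerically at t=0 to recover np. In the sketch below, the function names, the parameter values n = 10 and p = 0.3, and the finite-difference step are assumptions made for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(Y = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def mgf_from_pmf(t, n, p):
    """E(e^{tY}) computed the long way, as a sum over the binomial PMF."""
    return sum(math.exp(t * k) * binomial_pmf(k, n, p) for k in range(n + 1))

def mgf_closed_form(t, n, p):
    """Product of n identical Bernoulli MGFs: (1 - p + p * e^t)^n."""
    return (1 - p + p * math.exp(t)) ** n

n, p = 10, 0.3
long_way = mgf_from_pmf(0.2, n, p)
shortcut = mgf_closed_form(0.2, n, p)

# Central-difference estimate of the first derivative at t = 0, i.e. E(Y).
h = 1e-5
mean_estimate = (mgf_closed_form(h, n, p) - mgf_closed_form(-h, n, p)) / (2 * h)
print(long_way, shortcut, mean_estimate)  # mean_estimate should be close to n * p = 3
```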

C) Exponential Distribution

The PDF of an exponential distribution with rate parameter λ > 0 is defined as:

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

[Figure: plot of an exponential distribution with parameter λ]

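We’ll now derive its MGF as follows:

$$M_X(t) = E(e^{tX}) = \int_{0}^{\infty} e^{tx} \, \lambda e^{-\lambda x} \, dx = \lambda \int_{0}^{\infty} e^{-(\lambda - t) x} \, dx = \frac{\lambda}{\lambda - t}, \quad t < \lambda$$

Calculating the first moment:

$$\frac{d}{dt} M_X(t) = \frac{\lambda}{(\lambda - t)^2}$$

At t=0,

$$E(X) = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}$$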

Thus, we have used MGF to obtain an expression for the first moment of an Exponential distribution.
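As a sanity check on the exponential MGF, the sketch below integrates e^{tx}·λe^{−λx} numerically and compares the result with the standard closed form λ/(λ − t), which holds for t < λ. The function name, the midpoint rule, the truncation of the upper limit at 60, and the values λ = 2 and t = 0.5 are illustrative assumptions.

```python
import math

def exponential_mgf_numeric(t, lam, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the integral of e^(t*x) * lam * e^(-lam*x) over [0, upper]."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width  # midpoint of the i-th slice
        total += math.exp(t * x) * lam * math.exp(-lam * x)
    return total * width

lam, t = 2.0, 0.5  # t must stay below lam, otherwise the integral diverges
numeric = exponential_mgf_numeric(t, lam)
closed_form = lam / (lam - t)
print(numeric, closed_form)  # both should be close to 4/3
```

The restriction t < λ is what keeps the integrand decaying; for t ≥ λ the MGF of the exponential distribution does not exist.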

D) Normal Distribution

For the normal distribution, we’ll first discuss the case of standard normal, and then any normal distribution in general. A standard normal distribution has the mean equal to 0 and the variance equal to 1.

The PDF of a standard normal distribution is defined as:

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2}, \quad -\infty < x < \infty$$

[Figure: plot of a normal distribution with parameters µ and σ]

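We’ll now derive its MGF as follows, completing the square in the exponent:

$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} \, \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2} \, dx = e^{t^2 / 2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-(x - t)^2 / 2} \, dx$$

The integrand is the PDF of a normal distribution with mean ‘t’ and variance 1. Thus, over the entire real line, it integrates to 1, giving:

$$M_X(t) = e^{t^2 / 2}$$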

Now suppose we want to derive the MGF for any normal distribution in general, with mean µ and variance σ². The PDF of such a distribution is:

$$f(y) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(y - \mu)^2 / (2 \sigma^2)}, \quad -\infty < y < \infty$$

However, instead of integrating the entire expression again, we can use one of the properties of MGFs. A random variable Y having a normal distribution with mean µ and variance σ² can be related to a random variable X having the standard normal distribution as follows:

$$Y = \sigma X + \mu$$

Using the property of linear transformation of a random variable, together with the standard normal MGF M_X(t) = e^{t²/2},

$$M_Y(t) = e^{\mu t} \, M_X(\sigma t) = e^{\mu t} \, e^{\sigma^2 t^2 / 2} = e^{\mu t + \sigma^2 t^2 / 2}$$

Thus, after a long derivation, we’ve obtained the MGF of any normal distribution in general. Calculating the first moment:

$$\frac{d}{dt} M_Y(t) = (\mu + \sigma^2 t) \, e^{\mu t + \sigma^2 t^2 / 2}$$

At t=0,

$$E(Y) = (\mu + 0) \, e^{0} = \mu$$

Thus, we have used MGF to obtain an expression for the first moment of a Normal distribution.
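A quick simulation can make the normal MGF more tangible. The sketch below draws samples from a normal distribution with Python's `random.gauss`, averages e^{tY} over the samples, and compares the result with the standard closed form exp(µt + σ²t²/2). The parameter values, the seed, the sample size, and the variable names are arbitrary assumptions; being a Monte Carlo estimate, it only approximates the closed form.

```python
import math
import random

mu, sigma, t = 1.0, 0.5, 0.3
rng = random.Random(42)  # fixed seed so that the run is repeatable
sample_size = 200_000

# Monte Carlo estimate of E(e^{tY}) for Y ~ Normal(mu, sigma^2).
total = 0.0
for _ in range(sample_size):
    y = rng.gauss(mu, sigma)
    total += math.exp(t * y)
estimate = total / sample_size

closed_form = math.exp(mu * t + (sigma ** 2) * (t ** 2) / 2)
print(estimate, closed_form)  # the estimate should land near the closed form
```

Increasing `sample_size` tightens the agreement, since the standard error of the average shrinks roughly like one over the square root of the number of samples.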

Conclusion

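This article introduced Moment Generating Functions, showed how differentiating them yields the moments of a distribution, and worked through their basic properties and four standard examples. The study of MGFs and their properties goes much deeper than what is covered here.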

There are other concepts such as Jensen’s inequality, Chernoff bound, characteristic functions, etc.- all related to MGFs, which may be necessary for a statistician to know, but not so relevant for our study.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

FAQs

Q1. How is the moment generating function used?

The moment generating function (MGF) uniquely identifies the probability distribution of a random variable. It allows for easy derivation of moments, calculating probabilities within a specific range, and studying the properties of probability distributions.

Q2. What is the moment generating function (MGF) of a Gaussian?

The moment generating function (MGF) of a Gaussian distribution with mean μ and variance σ² is: exp(μt + σ²t²/2)


