Exploring Diffusion Models in NLP Beyond GANs and VAEs

Aadya Singh 20 Sep, 2023


Introduction

Diffusion Models have gained significant attention recently, particularly in Natural Language Processing (NLP). Based on the concept of diffusing noise through data, these models have shown remarkable capabilities in various NLP tasks. In this article, we will delve deep into Diffusion Models, understand their underlying principles, and explore their practical applications, advantages, computational considerations, role in multimodal data processing, the availability of pre-trained models, and open challenges. We will also see code examples to demonstrate how they work in practice.

Learning Objectives

Understand the theoretical basis of Diffusion Models in stochastic processes and the role of noise in refining data.
Grasp the architecture of Diffusion Models, including the diffusion and generative processes, and how they iteratively improve data quality.
Gain practical knowledge of implementing Diffusion Models using deep learning frameworks like PyTorch.

This article was published as a part of the Data Science Blogathon.

Introduction
Understanding Diffusion Models
- Theoretical Foundation
Architecture of Diffusion Models
Practical Implementation
Applications in NLP
Advantages of Diffusion Models
Computational Considerations
Multimodal Data Processing
Pre-trained Diffusion Models
Ongoing Research and Open Challenges
Conclusion
Frequently Asked Questions

Understanding Diffusion Models

Diffusion Models are rooted in the theory of stochastic processes and are designed to capture the underlying data distribution by iteratively refining noisy data. The key idea is to start with a noisy version of the input data and gradually improve it over several steps, much like diffusion, where information spreads gradually through a medium.

The model iteratively transforms data toward the true underlying data distribution: noise is introduced step by step, and the model learns to remove it step by step.

In a Diffusion Model, there are typically two main processes:

Diffusion Process: This process gradually corrupts the data by adding noise. At each step, a small amount of noise is introduced, making the data noisier. The model is then trained to reverse this corruption step by step so that it can approach the true data distribution.
Generative Process: A generative process is applied after the data has undergone the diffusion process. Starting from noisy data, it removes noise step by step using what the model has learned and generates new, high-quality samples from the refined distribution.

The image below highlights differences in the working of different generative models.

Working of different Generative Models: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

Theoretical Foundation

1. Stochastic Processes:

Diffusion Models are built on the foundation of stochastic processes. A stochastic process is a mathematical concept describing random variables’ evolution over time or space. It models how a system changes over time in a probabilistic manner. In the case of Diffusion Models, this process involves iteratively refining data.

2. Noise:

At the heart of Diffusion Models lies the concept of noise. Noise refers to random variability or uncertainty in data. In the context of Diffusion Models, noise is introduced into the input data to create a noisy version of the data.

In a physical diffusion process, noise refers to random fluctuations in a particle's position. It represents the uncertainty in our measurements or the inherent randomness in the diffusion process itself. The noise can be modeled as a random variable sampled from a distribution. In the case of a simple diffusion process, it's often modeled as Gaussian noise.

3. Markov Chain Monte Carlo (MCMC):

Diffusion Models are built on Markov chains, and their sampling procedures are closely related to Markov Chain Monte Carlo (MCMC) methods. MCMC is a computational technique for sampling from probability distributions. In the context of Diffusion Models, the Markov chain structure helps iteratively refine data by transitioning from one state to another while maintaining a connection to the underlying data distribution.

4. Example Case

Diffusion simulations combine stochasticity and Markov Chain Monte Carlo (MCMC) to model the random movement or spreading of particles, information, or other entities over time. These concepts are used frequently in various scientific disciplines, including physics, biology, and finance. Here's an example that combines these elements in a simple diffusion model:

Example: Diffusion of Particles in a Closed Container

Stochasticity

In a closed container, a group of particles moves randomly in three-dimensional space. Each particle undergoes random Brownian motion, which means a stochastic process governs its movement. We model this stochasticity using the following equations:

The position of particle i at time t+dt is given by:

x_i(t+dt) = x_i(t) + η * √(2 * D * dt)

Where:
- x_i(t) is the current position of particle i at time t.
- η is a random number picked from a standard normal distribution (mean=0, variance=1) representing the stochasticity of the movement.
- D is the diffusion coefficient characterizing how fast the particles are spreading.
- dt is the time step.
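
The update equation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full simulator: the function names, the parameter values, and the choice to start every particle at the origin are hypothetical, the update is applied independently to each of the three axes, and the container walls are ignored. The check at the end compares the simulated mean squared displacement with the textbook value 6 * D * t for free diffusion in three dimensions.

```python
import math
import random


def simulate_brownian(num_particles, num_steps, D, dt, seed=0):
    """Apply x_i(t+dt) = x_i(t) + eta * sqrt(2*D*dt) independently on each axis."""
    rng = random.Random(seed)
    step_scale = math.sqrt(2.0 * D * dt)
    # Assumption: every particle starts at the origin; positions are [x, y, z] lists.
    positions = [[0.0, 0.0, 0.0] for _ in range(num_particles)]
    for _ in range(num_steps):
        for pos in positions:
            for axis in range(3):
                eta = rng.gauss(0.0, 1.0)  # standard normal draw (mean 0, variance 1)
                pos[axis] += eta * step_scale
    return positions


def mean_squared_displacement(positions):
    """Average squared distance from the origin over all particles."""
    return sum(x * x + y * y + z * z for x, y, z in positions) / len(positions)


positions = simulate_brownian(num_particles=2000, num_steps=100, D=0.5, dt=0.01)
msd = mean_squared_displacement(positions)
expected = 6 * 0.5 * (100 * 0.01)  # theory for free 3D diffusion: MSD = 6 * D * t
print(f"simulated MSD = {msd:.3f}, theoretical MSD = {expected:.3f}")
```

A larger diffusion coefficient D or a longer total time spreads the particles further, which is what the growing mean squared displacement measures.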

MCMC

To simulate and study the diffusion of these particles, we can use a Markov Chain Monte Carlo (MCMC) approach. We’ll use a Metropolis-Hastings algorithm to generate a Markov chain of particle positions over time.

Initialize the positions of all particles randomly within the container.
For each time step t:
a. Propose a new set of positions by applying the stochastic update equation to each particle.
b. Calculate the change in energy (likelihood) associated with the new positions.
c. Accept or reject the proposed positions based on the Metropolis-Hastings acceptance criterion, considering the change in energy.
d. If accepted, update the positions; otherwise, keep the current positions.
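
The steps above can be sketched as follows. This is a simplified illustration rather than a definitive implementation: the energy function is a hypothetical hard-wall model (zero inside a cubic container, infinite outside), and `box_size`, the particle count, and the other parameter values are assumptions made for the example. Because the Gaussian proposal is symmetric, the Metropolis-Hastings acceptance ratio reduces to the plain Metropolis rule min(1, exp(-ΔE)).

```python
import math
import random


def energy(positions, box_size):
    """Hypothetical energy: zero inside the container, infinite outside it."""
    for pos in positions:
        if any(c < 0.0 or c > box_size for c in pos):
            return math.inf
    return 0.0


def metropolis_step(positions, box_size, D, dt, rng):
    step_scale = math.sqrt(2.0 * D * dt)
    # a. Propose new positions by applying the stochastic update to each particle.
    proposal = [[c + rng.gauss(0.0, 1.0) * step_scale for c in pos] for pos in positions]
    # b. Change in energy between the proposed and the current configuration.
    delta_e = energy(proposal, box_size) - energy(positions, box_size)
    # c. Metropolis rule: always accept if the energy does not increase,
    #    otherwise accept with probability exp(-delta_e).
    accept_prob = 1.0 if delta_e <= 0 else math.exp(-delta_e)
    # d. Update the positions if accepted; otherwise keep the current ones.
    if rng.random() < accept_prob:
        return proposal, True
    return positions, False


rng = random.Random(0)
box_size = 10.0
# Initialize the positions of all particles randomly within the container.
positions = [[rng.uniform(0.0, box_size) for _ in range(3)] for _ in range(20)]
accepted = 0
for _ in range(500):
    positions, was_accepted = metropolis_step(positions, box_size, D=0.5, dt=0.01, rng=rng)
    accepted += was_accepted
print(f"acceptance rate: {accepted / 500:.2f}")
```

Since the whole set of positions is accepted or rejected together, a proposal that moves even one particle outside the container is rejected. With many particles, proposing moves one particle at a time usually gives a better acceptance rate.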

Noise

In addition to the stochasticity in particle movement, there may be other noise sources in the system. For example, there could be measurement noise when tracking the positions of particles or environmental factors that introduce variability in the diffusion process.

To study the diffusion process in this model, you can analyze the resulting trajectories of the particles over time. The stochasticity, MCMC, and noise collectively contribute to the realism and complexity of the model, making it suitable for studying real-world phenomena like the diffusion of molecules in a fluid or the spread of information in a network.

Architecture of Diffusion Models

Diffusion Models typically consist of two fundamental processes:

1. Diffusion Process

The diffusion process is the iterative step where noise is added to the data at each step. This exposes the model to progressively noisier variations of the data, which it later learns to denoise in order to approach the true data distribution. Mathematically, it can be represented as:

x_{t+1} = x_t + f(x_t, noise_t)

where:

x_t represents the data at step t.
noise_t is the noise added at step t.
f is a function that represents the transformation applied at each step.
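
As a concrete illustration of the update above, the sketch below uses one common choice of noising step, borrowed from denoising diffusion probabilistic models (DDPM): x_{t+1} = sqrt(1 - beta_t) * x_t + sqrt(beta_t) * noise_t. In the notation above this corresponds to f(x_t, noise_t) = (sqrt(1 - beta_t) - 1) * x_t + sqrt(beta_t) * noise_t. The linear schedule for beta_t, the toy data vector, and all function names are assumptions made for the example, not something the article prescribes.

```python
import math
import random


def forward_step(x, beta, rng):
    """One noising step: sqrt(1 - beta) * x + sqrt(beta) * Gaussian noise."""
    return [math.sqrt(1.0 - beta) * xi + math.sqrt(beta) * rng.gauss(0.0, 1.0) for xi in x]


def forward_process(x0, betas, rng):
    """Apply the noising step once per entry in betas and keep every intermediate state."""
    trajectory = [list(x0)]
    for beta in betas:
        trajectory.append(forward_step(trajectory[-1], beta, rng))
    return trajectory


rng = random.Random(0)
x0 = [1.0, -2.0, 0.5, 3.0]  # toy data vector, e.g. a tiny embedding
num_steps = 200
# Assumed linear noise schedule from 1e-4 up to 0.05.
betas = [1e-4 + (0.05 - 1e-4) * t / (num_steps - 1) for t in range(num_steps)]
trajectory = forward_process(x0, betas, rng)

# Fraction of the original signal that survives all steps.
signal_kept = math.prod(math.sqrt(1.0 - b) for b in betas)
print(f"fraction of original signal remaining: {signal_kept:.3f}")
```

After enough steps almost none of the original signal remains, so the final state is close to pure Gaussian noise. That is the starting point for the generative process.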

2. Generative Process

The generative process is responsible for sampling data from the refined distribution. It helps in generating high-quality samples that closely resemble the true data distribution. Mathematically, it can be represented as:

x_t ~ p(x_t|noise_t)

where:

x_t represents the generated data at step t.
noise_t is the noise introduced at step t.
p represents the conditional probability distribution.
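
One way to make the sampling step concrete is DDPM-style ancestral sampling, sketched below for a single scalar value. This is an assumption about how the generative process could be realized, not the article's own method. In a real model a trained neural network predicts the noise contained in x_t. Here the hypothetical `exact_noise_predictor` stands in for that network and is exact only for a toy dataset in which every data point equals `target`. The linear schedule and all names are illustrative.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Assumed linear noise schedule plus the running product of (1 - beta)."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def exact_noise_predictor(x_t, alpha_bar, target):
    """Stand-in for a trained network: exact when all data equals `target`."""
    return (x_t - math.sqrt(alpha_bar) * target) / math.sqrt(1.0 - alpha_bar)


def sample(num_steps, target, rng):
    betas, alpha_bars = make_schedule(num_steps)
    x = rng.gauss(0.0, 1.0)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = exact_noise_predictor(x, alpha_bars[t], target)
        # Remove the predicted noise contribution for this step.
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(1.0 - betas[t])
        # Fresh noise is added at every step except the final one.
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(betas[t]) * noise
    return x


rng = random.Random(0)
result = sample(num_steps=200, target=2.0, rng=rng)
print(f"generated sample: {result:.6f}")
```

Because the noise predictor is exact for this toy dataset, the sample ends at the target value up to floating-point error. With a learned predictor, the same loop would instead produce samples spread according to the learned data distribution.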

Practical Implementation

Implementing a Diffusion Model typically involves using deep learning frameworks like PyTorch or TensorFlow. Here’s a high-level overview of a simple implementation in PyTorch:

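import torch
import torch.nn as nn

class DiffusionModel(nn.Module):
    """Toy model: perturb the input with noise, then reconstruct it, num_steps times."""

    def __init__(self, input_dim, hidden_dim, num_steps):
        super().__init__()
        self.num_steps = num_steps
        # One pair of layers per step: input_dim -> hidden_dim -> input_dim.
        self.diffusion_transform = nn.ModuleList([nn.Linear(input_dim, hidden_dim) for _ in range(num_steps)])
        self.generative_transform = nn.ModuleList([nn.Linear(hidden_dim, input_dim) for _ in range(num_steps)])

    def forward(self, x, noise):
        # x and noise must both have shape (batch_size, input_dim).
        for t in range(self.num_steps):
            # Diffusion transform: encode the noise-perturbed input into the hidden space.
            h = torch.relu(self.diffusion_transform[t](x + noise))
            # Generative transform: map the hidden representation back to the input space.
            x = self.generative_transform[t](h)
        return x

model = DiffusionModel(input_dim=16, hidden_dim=32, num_steps=4)
x = torch.randn(8, 16)
output = model(x, noise=0.1 * torch.randn(8, 16))  # output shape: (8, 16)

In the above code, we defined a simple Diffusion Model with diffusion and generative transformations applied iteratively over a specified number of steps. Each step adds the noise to the current input before the diffusion transformation, so the layer dimensions line up for any choice of input_dim and hidden_dim. The last three lines run the model on a random batch.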

Applications in NLP

Text Denoising: Cleaning Noisy Text Data

Diffusion Models are highly effective in text-denoising tasks. They can take noisy text, which may include typos, grammatical errors, or other artifacts, and iteratively refine it to produce cleaner, more accurate text. This is particularly useful in tasks where data quality is crucial, such as machine translation and sentiment analysis.

Example of Text Denoising : https://pub.towardsai.net/cyclegan-as-a-denoising-engine-for-ocr-images-8d2a4988f769

Text Completion: Generating Missing Parts of Text

Text completion tasks involve filling in missing or incomplete text. Diffusion Models can be employed to iteratively generate the missing portions of text while maintaining coherence and context. This is valuable in auto-completion features, content generation, and data imputation.

Style Transfer: Changing Writing Style While Preserving Content

Style transfer is the process of changing the writing style of a given text while preserving its content. Diffusion Models can gradually morph the style of a text by refining it through diffusion and generative processes. This is beneficial for creative content generation, adapting content for different audiences, or transforming formal text into a more casual style.

Example of Style transfer : https://towardsdatascience.com/how-do-neural-style-transfers-work-b76de101eb3

Image-to-Text Generation: Generating Natural Language Descriptions for Images

In the context of image-to-text generation, diffusion models can be used to generate natural language descriptions for images. They can refine and improve the quality of the generated descriptions step by step. This is valuable in applications like image captioning and accessibility for visually impaired individuals.

Example of Image to text generation using Generative Models : https://www.edge-ai-vision.com/2023/01/from-dall%C2%B7e-to-stable-diffusion-how-do-text-to-image-generation-models-work/

Advantages of Diffusion Models

How Do Diffusion Models Differ from Traditional Generative Models?

Diffusion Models differ from traditional generative models, such as GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders), in their approach. While GANs and VAEs generate data samples directly in a single pass, Diffusion Models iteratively refine noisy data by removing a little noise at each step. This iterative process makes Diffusion Models particularly well-suited for data refinement and denoising tasks.

One of the primary advantages of Diffusion Models is their ability to effectively refine data by gradually reducing noise. They excel at tasks where clean data is essential, such as natural language understanding, where removing noise can improve model performance significantly. They are also beneficial in scenarios where data quality varies widely.

Computational Considerations

Resource Requirements for Training Diffusion Models

Training Diffusion Models can be computationally intensive, especially when dealing with large datasets and complex models. They often require substantial GPU resources and memory. Additionally, training over many refinement steps can increase the computational burden.

Challenges in Hyperparameter Tuning and Scalability

Hyperparameter tuning in Diffusion Models can be challenging due to the numerous parameters involved. Selecting the right learning rates, batch sizes, and the number of refinement steps is crucial for model convergence and performance. Moreover, scaling up Diffusion Models to handle massive datasets while maintaining training stability presents scalability challenges.

Multimodal Data Processing

Extending Diffusion Models to Handle Multiple Data Types

Diffusion Models are not limited to processing a single data type. Researchers can extend them to handle multimodal data, encompassing multiple data modalities such as text, images, and audio. Achieving this involves designing architectures that can simultaneously process and refine multiple data types.

Examples of Multimodal Applications

Multimodal applications of Diffusion Models include tasks like image captioning, which combines visual and textual information, and speech recognition systems that combine audio and text data. These models offer improved context understanding by considering multiple data sources.

Pre-trained Diffusion Models

Availability and Potential Use Cases in NLP

Pre-trained Diffusion Models are becoming available and can be fine-tuned for specific NLP tasks. This pre-training allows practitioners to leverage the knowledge captured by these models on large datasets, saving time and resources in task-specific training. They have the potential to improve the performance of various NLP applications.

Ongoing Research and Open Challenges

Current Areas of Research in Diffusion Models

Researchers are actively exploring various aspects of Diffusion Models, including model architectures, training techniques, and applications beyond NLP. Areas of interest include improving the scalability of training, enhancing generative processes, and exploring novel multimodal applications.

Challenges and Future Directions in the Field

Challenges in Diffusion Models include addressing the computational demands of training, making models more accessible, and refining their stability. Future directions involve developing more efficient training algorithms, extending their applicability to different domains, and further exploring the theoretical underpinnings of these models.

Conclusion

Diffusion Models are rooted in stochastic processes and form a powerful class of generative models. They offer a unique approach to modeling data by iteratively refining noisy input. Their applications span various domains, including natural language processing, image generation, and data denoising, making them a valuable addition to the toolkit of machine learning practitioners.

Key Takeaways

Diffusion Models in NLP iteratively refine data by applying diffusion and generative processes.
Diffusion Models find applications in NLP, image generation, and data denoising.

Frequently Asked Questions

Q1. What distinguishes Diffusion Models from traditional generative models like GANs and VAEs?

A1. Diffusion Models generate data by iteratively removing noise over many steps, which differs from GANs and VAEs that generate data directly in a single pass. This iterative process can result in high-quality samples and data-denoising capabilities.

Q2. Are Diffusion Models computationally expensive to train?

A2. Diffusion Models can be computationally intensive, especially with many refinement steps. Training may require substantial computational resources.

Q3. Can Diffusion Models handle multimodal data, such as text and images together?

A3. Yes. Diffusion Models can be extended to handle multimodal data by incorporating appropriate neural network architectures and handling multiple data modalities in the diffusion and generative processes.

Q4. Are there pre-trained Diffusion Models available for NLP tasks?

A4. Some pre-trained Diffusion Models are available, which can be fine-tuned for specific NLP tasks, similar to pre-trained language models like BERT and GPT.

Q5. What are some open challenges in the field of Diffusion Models?

A5. Challenges include selecting appropriate hyperparameters, dealing with large datasets efficiently, and exploring ways to make training more stable and scalable. Additionally, there’s ongoing research to improve the theoretical understanding of these models.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Aadya Singh is a passionate and enthusiastic individual excited about sharing her knowledge and growing alongside the vibrant Analytics Vidhya Community. Armed with a Bachelor's degree in Bio-technology from MS Ramaiah Institute of Technology in Bangalore, India, she embarked on a journey that would lead her into the intriguing realms of Machine Learning (ML) and Natural Language Processing (NLP). Aadya's fascination with technology and its potential began with a profound curiosity about how computers can replicate human intelligence. This curiosity served as the catalyst for her exploration of the dynamic fields of ML and NLP, where she has since been captivated by the immense possibilities for creating intelligent systems. With her academic background in bio-technology, Aadya brings a unique perspective to the world of data science and artificial intelligence. Her interdisciplinary approach allows her to blend her scientific knowledge with the intricacies of ML and NLP, creating innovative and impactful solutions.
