A Beginner’s Guide to Channel Attribution Modeling in Marketing (using Markov Chains, with a case study in R)

Last Updated : 19 Apr, 2023

Introduction

In a typical ‘from think to buy’ customer journey, a customer goes through multiple touch points before zeroing in on the final product to buy. This is even more prominent in the case of e-commerce sales, where it is relatively easy to track the touch points a customer encountered before making the final purchase.

Source: MarTech Today

As marketing moves more and more towards the consumer-driven side of things, identifying the right channels to target customers has become critical for companies. This helps companies optimise their marketing spend and target the right customers in the right places.

Companies often invest mainly in the last channel a customer encounters before making the final purchase. However, this may not always be the right approach: several channels preceding that one also help drive the customer conversion. The concept used to study this behavior is known as ‘multi-channel attribution modeling.’

In this article, we look at what channel attribution is and how it ties into the concept of Markov chains. We’ll also take a case study of an e-commerce company to understand how this concept works, both theoretically and practically (using R).

What is Channel Attribution?
- Markov Chains
- Removal Effect
Case Study of an E-Commerce Company
Implementation using R

What is Channel Attribution?

Google Analytics offers a standard set of rules for attribution modeling. As per Google, “An attribution model is the rule, or set of rules, that determines how credit for sales and conversions is assigned to touchpoints in conversion paths. For example, the Last Interaction model in Analytics assigns 100% credit to the final touchpoints (i.e., clicks) that immediately precede sales or conversions. In contrast, the First Interaction model assigns 100% credit to touchpoints that initiate conversion paths.”

We will see the last interaction model and first interaction model later in this article. Before that, let’s take a small example and understand channel attribution a little further. Let’s say we have a transition diagram as shown below:

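In the above scenario, a customer can start their journey through either channel ‘C1’ or channel ‘C2’, each with a probability of 50% (or 0.5). Reading the probabilities off the calculation that follows: from C1 the customer moves on to C2 with probability 0.5, from C2 the customer always moves on to C3 (probability 1), and from C3 the customer converts with probability 0.6. Let’s calculate the overall probability of conversion first and then go further to see the effect of each of the channels.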

P(conversion) = P(C1 -> C2 -> C3 -> Conversion) + P(C2 -> C3 -> Conversion)

= 0.5*0.5*1*0.6 + 0.5*1*0.6
= 0.15 + 0.3
= 0.45
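The same arithmetic can be reproduced in R as a quick check. This is only a sketch: the variable names are illustrative, and the probabilities are the ones used in the calculation above.

```r
# Transition probabilities taken from the worked calculation above.
p_start_c1 <- 0.5   # journey starts at C1
p_start_c2 <- 0.5   # journey starts at C2
p_c1_c2    <- 0.5   # C1 -> C2
p_c2_c3    <- 1.0   # C2 -> C3
p_c3_conv  <- 0.6   # C3 -> Conversion

# Probability of each converting path, then the total.
path_via_c1  <- p_start_c1 * p_c1_c2 * p_c2_c3 * p_c3_conv
path_via_c2  <- p_start_c2 * p_c2_c3 * p_c3_conv
p_conversion <- path_via_c1 + path_via_c2

print(c(via_C1 = path_via_c1, via_C2 = path_via_c2, total = p_conversion))
```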

Markov Chains

A Markov chain is a process that describes movement between states and gives the probability of moving from one state to another. A Markov chain is defined by three properties:

State space – the set of all the states in which the process could potentially exist
Transition operator – the probability of moving from one state to another state
Current state probability distribution – the probability distribution of being in any one of the states at the start of the process

We know the stages through which we can pass, the probability of moving along each of the paths, and the current state. This looks similar to Markov chains, doesn’t it?
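To make the three properties concrete, here is a minimal base R sketch of the C1/C2/C3 example. The "Null" state (journey ends without a conversion) is an assumption added so that every row of the transition matrix sums to 1; the article's example does not name it.

```r
# 1. State space ("Null" is an assumed no-conversion state).
states <- c("C1", "C2", "C3", "Conversion", "Null")

# 2. Transition operator: P[i, j] is the probability of moving from i to j.
P <- matrix(0, nrow = 5, ncol = 5, dimnames = list(states, states))
P["C1", c("C2", "Null")]         <- c(0.5, 0.5)
P["C2", "C3"]                    <- 1
P["C3", c("Conversion", "Null")] <- c(0.6, 0.4)
P["Conversion", "Conversion"]    <- 1   # absorbing state
P["Null", "Null"]                <- 1   # absorbing state

# 3. Current state distribution: start at C1 or C2 with probability 0.5 each.
dist <- c(C1 = 0.5, C2 = 0.5, C3 = 0, Conversion = 0, Null = 0)

# Push the distribution forward three steps; by then every journey
# in this small example has either converted or dropped out.
for (step in 1:3) {
  dist <- drop(dist %*% P)
}
print(round(dist, 3))
```

After three steps the mass on "Conversion" matches the 0.45 computed by hand earlier.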

Removal Effect

This is, in fact, an application of Markov chains. We will come back to this later; let’s stick to our example for now. If we want to figure out the contribution of channel 1 to our customer’s journey from start to conversion, we use the principle of removal effect. The removal effect principle says that to find the contribution of each channel in the customer journey, we remove each channel in turn and see how many conversions happen without that channel in place.

For example, let’s assume we have to calculate the contribution of channel C1. We will remove the channel C1 from the model and see how many conversions happen without C1 in the picture, vis-à-vis the total conversions when all the channels are intact. Let’s calculate for channel C1:

P(Conversion after removing C1) = P(C2 -> C3 -> Convert)

= 0.5*1*0.6

= 0.3

Without channel C1 in place, 30% of customer interactions can be converted, while with C1 intact, 45% can be converted. The removal effect is the share of conversions lost when the channel is removed, so the removal effect of C1 is

1 - 0.3/0.45 = 0.333.

The removal effect of C2 and C3 is 1 (you may try calculating it, but think intuitively. If we were to remove either C2 or C3, will we be able to complete any conversion?).
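The removal effect for all three channels can be sketched in base R as follows. The code uses the convention removal effect = 1 - (conversion probability without the channel / conversion probability with all channels), and it models "removing" a channel by redirecting every transition into that channel to an assumed "Null" (no conversion) state. The helper names `conversion_prob` and `remove_channel` are hypothetical. The final normalisation into shares is a common convention for turning removal effects into attribution weights and is an assumption here, not something stated in the text.

```r
# Example chain; "Start" and "Null" are assumed helper states.
states <- c("Start", "C1", "C2", "C3", "Conversion", "Null")
P <- matrix(0, 6, 6, dimnames = list(states, states))
P["Start", c("C1", "C2")]        <- 0.5
P["C1", c("C2", "Null")]         <- 0.5
P["C2", "C3"]                    <- 1
P["C3", c("Conversion", "Null")] <- c(0.6, 0.4)
P["Conversion", "Conversion"]    <- 1
P["Null", "Null"]                <- 1

# Probability of being absorbed in "Conversion" when starting from "Start",
# obtained by solving (I - Q) B = R for the absorbing-chain blocks Q and R.
conversion_prob <- function(P) {
  absorbing <- c("Conversion", "Null")
  transient <- setdiff(rownames(P), absorbing)
  Q <- P[transient, transient, drop = FALSE]
  R <- P[transient, absorbing, drop = FALSE]
  B <- solve(diag(length(transient)) - Q, R)
  dimnames(B) <- list(transient, absorbing)
  B["Start", "Conversion"]
}

# Remove a channel: everything that used to flow into it now goes to "Null".
remove_channel <- function(P, channel) {
  P[, "Null"]  <- P[, "Null"] + P[, channel]
  P[, channel] <- 0
  P[channel, ] <- 0
  P[channel, "Null"] <- 1
  P
}

p_full  <- conversion_prob(P)
effects <- sapply(c("C1", "C2", "C3"), function(ch) {
  1 - conversion_prob(remove_channel(P, ch)) / p_full
})
shares <- effects / sum(effects)   # assumed normalisation into weights

print(round(effects, 3))
print(round(shares, 3))
```

Removing C2 or C3 leaves no path to conversion, which is why their removal effect comes out as 1.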

This is a very useful application of Markov chains. In the above case, all the channels – C1, C2, C3 (at different stages) – are called transition states; while the probability of moving from one channel to another channel is called transition probability.

A customer journey, which is a sequence of channels, can be considered a chain in a directed Markov graph where each vertex is a state (channel/touch point) and each edge represents the transition probability of moving from one state to another. Since the probability of reaching a state depends only on the previous state, it can be considered a memoryless Markov chain.
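As a rough illustration of how such a graph can be estimated from observed journeys, the sketch below counts first-order transitions in a few made-up paths and turns the counts into transition probabilities. The paths and the "(start)" and "(conversion)" markers are hypothetical and chosen only for this example.

```r
# Hypothetical converting journeys, written in the "a > b > c" path format.
paths <- c("C1 > C2 > C3", "C2 > C3", "C2 > C3")

# Split each path into touch points and add explicit start/end states.
steps <- lapply(strsplit(paths, " > ", fixed = TRUE),
                function(p) c("(start)", p, "(conversion)"))

# Pair every state with the state that follows it.
from <- unlist(lapply(steps, function(s) head(s, -1)))
to   <- unlist(lapply(steps, function(s) tail(s, -1)))

# Count transitions, then normalise each row into probabilities.
counts     <- table(from, to)
trans_prob <- prop.table(counts, margin = 1)

print(round(trans_prob, 3))
```

Each row of `trans_prob` depends only on the current state, which is exactly the memoryless property described above.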

Case Study of an E-Commerce Company

Let’s take a real-life case study and see how we can implement channel attribution modeling.

An e-commerce company conducted a survey and collected data from its customers. The respondents can be treated as a representative sample of its customer population. In the survey, the company collected data about the various touch points customers visit before finally purchasing the product on its website.

In total, there are 19 channels where customers can encounter the product or the product advertisement. After the 19 channels, there are three more cases:

#20 – customer has decided which device to buy;
#21 – customer has made the final purchase, and;
#22 – customer hasn’t decided yet.

The overall categories of channels are as below:

Category	Channel
Website (1,2,3)	Company’s website or competitor’s website
Research Reports (4,5,6,7,8)	Industry Advisory Research Reports
Online/Reviews (9,10)	Organic Searches, Forums
Price Comparison (11)	Aggregators
Friends (12,13)	Social Network
Expert (14)	Expert online or offline
Retail Stores (15,16,17)	Physical Stores
Misc. (18,19)	Others such as Promotional Campaigns at various locations

Now, we need to help the e-commerce company in identifying the right strategy for investing in marketing channels. Which channels should be focused on? Which channels should the company invest in? We’ll figure this out using R in the following section.

Implementation using R

Let’s move ahead and try the implementation in R and check the results. You can download the dataset from https://www.dropbox.com/s/wi907ms4h4cl1p0/Channel_attribution.csv?dl=0 and follow along as we go.

#Install the libraries
install.packages("ChannelAttribution")
install.packages("ggplot2")
install.packages("reshape")
install.packages("dplyr")
install.packages("plyr")
install.packages("reshape2")
install.packages("markovchain")
install.packages("plotly")

#Load the libraries
library("ChannelAttribution")
library("ggplot2")
library("reshape")
library("dplyr")
library("plyr")
library("reshape2")
library("markovchain")
library("plotly")

#Read the data into R
> channel = read.csv("Channel_attribution.csv", header = T)
> head(channel)

Output:

R05A.01	R05A.02	R05A.03	R05A.04	…..	R05A.18	R05A.19	R05A.20
16	4	3	5		NA	NA	NA
2	1	9	10		NA	NA	NA
9	13	20	16		NA	NA	NA
8	15	20	21		NA	NA	NA
16	9	13	20		NA	NA	NA
1	11	8	4		NA	NA	NA

We will do some data processing to bring it to a stage where we can use it as an input in the model. Then, we will identify which customer journeys have gone to the final conversion (in our case, all the journeys have reached final conversion state).

We will create a variable ‘path’ in a specific format which can be fed as an input to the model. Also, we will find out the total occurrences of each path using the ‘plyr’ package.

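#Create the conversion flag first so that every row has a value
> channel$convert = 0
#Flag the journeys that reach the purchase state (21)
> for(row in 1:nrow(channel))
{
  if(21 %in% channel[row,]){channel$convert[row] = 1}
}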
> column = colnames(channel)
> channel$path = do.call(paste, c(channel[column], sep = " > "))
> head(channel$path)
[1] "16 > 4 > 3 > 5 > 10 > 8 > 6 > 8 > 13 > 20 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"     

[2] "2 > 1 > 9 > 10 > 1 > 4 > 3 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"     

[3] "9 > 13 > 20 > 16 > 15 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"

[4] "8 > 15 > 20 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"

[5] "16 > 9 > 13 > 20 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"

[6] "1 > 11 > 8 > 4 > 9 > 21 > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > NA > 1"

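#Keep only the touch points that come before the purchase state (21)
> for(row in 1:nrow(channel))
{
  channel$path[row] = strsplit(channel$path[row], " > 21")[[1]][1]
}
#Select the path and convert columns by name rather than by position
> channel_fin = channel[,c("path","convert")]
#Count the conversions for each distinct path (ddply comes from plyr)
> channel_fin = ddply(channel_fin,~path,summarise, conversion= sum(convert))
> head(channel_fin)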

Output:

path	conversion
1 > 1 > 1 > 20	1
1 > 1 > 12 > 12	1
1 > 1 > 14 > 13 > 12 > 20	1
1 > 1 > 3 > 13 > 3 > 20	1
1 > 1 > 3 > 17 > 17	1
1 > 1 > 6 > 1 > 12 > 20 > 12	1

> Data = channel_fin
> head(Data)

Output:

path	conversion
1 > 1 > 1 > 20	1
1 > 1 > 12 > 12	1
1 > 1 > 14 > 13 > 12 > 20	1
1 > 1 > 3 > 13 > 3 > 20	1
1 > 1 > 3 > 17 > 17	1
1 > 1 > 6 > 1 > 12 > 20 > 12	1
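The preprocessing above can also be written without explicit row loops. The following base R sketch runs on a tiny made-up data frame (the column names `t1` to `t3` and the values are hypothetical, not taken from Channel_attribution.csv) and assumes, as in the case study, that state 21 marks the final purchase.

```r
# Hypothetical journeys: one row per customer, NA where the journey has ended.
channel <- data.frame(t1 = c(16, 2, 9, 16),
                      t2 = c(4, 1, 13, 4),
                      t3 = c(21, NA, 21, 21))

# Build "a > b > c" from one row: drop NAs and cut at the purchase state 21.
build_path <- function(row) {
  touches <- row[!is.na(row)]
  stop_at <- match(21, touches)
  if (!is.na(stop_at)) touches <- touches[seq_len(stop_at - 1)]
  paste(touches, collapse = " > ")
}

m         <- as.matrix(channel)
paths     <- unname(apply(m, 1, build_path))
converted <- unname(apply(m, 1, function(r) as.integer(21 %in% r)))

# One row per distinct path with its number of conversions.
Data_small <- aggregate(conversion ~ path,
                        data = data.frame(path = paths, conversion = converted),
                        FUN = sum)
print(Data_small)
```

Unlike the loop version, this drops the trailing NA values for journeys that never reach state 21, so non-converting paths stay clean as well.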

Now, we will create a heuristic model and a Markov model, combine the two, and then check the final results.

> H <- heuristic_models(Data, 'path', 'conversion', var_value='conversion')
> H

Output:

channel_name	first_touch_conversions	…..	linear_touch_conversions	linear_touch_value
1	130		73.773661	73.773661
20	0		473.998171	473.998171
12	75		76.127863	76.127863
14	34		56.335744	56.335744
13	320		204.039552	204.039552
3	168		117.609677	117.609677
17	31		76.583847	76.583847
6	50		54.707124	54.707124
8	56		53.677862	53.677862
10	547		211.822393	211.822393
11	66		107.109048	107.109048
16	111		156.049086	156.049086
2	199		94.111668	94.111668
4	231		250.784033	250.784033
7	26		33.435991	33.435991
5	62		74.900402	74.900402
9	250		194.07169	194.07169
15	22		65.159225	65.159225
18	4		5.026587	5.026587
19	10		12.676375	12.676375

> M <- markov_model(Data, 'path', 'conversion', var_value='conversion', order = 1)
> M

Output:

channel_name	total_conversion	total_conversion_value
1	82.482961	82.482961
20	432.40615	432.40615
12	83.942587	83.942587
14	63.08676	63.08676
13	195.751556	195.751556
3	122.973752	122.973752
17	83.866724	83.866724
6	63.280828	63.280828
8	61.016115	61.016115
10	209.035208	209.035208
11	118.563707	118.563707
16	158.692238	158.692238
2	98.067199	98.067199
4	223.709091	223.709091
7	41.919248	41.919248
5	81.865473	81.865473
9	179.483376	179.483376
15	70.360777	70.360777
18	5.950827	5.950827
19	15.545424	15.545424

Before going further, let’s first understand what a few of the terms we’ve seen above mean.

First Touch Conversion: The conversion happening through the channel when that channel is the first touch point for a customer. 100% credit is given to the first touch point.

Last Touch Conversion: The conversion happening through the channel when that channel is the last touch point for a customer. 100% credit is given to the last touch point.

Linear Touch Conversion: All channels/touch points are given equal credit in the conversion.
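To see how the three heuristics differ, here is a small base R sketch that applies them by hand to a few made-up paths. The paths, conversion counts and the helper `credit` are hypothetical; it is meant to illustrate the definitions above, not to reproduce the internals of `heuristic_models`.

```r
# Hypothetical paths with the number of conversions observed for each.
paths <- c("A > B > C", "B > C", "A > C")
conv  <- c(2, 1, 3)

touches  <- strsplit(paths, " > ", fixed = TRUE)
channels <- sort(unique(unlist(touches)))

# Distribute each path's conversions over its touch points using the
# position weights returned by weights_fun(n) for a path of length n.
credit <- function(weights_fun) {
  out <- setNames(numeric(length(channels)), channels)
  for (i in seq_along(touches)) {
    w <- weights_fun(length(touches[[i]]))
    for (j in seq_along(touches[[i]])) {
      ch <- touches[[i]][j]
      out[ch] <- out[ch] + w[j] * conv[i]
    }
  }
  out
}

first_touch  <- credit(function(n) c(1, rep(0, n - 1)))  # all credit to first
last_touch   <- credit(function(n) c(rep(0, n - 1), 1))  # all credit to last
linear_touch <- credit(function(n) rep(1 / n, n))        # equal credit

print(rbind(first_touch, last_touch, linear_touch))
```

Each row sums to the total number of conversions (6 here); the heuristics only differ in how that total is split across channels.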

Getting back to the R code, let’s merge the two models and represent the output in a visually appealing manner which is easier to understand.

# Merges the two data frames on the "channel_name" column.
R <- merge(H, M, by='channel_name')

# Select only relevant columns
R1 <- R[, (colnames(R) %in% c('channel_name', 'first_touch_conversions', 'last_touch_conversions', 'linear_touch_conversions', 'total_conversion'))]

# Transforms the dataset into a data frame that ggplot2 can use to plot the outcomes
R1 <- melt(R1, id='channel_name')

# Plot the total conversions
ggplot(R1, aes(channel_name, value, fill = variable)) +
  geom_bar(stat='identity', position='dodge') +
  ggtitle('TOTAL CONVERSIONS') +
  theme(axis.title.x = element_text(vjust = -2)) +
  theme(axis.title.y = element_text(vjust = +2)) +
  theme(title = element_text(size = 16)) +
  theme(plot.title=element_text(size = 20)) +
  ylab("")

The graph makes the picture clear. From the first touch perspective, channels 10, 13, 2, 4 and 9 are quite important, while from the last touch perspective, channel 20 is the most important (as expected in our case, because it marks the point where the customer has decided which product to buy). In terms of linear touch conversion, channels 20, 4 and 9 come out as important. From the total conversions perspective of the Markov model, channels 10, 13, 20, 4 and 9 are quite important.
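The ranking from the Markov model can also be checked numerically instead of read off a chart. The sketch below re-enters the `total_conversion` column from the Markov output shown earlier (the data frame name `markov_out` is just illustrative) and sorts it.

```r
# total_conversion values copied from the Markov model output above.
markov_out <- data.frame(
  channel_name = c(1, 20, 12, 14, 13, 3, 17, 6, 8, 10,
                   11, 16, 2, 4, 7, 5, 9, 15, 18, 19),
  total_conversion = c(82.482961, 432.40615, 83.942587, 63.08676, 195.751556,
                       122.973752, 83.866724, 63.280828, 61.016115, 209.035208,
                       118.563707, 158.692238, 98.067199, 223.709091, 41.919248,
                       81.865473, 179.483376, 70.360777, 5.950827, 15.545424)
)

# Sort channels by attributed conversions, largest first, and keep the top five.
ranked <- markov_out[order(-markov_out$total_conversion), ]
top5   <- head(ranked$channel_name, 5)

print(head(ranked, 5))
```

The five channels at the top of this ranking are 20, 4, 10, 13 and 9, the same set named in the interpretation above.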

End Notes

The chart above helps us figure out which channels are important to focus on and which can be given lower priority. This case gives a good insight into the application of Markov chain models in the customer analytics space. E-commerce companies can use data-driven insights like these to shape their marketing strategy and distribute their marketing budget.

Author Bio:

This article was contributed by Perceptive Analytics. Chaitanya Sagar, Prudhvi Potuganti and Saneesh Veetil developed this article.

Perceptive Analytics provides data analytics, data visualization, business intelligence and reporting services to e-commerce, retail, healthcare and pharmaceutical industries. Our client roster includes Fortune 500 and NYSE listed companies in the USA and India.

Responses From Readers

Ram

Hi, Thank you for the article. Could you please provide the dataset Channel_attribution.csv

Pranav Dar

Hi Ram, The dataset has been provided under the 'Implementation using R' section. For your reference, you can download it from the below link directly: https://www.dropbox.com/s/wi907ms4h4cl1p0/Channel_attribution.csv?dl=0

Amogh

very nice article, where do we get the channel_Attribution.csv file

Pranav Dar

Hi Amogh, The dataset has been provided under the ‘Implementation using R’ section. For your reference, you can download it from the below link directly: https://www.dropbox.com/s/wi907ms4h4cl1p0/Channel_attribution.csv?dl=0

Bhaskar

Hi, Really insightful article; can you suggest a library or an implementation of similar channel attribution in python.

shivani munshi

Hi, Thank you for the article. Please provide the dataset Channel_attribution.csv

Pranav Dar

Hi Shivani, The dataset has been provided under the ‘Implementation using R’ section.

ken

Hi, Can you explain what "total conversion" from Markov model means? Also I think is good to understand how each channel contribute to the conversion, the follow up question will be - if investment of a channel has increased, how much impact will it do to the conversion; or given a fixed amount of budget, how to spend it in different channels to maximize the lift in conversions - Would love to see on how to answer these questions. An unrelated question to the analysis but related to data is how the e-commerce company able to get the required data for this analysis given many platform are not owned by the company and the company may have little visibility on the customer journey

ken

Hi Can you explain what does "total conversion" from Markov model means? Also would love to see from the model how to predict the impact of increase exposure of a channel (e.g. increase investment on the channel) and how to find out optimal spending in the channels given a fixed budget for maximal return Another difficulty in real world is, the company may not have all the data - e.g. how customers interact with channels not owned by the company; customers intention or whether they have decided to buy or not

Antonio

congratulations for the article. Only one thing is not clear to me, the values at the beginning of articles: = 0.5 * 0.5 * 1 * 0.6 + 0.5 * 1 * 0.6 how were they extracted?

Rudik

Antonio, take a look at Multiplication Rule Probability.

Arjit Kandpal

Hi Team, U guyz do an awesome job. For this post i cannot understand the coding part. Can u please help since I am a novice in R.

Fatih Yılmazer

Hi, very good article. Removal effect is calculated by 1-...

Ankit

Hi, I deleted some of the points with channel 21 in the dataset and ran the code. Now I have some 400 data points where conversion has not happened. Still my conversions and conversion value are equal although different the the ones mention here. Can you tell us why this might be happening. Shouldn't the value be different then the conversion?

Ulrich

Thanks for the nice article! I noticed a small typo in one R command, maybe you'd like to correct it. Correctly it should read as follows: # Select only relevant columns R1 <- R[, (colnames(R)%in%c("channel_name", "first_touch_conversions", "last_touch_conversions", "linear_touch_conversions", "total_conversion"))]

Joep van der Plas

Thanks for the great article. However, you made a mistake in computing the removal effect. The formula for the removal effect is (1 - conversion without channel i/ total conversion). Hence, the answer to your example has to be 1 - 0.3/0.45 = 0.333.

Craig

Hello, I tried the package ChannelAttribution. It seems that the results change every time I run it except when I set the seed. Can you please explain why this happens, i.e., what randomization happens internally in the model? Because as per my understanding, the transition probabilities should not change given that the data remains constant. Your views will be helpful. Thank you.

Is there a data prep R code that would create Channel_attribution.csv file from DCM logfiles with time stamps and DCM ids? Can you provide that?

Could you share a data preparation code that transforms DCM logfiles into Channel_attribution.csv using DCM ids and timestamps?

Erez

Thank you for a great tutorial and a wonderful website. Highly Appreciated

hamza

Removal effect = 1-(0.3/0.45) and not 0.3/0.45
