Manoj GL — April 27, 2021
This article was published as a part of the Data Science Blogathon.

## Introduction

In business, there are some golden rules that everyone needs to appreciate and adopt.

Eighty percent of our business comes from 20% of our customers.
It costs 10 times less to sell to an existing customer than to find a new customer

The above golden rules define why we need Customer Lifetime value.

In our stock index, NIFTY 50 defines how our stock market is performing similarly, in business, it’s important to understand who your top customers are that give consistent and increasing revenue streams for your business.

One of the simple and effective methodologies which are generally used in calculating customer value over a time frame is RFM which is,

### Recency (R): How recently a customer has made a purchase Frequency (F): How often a customer makes a purchase Monetary Value (M): Dollar value of the purchases

We will go a little deeper into RFM analysis  in the following section:

Let’s take a small example where a bank wants to identify key customers for retention/development/acquisition.

So in the above scenario, we need to score each customer who had recent transactions with the bank on three important metrics mentioned above R, F & M.Then create a scoring methodology to segment the customer base and apply for different marketing programs.

## Process of RFM Analysis

Let’s calculate the RFM score for 5 sample  customers,

#### Step1: Derive R, F & M from the transactions of the bank from the last 1 year.

Preferably RFM is done for recent data and will be refreshed on a quarterly/half-yearly basis based on the business

Finding  R, F and M are pretty simple. Let’s say a customer deposited 10 K money on May 1st and deposited another 5 k on June 10th  and if you are doing RFM analysis on July 1st. Now for this customer, the Recency will be 1 month because the last transaction was in June and Frequency will be 2 because he made two deposits in May and June and M will be 15 K

#### Step 2: Derive score of each customer based on each parameter based on rank within the parameter

For Recency, smaller the better, because of the customer, we are on top of his mind and for Frequency & Monitory larger values are better

Let’s take the above table as an example, when compared to all customers,  recency is best for customer 3 as he is ranked as number 1, whereas for frequency he is in 4th position, and in terms of the value he is in 2nd position.

#### Step 3: Standardize score of each customer based on each parameter (0-100)

Standardize = current value/Max(Value) * 100

#### Step 4: Derive weighted score across each parameter for each customer

Consolidated Score = 0.15*R + 0.28*F + 0.57*M

Weights can be applied equally or we can provide specific weights for each parameter based on domain knowledge or business inputs. Here in the above case, we are giving more importance to Frequency and Monitory.

We just applied those weights to each customer.

For example,

Customer 4 value = 0.15 *40 + 0.28 *60 + 0.57*60 = 57

Then we segregated score as three segments,

• 0 – 50 – Low valued customer
• 50 – 75 – Medium valued customer
• 76 – 100 – High valued customer

Now based on the above scores, a business can apply the differentiation strategy like retention/development/acquisition of different customer segments

Further, most we can profile these segments with additional features like demographics,    spending pattern, and several products e.t.c; understand them a little deeper.

Now let’s try to implement this RFM analysis in python.

## RMF Analysis in Python

This is a Case study, where we are using a European retail chain data set.

Sample data is as follows:

For our RFM analysis, the key important features we will be using are InvoicDate, CustomerID and for sales, we are using Quantity and Unit Price

`# Import Packages`
```import numpy as np
import pandas as pd
import time, warnings
import datetime as dt
warnings.filterwarnings("ignore")
# Get the Data
# RFM Analysis
**RFM** (Recency, Frequency, Monetary) analysis first we need to create three features R , F & M from the data
lets create those features
## Recency
# To calculate recency, we need to find out  **when was the customer's most recent purchase.**.
# Create a new column called date which contains the date of invoice only
df['date'] = pd.DatetimeIndex(df['InvoiceDate']).date
# Group by customers and check last date of purchase
recency_df = df.groupby(by='CustomerID', as_index=False)['date'].max()
recency_df.columns = ['CustomerID','LastPurshaceDate']
# Calculate recent date to find recency wrt to this date
recent_date=recency_df.LastPurshaceDate.max()
print(recent_date)
# Calculate recency
recency_df['Recency'] = recency_df['LastPurshaceDate'].apply(lambda x: (recent_date - x).days)

Now in the same way we will calculate both Frequency and Monetary values

```# ## Frequency
# To calculate Frequency we need to check **How often a customer makes a purchase**.
# Drop duplicates
df1= df
df1.drop_duplicates(subset=['InvoiceNo', 'CustomerID'], keep="first", inplace=True)
# Calculate the frequency of purchases
frequency_df = df1.groupby(by=['CustomerID'], as_index=False)['InvoiceNo'].count()
frequency_df.columns = ['CustomerID','Frequency']
# ## Monetary
# To calculate Monetary value  **How much money did the customer spent during the timeframe?**
# Create column total cost
df['TotalCost'] = df['Quantity'] * df['UnitPrice']
monetary_df = df.groupby(by='CustomerID',as_index=False).agg({'TotalCost': 'sum'})
monetary_df.columns = ['CustomerID','Monetary']
# ## Create RFM Table
# Merge recency dataframe with frequency dataframe
temp_df = recency_df.merge(frequency_df,on='CustomerID')
# Merge with monetary dataframe to get a table with the 3 columns
rfm_df = temp_df.merge(monetary_df,on='CustomerID')
# Use CustomerID as index
rfm_df.set_index('CustomerID',inplace=True)
```# Rank each metric R , F & M
rfm_df['R_rank'] = rfm_df['Recency'].rank( ascending=False)
rfm_df['F_rank'] = rfm_df['Frequency'].rank(ascending=True)
rfm_df['M_rank'] = rfm_df['Monetary'].rank(ascending=True)
# normalize each rank with Max rank
rfm_df['R_rank_norm']=(rfm_df['R_rank']/rfm_df['R_rank'].max())*100
rfm_df['F_rank_norm']=(rfm_df['F_rank']/rfm_df['F_rank'].max())*100
rfm_df['M_rank_norm']=(rfm_df['F_rank']/rfm_df['M_rank'].max())*100
# Now apply our equation and create final score **Consolidated Score = 0.15*R_rank_norm + 0.28*F_rank_norm + 0.57M_rank_norm**
rfm_df['RFM_Score']=0.15*rfm_df['R_rank_norm']+0.28*rfm_df['F_rank_norm']+0.57*rfm_df['M_rank_norm']
rfm_df=rfm_df.round(0)

```# ## Customer segments with RFM Model
# # Segment customers based on RFM score
# 0 - 50 - Low valued customer
# 50 - 75 - Medium valued customer
# 76 - 100 - High valued customer
rfm_df["Customer_segment"]=np.where(rfm_df['RFM_Score'] > 75 ,"High Value Customer",(np.where(rfm_df['RFM_Score'] < 50 , "Low value Customer" ,"Medium Value Customer")))

Now that we knew our customer segments we can choose how to target or deal with each segment.
For example:

High Value Customer: They are your loyal customers.So provide constant support via customer service
Medium Value Customer: Send them personalized emails with offers and encourage them to shop more
Low Value Customer: These customers are about to churn or went into a dormant stage,apply reactivation strategies for them

In python and R we have multiple packages which support RFM analysis and there are multiple ways to derive an RFM score also. 