Learn Analytics using a business case study : Part I
Best way to learn analytics is through experience and solving case studies. Here, I will present you a complete business model and take you through a step by step process of how analytics is set up in a new business, how is it used in daily processes and some of the advanced analytics techniques which a business can use to make meaningful segmentation and prediction to optimize its marketing & sales campaign.
[stextbox id=”section”] Background : [/stextbox]
You have recently started a Video CD rent shop. After 2 months you realize that there is tough competition in the market and you need to make a more customer centric strategy to stand out in the market. Hence, you want to collect the most granular details of your customer behavior and build strategy accordingly.
[stextbox id=”section”] Business case layout : [/stextbox]
This business case have been broken down into 3 articles. Following is a plot of the articles and each article will be strongly dependent on findings in the previous articles:
1. How do you collect data so as to capture all the important information?
2. Deep dive into customer behavior and using basic data analysis with business knowledge to optimize daily operations : Click here to directly move to this part https://www.analyticsvidhya.com/blog/2014/03/learn-analytics-business-case-study-part-ii/
3. How do you use data with advanced analytics to make your marketing/sales startegies more targeted?
[stextbox id=”section”] Case study part I : [/stextbox]
Did you ever wonder why do you deal with so many datamarts in your company. Let’s try to understand as the owner of the busienss what all data sources do you need.
1. Transactions Table :
You rent out Video CDs and the most important data for you will be transactional data. Transactional data is by far the richest data throughout all industries. Each row in transactional data corresponds to one transaction made. This transaction mostly are monetary transaction. To identify each transaction, you need a distinct transaction code associated with each transaction. What other fields can you think of to be captured along with each transaction. Following is a small list of such variables :
1. Transaction ID
2. Customer ID : Identifying the customer to whom you have rented out the CD
3. Rent due : How much does the customer need to pay as rent
4. Issue date : When was the movie rented out
5. Recieved date : When was the movie recieved. Blank if CD is still due
6. Movie ID : Identifying the movie
Following is a sample transactional data set :
2. Product Table :
If you have transaction table, you basically have the linkage between the customers and the products. But why does transaction table not have the discription of products? The simplest reason for the same is that total number of products are limited in any industry, and the same product is repeated throughout the transactions table. If we add description in every single line, it adds enormously to the overall size of transaction table, which anyway is huge. Hence, we keep the products table seperate and merge it with required transactions for specific analysis.
Product table is unique on product id, which maps to transactions table. What other parameters can you think of that make sense for you to include? Following is a list of possible variables :
1. Movie ID : Unique ID of movie
2. Origin yearmonth : When was the CD bought?
3. Genre of movie
4. Language of movie
5. Star Cast of movie
6. Movie name
Note : Product ID generally can be decoded to know product details. For example, here H denotes “Hindi” and E denotes “English”. This coding makes the analysis simpler.
3. Customer Table :
The other hand of transaction table is the customer table. Using the above two tables, you almost have everything except the details of the customer. While making any kind of customer centric strategy, its very essential to consider the customer profile.This table helps you find the customer profile. This table is unique on customer id. What other parameters can you think of that make sense for you to include? Following is a list of possible variables :
1. Customer ID : Unique ID of customer
4. Area : Area where the customer lives
5. Package : Package customer has enrolled to
6. Enrol_date : When did the customer make his first transaction
Note : Similar to Product ID, Customer ID also generally can be decoded to know customer details.
4. Engagement Tracker :
All the three tables together can be used to create any kind of analysis to build marketing and sales strategy. What they do not cover is the engagement you had with your customers till date. Say, I called Kunal 1 week back to tell him about a movie X. Now, it might not be the best idea to call Kunal again this week to tell about the same movie. Hence, we need to keep a track on all kinds of engagement we have with our customer on daily basis. This is similar to transactions table but this include all the non-monetary interactions we have with out customers till date. These interactions can be inbound or outbound. This table is unique on engagement_id. What other parameters can you think of that make sense for you to include? Following is a list of possible variables :
1. Engagement id : Unique engagement identification
2. Inbound flag : 1 if inbound, 0 if outbound
3. Type of engagement : Code for the type of engagement
4. Product ID : ID of product in question
5. Derived tables :
Because the data sizes become huge with time, it is always recommended to keep some monthly snapshots handy. One of such table can be transactions data rolled up at customer level. Following is a list of such possible variables :
1. Customer ID
2. # lifetime transactions : No. of transactions made by customer till date
3. Enrol date
4. # English Movies : No. of English movies taken by customer till date
5. Last engement date
6. Last transaction date
Such derived tables come very handy to make quick analysis. Say, you have acquired 10 new english movies and want to market them. You might want to market these movies to customers who watch english movies, who responded to recent engagements and who have done recent transactions. For such a targeting list, imagine the process you might need to follow. Following is a possible way to achieve the same :
Imagine how easy this analysis gets if you have the derived monthly snapshot handy.
[stextbox id=”section”] Graph schemas: [/stextbox]
The article till now focuses on use of traditional relational databases. Graph based databases (e.g. Neo4j) are a strong alternate to these traditional databases. They add a lot of flexibility to your database, where you can change the schema very easily.
This kind of flexibility is required in case your data formats can change and you can not have much control on it. Also, you can add new structures and relationships very quickly. Before we go in these details, a typical graph schema in this case would look something like:
Blue nodes represent customers, Red represent movies and Green represents various package available. Every edge is a relationship in between nodes. For example, if a customer rents out a movie, we can draw an edge between the 2.
Now by calculating things like number of edges from a node, you can look at things like most active customer, most rented and least rented movies. You can also start looking at what kind of customers are renting what kind of movies.
P.S. Like all data model designs, there are various alternates to this design and you should choose the best depending on your usage.
[stextbox id=”section”] End Notes : [/stextbox]
We discussed relational database and graph database for representing a typical business problem. The data tables we discussed in this article is almost parallel to datamarts in any industry. We will look at some interesting strategies which can be derived using these data sources for the CD rental business case. Some of these strategies which are very basic in nature and needs more of business sense than modelling will be discussed in the next article. This will make you understand how effective strategies can be built if you mix business knowledge with simple data analysis.Knowledge of data is very essential regardless of the industry you work for. To view the next part of this case study click https://www.analyticsvidhya.com/blog/2014/03/learn-analytics-business-case-study-part-ii/
Did you find the article useful? Share with us any other problem statements you can think of. Do let us know your thoughts about this article in the box below.
If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on twitter or like our facebook page.
Next part of case study: Learn Analytics using a business case study: Part II