Solving case studies is a great way to keep your grey cells active. You get to use math, logic and business understanding in order to solve questions. Do you find it exciting too ? Having received an overwhelming response on my last week’s case study, I thought the show must go on. Do check out the last week’s case study before solving this one.

This case study is one of my favorite because of its real life implementation. The calculations which you’ll do in solving this case are the ones which often take place in real life. Therefore, it’s not just mathematical but practical also. For experienced job roles, similar case studies often appears in job interviews also. So, give your best attempt!

The objective of this case study is to optimize the price level of products for an online vendor. I have over-simplified the case study to make it a similar platform for all, including people who have worked in similar industry. I’ve solved this case study in two ways, using **business approach** & **analytical approach **(using R). The data set is available for download below.

Are you preparing for your next data science interview? We have put together several such case studies as part of our comprehensive ‘Ace Data Science Interviews‘ course. Make sure you check it out!

Suppose, you are the owner of an online grocery store. You sell 250 products online. A conventional methodology has been applied to determine the price of each product. And, the methodology is very simple – price the product at par with the market price of the product.

You plan to leverage analytics to determine pricing to maximize the revenue earned.

Out of 100000 people who visit your website, only 5000 end up purchasing your products. Now, all those who made a purchase, you have obtained their buying patterns, including their average unit purchased etc.

To understand the impact of price variations, you tried testing different price points for each product. You got astonished by the results. The impact can be broken down into two aspects:

- A lower price point increases the volume of the purchased product.
- Customers compare price points of a few products more than others, to make a decision whether to buy products from your portal or not.

For instance, Product 1 might be a frequently used product. If you decrease the price point of product 1, then the customer response rate which was initially 5% goes up to 5.2% over and above the fact that the customer will purchase more of product 1.

On the other hand, decrease in product price obviously decreases the margin of the product.

Now, you want to find the optimum price points for each of the product to maximize the total profit earned. In this case study, you are provided with a table with all 250 items : Download Dataset

Following are the variables available in the data set:

**Average Price/Unit**: Market price of the product**Cost/Unit**: Current cost of the product**Average Profit/Unit**: Profit for each unit**Average units sold**: Average number of units of product sold to a customer who makes a purchase**Incremental acquisition**: For every 10% decline in unit price, this is the increase in total customer response rate. Note that overall response rate initially is 5% (5000 out of 100000 make a purchase). You are allowed to decrease the price of a product maximum by 10% by market laws.**Increase in sale volume**: For every 10% decline in unit price of product, this is the increase in volume. Again, you are allowed to decrease the price of a product maximum by 10% by market laws.

Note: The maximum price hike permitted is 20%. So, basically the price of a product can be varied between -10% to +20% around the average price/unit.

If you make the calculation of the profit earned per customer who comes to your portal :

**Total Profit Earned : $165**

We will try to solve this case study both by a **business approach** and an **analytical approach**. Let’s do it!

To solve the problem without applying any technique, let’s analyze the frequency distributions.

**Profit Margin** : Here is a distribution of profit margin for each of 250 products.

Let’s divide the products based on profit margin bands.

**Low Profit**** Margin**: Less than 10% Profit

**Medium Profit Margin***: 10% – 25% Profit*

**High Profit Margin***: 25% + Profit*

**Incremental volume** : Here is a distribution of incremental volume for each of 250 products.

Let’s categorize incremental volume bands:

**Low Incremental Volume***: Less than 2%*

**Medium Incremental Volume***: 2% – 6%*

**High Incremental Volume***: 6% + Profit*

**Incremental Acquisition** : Here is a distribution of incremental acquisition for each of 250 products.

Finally, we should split incremental acquisition bands also:

**Low Incremental Acquisition***: Less than 0.1% Profit*

**Medium Incremental Acquisition***: 0.1% – 0.4% Profit*

**High Incremental Acquisition***: 0.4% + Profit*

Let’s discuss the **pricing strategy** now:

Until here, we have got three decision attributes for each product. The extreme of the pricing is -10% and 20%. We need a -10% for products which can give a big acquisition increment or high volume increment or both. For rest we need to increase the profit margin by increasing pricing by +20%.

Obviously, this is a super simplistic way to solve this problem, but we are on a track of getting low hanging fruits benefit.

Because the incremental acquisition has a mean at 0.4%,we know that if we decrease the cost of products with high acquisition incremental rate, we will have significant incremental overall sales, but quite less impact due to less profit margins.

For medium incremental acquisitions, we need to take decision on profit margins and incremental volumes. All the cells shaded in green are the ones that will be tagged for -10% and rest at 20% price increase. The decision here is purely intuitive taking into account the basic understanding of revenue drivers for which we have seen the distributions above.

Now if we calculate the total profit earned, we see significant gains over and above the initial profit.

**Total Profit earned : $267 (Increase of 62%)**

Let’s take a more analytical/numerical way to solve the same problem. Here is what we try in this algorithm :

- We start with a vector of all zeros for each 250 products incremental sales price.
- We now increase/decrease each of the product’s incremental sales price by 1% to see the impact on total profit.
- For an increase in profit, we keep the incremental sales. Now we start with this vector to find further marginal increment.

Here is the R code:

setwd("C:\\Users\\ts93856\\Desktop\\Taxi") data <- read.csv("Bigbasket.csv") dim(data) #250*7 summary(data) # This is the initial value of the increment vector - all zeros increment <- array(0,250) flag = 0 increment_i <- increment #flag = 1 is a condition when the increment vector remains the same while (flag == 0) { print(find_rev(increment_i)) increment_iplus1 <- incremental_new(increment_i) if (min(increment_iplus1 == increment_i) == 1) {flag = 1} increment_i <- increment_iplus1 } increment_i find_rev(increment_i) /find_rev(increment) price <- increment_i write.csv(price,"price.csv") #This function tries to get the next best increment vector incremental_new <- function(initial_increments){ initial_rev <- find_rev(initial_increments) intermediate_rev <- 0 for(i in 1:250){ increments <- initial_increments if(increments[i] > -0.099) {increments[i] <- increments[i] - 0.01} rev <- find_rev(increments) if (rev > initial_rev) {final_increments <- increments intermediate_rev <- rev } if(increments[i] < 0.19) {increments[i] <- initial_increments[i] + 0.01} rev <- find_rev(increments) if (rev > max(initial_rev,intermediate_rev)) {final_increments <- increments} } return(final_increments) } # This function will get us the overall revenue for the given increment vector find_rev <- function(increment){ price <- data$Avg_Price_per_unit*(1+increment) volumes <- data$Average_units_sold*(1-(data$Increase_sale_volume*increment*10)) multiplier <- (1-(data$Incremental_acquisition*increment*10)) total_multiplier <- prod(multiplier) profit_wo_multiplier <- 0.05*(sum(price*volumes) - sum(volumes*data$Cost_per_unit)) profit_w_multiplier <- profit_wo_multiplier*total_multiplier net_profit <- sum(profit_w_multiplier) return(net_profit)}

**Total Profit earned : $310 (88% incremental benefit)**

2 minutes silence for all those who questioned the power of data analytics! 😀

The data used in this analysis is derived from a simulation of prices of a grocery store. The best approach is generally a combination of both business and numerical method. I encourage you to try such algorithms on this data and share with us your approach and total profit incremental benefit.

Did you like reading this article ? Do share your experience / suggestions in the comments section below. I’d love to know your intuitive solutions to learn more ways of solving this case.

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Hi Tavish, Thanks for the share. Just a quick clarification, "Average units sold : Average number of units of product sold to a customer who makes a purchase" Shouldn't it be "Average number of units of product sold to a customer who makes "ANY" purchase" ? Especially when the Average units sold numbers are less than 1..!! Thanks.

Yes you are right. I am referring "a purchase" as "any purchase"

Hi Tavish, long time :). Fantastic job with the analytical part, From what I understood from the above case, we have assumed no correlation between products when varying prices,right ? We have basically calculated the ideal price point for each product and then finally calculated the aggregated net profit ? How would you approach this if products are correlated among themselves, with price changes in one changing the dynamics of other products ?

Yes Abhinav, Assuming no correlation is over simplistic. In case we bring in the dimensionality of correlation between products, my R code will not change significantly. The only piece that will change is the last function where we calculate the incremental revenue.

Looks like you have implemented Genetic Algorithm to identify best set of Values, Am I right ..?

I will prefer saying Newton Raphson method rather than genetic algorithm.

Really mature , useful write up, thanks

Hello, Thanks for the article. What is the origin of the find_rev function ? Thank you, Ron

Sorry Ron, I am not sure what your Question is?

Hi. This is interesting. What is incremental acquisition? If the response rate increases the total sale volume increases . Would the response rate also affect total profit?

Yes the response rate will increase the total profit, as you have more number of customers buying from you.

Hello, I would like understand the both variables: "Incremental acquisition" and "Increase in sale volume": Can I calculate it w/ formulas (e.g. in MS Excel) or are they given? Thanks in advance.

Nice one, Tavish. Business vs Analytical approach can also be projected as "Excel vs R" comparison. I know that a lot of business users like Excel (me too!) but it is important to understand that Excel cannot model all complexities in data like what a good analytical algorithm can do. So it is all about exploiting the power of different techniques for the business problem at hand. And I liked the '2 Minutes silence' stuff! :-)

Well said

I am new to analytics. What is the net profit with present price ,average profit and the no of units sold? I have multiplied average profit per unit and average no of units sold for each product and summed up and got a result 3295.27. Where did i go wrong?

You might be missing out on the 5% response rate. Multiply your summed revenue by 5% and you will get $165.

I appreciate your thinking and the effort you’ve put into this problem. However, there are so many aspects missing. First off, the dataset you started with is not practical to obtain or create in real world. The online store should have done enough experimentation with the prices of each product for a sufficient period of time to arrive at these approximations. Anyway, for the sake of study, let's assume we somehow got this data. Secondly, the sales volume (and profit) need not necessarily increase with decreasing price. In an online store there are enough veblen and giffen type of goods. Imagine the case of iPhone. People will stop valuing it as a high end product and sales will eventually drop if apple reduced the prices. Incremental volume is typically a non-linear function with price. Also, Beyond a certain cutoff of price reduction, the sales will drastically drop. So, taking average seldom makes sense. This relationship between price and demand should ideally be modelled for each individual product. This is a problem of price demand elasticity. Overall, the problem needs to be handled in a completely different way, if someone intends to use it in production. The price demand elasticity metric has to be computed first and then proceed into detailed modelling from there. Needless to say, I like the good work you do at AV. Keep it up.

Hi Selva, Appreciate you putting your time to analyze the case study. You have some very valid points. Let me try to address them to the extent I can. Again, I accept this is an over-simplistic example of price optimization and real scenarios will be far more complex. First, such data generation is actually possible if we take a design of experiment kind of approach. I have implemented such models in past to find out price sensitivity. Obviously, the relationship is far more complex than linear, but for a simple case study - I guess it is fair to assume a linear relationship. Second, even though, as you rightly pointed out, the decrease in price might not always increase volume, I would argue that the merchant in question is a grocery store. It will be very hard to find items such as an Iphone in grocery store which have a loyalty price associated with it. Also, you might say that a low price might mean sub-standard grocery, for which we have anyway put cut-off on lower limit of price point. Third, you are right in pointing out there will be a cutoff of price point after which decrease or increase in price will not have any effect on anything. We have already taken into account a max and min i.e. -10% and 20%, which takes care of this fact to some extent. Lastly, I agree with your last point that pricing models are much more complex than this simplistic scenario. I hope I addressed your concerns to some extent. Thanks, Tavish

Great article! What would be your way if Incremental acquisition and Increase in sale volume wouldn't exists?

That will be a more realistice scenario. In such cases we use pricing models which are far more complex than this analysis.

How have you calculated "Total Profit Earned"? I am confused as to how the values of $165, $267 & $310 have been reached.

I also feel like price sensitivity per product is missing. For example: If I find out that product 1 is most sensitive to incremental acquisition then if I want to decide IT'S pricing I would need information on various price point for THIS PARTICULAR product. I cannot possibly decide optimal pricing for product 1 based on pricing values of other products primarily because not even industries are specified in this data set. Would that not be so?

I am not sure if I got your point completely. The pricing in this approach of a product is solely dependent on the sensitivity of incremental acquisition and volumes of the same product. The incremental acquisition of any product just influences the number of customer buying from this store. Again, I accept this is an over-simplistic scenario for price optimization but definitely a good starting point.

I have translated your R-script into Python but I still have a bug in my script (I am a Python beginner). Perhaps someone here could provide a accurate version: # -*- coding: utf-8 -*- import os import pandas as pd import numpy as np os.chdir('C:\\Users\\Downloads') df_raw = pd.read_csv('Vendor_Data.csv') #print(df_raw.head()) #print(df_raw.describe()) print('Matrix:',df_raw.shape,'\n') increment = np.zeros(df_raw.shape[0], dtype=np.float64) #print(increment) # This function will get us the overall revenue for the given increment vector def find_rev(increment): print('find rev function!!!', round(increment[1],5)) price = df_raw["Avg_Price_per_unit"] * (1 + increment) volumes = df_raw["Average_units_sold"] * (1 - (df_raw["Increase_sale_volume"] * increment * 10)) multiplier = 1 - (df_raw["Incremental_acquisition"] * increment * 10) total_multiplier = np.prod(multiplier) profit_wo_multiplier = 0.05 * (np.sum(price*volumes) - np.sum(volumes * df_raw["Cost_per_unit"])) profit_w_multiplier = total_multiplier * profit_wo_multiplier net_profit = np.sum(profit_w_multiplier) return net_profit #This function tries to get the next best increment vector def incremental_new(initial_increments): initial_rev = find_rev(initial_increments) intermediate_rev = 0 for i in range(0,250): print('CURRENT I', i, increments) increments = initial_increments print('increments[i]',increments[i],'initial_increments[i]',initial_increments[i]) if increments[i] > -0.099: print("yes1") increments[i] = (increments[i] - 0.011) #0.01 print("new i1", increments[i]) rev = find_rev(increments) #print('rev',rev) #final_increments = 0 if rev > initial_rev: final_increments = increments intermediate_rev = rev if increments[i] np.max(initial_rev) or rev > np.max(intermediate_rev): final_increments = increments print('final',final_increments) return final_increments flag = 0 increment_i = increment #flag = 1 is a condition when the increment vector remains the same while flag == 0: print('Initial overall profit:',find_rev(increment_i)) increment_iplus1 = incremental_new(increment_i) if min(increment_iplus1) == min(increment_i): if min(increment_iplus1) == 1: flag = 1 increment_i = increment_iplus1 print('Product changes', increment_i) print('New net profit',find_rev(increment_i)) print('Old net profit',find_rev(increment)) price = increment_i # optional save price as excel file

What is the error.

I was running the above code and i got the following error > # This is the initial value of the increment vector - all zeros > increment flag = 0 > increment_i #flag = 1 is a condition when the increment vector remains the same > while (flag == 0) { + print(find_rev(increment_i)) + increment_iplus1 <- incremental_new(increment_i) + if (min(increment_iplus1 == increment_i) == 1) {flag = 1} + increment_i

But for me i am still getting the same error. Is it that i am missing some libraries..?

The Error is : > while (flag == 0) { + print(find_rev(increment_i)) + increment_iplus1 <- incremental_new(increment_i) + if (min(increment_iplus1 == increment_i) == 1) {flag = 1} + increment_i <- increment_iplus1 + } Error in print(find_rev(increment_i)) : could not find function "find_rev"

You should run the last part of the code, which is the "find_rev" function.

Hi! I was wondering how I could get the Incremental acquisition and the Increase in sale volume of my ecommerce shop. If someone has an idea I would appreciate it. Thanks.

Hi Tavish, I don't know R, I tried reading the code but it does not seems intuitive enough to understand. I would be grateful if you can please explain the logic of the code. I also do not understand the difference between incremental volume and incremental acquisition, can you please explain. Regards, Harneet.

Hi Tavish, it's good article.my question and doubt,can we don't have any existing package to deal with such optimization problems as it take lot of coding to implement every time? it could be great if you substantiate logic behind this code volumes <- data$Average_units_sold*(1-(data$Increase_sale_volume*increment*10))

Hi , I wanted know how to calculate Incremental Acquisition ?

Hi Tavish, Great Article..!! Could you please throw some light on how the Incremental acquisition and the Increase in sales volume are calculated ? Thanks and Regards

HI , As I dont know R can someone guide me how to solve this on sas.

Hi Tavish, Am a newbie to Data Analytics, can you please explain how the net profit earned is calculated.Appreciate your response quickly. Thanks Kirthy

First of all: really great article with substantial value to it. Thanks. Interesting to see that so many questions revolve around availability / measurement of the input data. Changes in aquisition and sales volume need to be measured on your website - you need to produce these data by changing prices and measuring customer responses to these price changes. This is a whole different beast and includes valid experimental design and data engineering --- Or estimation / modelling of these variables based on other, available data. If anyone likes to discuss pricing analytics on a deep level please let me know.