An analytics interview case study

Tavish Srivastava 26 Feb, 2019 • 8 min read

Introduction

The case study is the most important round in any analytics hiring process. However, many people feel nervous at the mention of a case interview. There are several reasons for this, but the most common ones are:

  • You need to think on your feet in a situation where there is already enough pressure.
  • Limited resources are available to prepare for analytical case studies. Even with the amount of content on the web, few analytical case studies are freely available.

From the interviewer’s perspective, these case studies are used to judge the candidate on structured thinking, problem solving and comfort with numbers. This article takes you through one such case study. The answer to each question takes you deeper into the same problem.

Make sure you check out the ‘Ace Data Science Interviews‘ course. We have poured all our combined experience of over a decade and hundreds of interviews into this comprehensive and ultimate course. It’s a guide you don’t want to miss!

 

[stextbox id=”section”]Background: [/stextbox]

I moved to Bangalore 10 months back. Bangalore is a big city with a number of roads tagged as one-way. Take one wrong turn and you are late by more than 20 minutes. Every single day I compare the time taken on different routes and choose the best among all possible combinations. This article takes you through an interesting road puzzle which took me considerable time to crack.

[stextbox id=”section”]Process to solve: [/stextbox]

I have structured this in a fashion very similar to an analytics interview. You will be given the background at the start of the interview, followed by questions. After you have brainstormed / solved a question, you will be presented with additional information which progresses the case further.

If you want to go through this case in its true spirit, ask one of your friends to take the questions and information (provided in the next section) and present them to you at the right time. After all the questions, I have provided the answers I would expect. You can compare your answers to mine.

Please note that in many situations there is no right or wrong answer, and a case evolves in the way the interviewer wants. If you have a different answer / approach, please feel free to post it in the comments and I would love to discuss it.

[stextbox id=”section”]Problem statement : [/stextbox]

Background : There are two alternate roads I can take from my home to reach the main road. The average speed on each road comes out to around 30 km/hr. Let’s call them road A and road B. The total distance to travel on road A and road B is 1 km and 1.3 km respectively to reach the same point on the main road. Note that, before the two roads split, I pass a signal (say Z) which is common to both roads and hence does not enter this calculation. See the figure for clarification.

[Figure: roads A and B]

Q1 : What are the possible factors I should consider to come up with the total time taken on each road?

Q2 : Which road should one take to reach the main road so as to minimize the time taken? And what is the difference in total time taken between the two alternate routes?

Additional information (to be provided after question 2): Recently, one of the junctions (say, X) on road A got too crowded and a traffic signal was installed there. The traffic signal was configured for 80 seconds red and 20 seconds green. Let’s denote the seconds of the signal as R1 R2 R3 … G1 G2 G3 …. Here, R1 denotes 1 sec after the signal switched to red.

Q3 : Does it still make sense to take road A, or should I switch to road B, given that the average speed on road A is still the same apart from the halt at the signal?

Additional information (to be provided after question 3): If I reach the signal at R1, I will be in the front rows to be released once the signal turns green. Whereas, if I reach the signal at R80, I might have to wait for some time even after the signal turns green, because the vehicles in the front rows will block me for some seconds before I can start. Let’s take some realistic guesses for the wait time after the signal turns green.

R1-R10 : 0 sec , R11-R20 : 3 sec , R21-R60 : 10 sec , R61-R80 : 15 sec , G1-G15 : 5 sec , G16-G20 : 0 sec

Q4 : Does it still make sense to take road A, or should I switch to road B, given that the average speed on road A is still the same apart from the halt at the signal?

Q5 : Can you think of a reason why road A can still be a better choice for reaching junction X in minimum time?

Additional information (to be provided after question 5): The signal Z (before the two roads split) has the exact same cycle as the signal at point X, i.e. 80 sec red and 20 sec green. The average speed of any vehicle on road A varies from 25 km/hr (heavy traffic) to 30 km/hr (light traffic). The signal X is offset from signal Z by 25 seconds. Hence, when it turns green at Z, it is R55 at signal X.

Q6 : Does it still make sense to take road A, or should I switch to road B, given that the average speed on road A is still the same apart from the halt at the signal?

[stextbox id=”section”]Solution  : [/stextbox]

Background : There are two alternate roads I can take from my home to reach the main road. The average speed on each road comes out to around 30 km/hr. Let’s call them road A and road B. The total distance to travel on road A and road B is 1 km and 1.3 km respectively to reach the same point on the main road. Note that, before the two roads split, I pass a signal (say Z) which is common to both roads and hence does not enter this calculation.

Question : Which road should one take to reach the main road so as to minimize the time taken? And what is the difference in total time taken between the two alternate routes?

Solution : 

[stextbox id=”grey”]

Time taken on road A = 1/30 * 60 min = 2 minutes

Time taken on road B = 1.3/30 * 60 min = 2.6 minutes = 2 min 36 sec

Hence, the clear choice is road A. Road B would have taken 36 sec more than road A.

The interviewer tests your comfort with numbers and your confidence in the answer at this step.

[/stextbox]
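The arithmetic for this first step can be checked with a short sketch. The function name `travel_time_seconds` is only illustrative, and the sketch assumes a constant speed over the whole road, as the case does.

```python
def travel_time_seconds(distance_km: float, speed_kmph: float) -> float:
    """Time needed to cover distance_km at a constant speed_kmph, in seconds."""
    return distance_km / speed_kmph * 3600


# Figures from the case: 1 km on road A, 1.3 km on road B, 30 km/hr on both.
time_a = travel_time_seconds(1.0, 30)
time_b = travel_time_seconds(1.3, 30)

# Rounded to whole seconds to hide floating-point noise.
print(round(time_a), round(time_b), round(time_b - time_a))
```

Working in seconds rather than minutes keeps the later comparisons with the signal timings in the same unit.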

Background : Recently, one of the junctions (say, X) on road A got too crowded and a traffic signal was installed there. The traffic signal was configured for 80 seconds red and 20 seconds green. Let’s denote the seconds of the signal as R1 R2 R3 … G1 G2 G3 …. Here, R1 denotes 1 sec after the signal switched to red.

Question : Does it still make sense to take road A, or should I switch to road B, given that the average speed on road A is still the same apart from the halt at the signal?

Solution : Let’s assume I arrive at the signal at a random time. Hence, the probabilities of reaching the signal at R1, R2, R3, … or G1, G2, G3, … are all equal: each of the 100 seconds in the cycle is equally likely. An arrival during red waits out the remaining red seconds (anywhere from 1 to 80), while an arrival during green does not wait. The expected time spent at the signal is therefore :

[stextbox id=”grey”]

E(halt time) = (1 + 2 + 3 + … + 80)/(80 + 20) = (80*81)/(2*100) = 32.4 seconds.

We still see 32.4 sec < 36 sec. Hence, it still makes sense to take road A.

The interviewer tests your knowledge of statistics (calculation of expected value), your approach to the problem and your interpretation of the final results at this step.

[/stextbox]
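The expected-value step can be sketched in a few lines. This assumes, as the solution does, that every second of the 100-second cycle is equally likely, that the 80 red seconds contribute waits of 1 to 80 seconds, and that green seconds contribute no wait. The function name is illustrative.

```python
RED_SECONDS = 80
GREEN_SECONDS = 20


def expected_halt(red: int = RED_SECONDS, green: int = GREEN_SECONDS) -> float:
    """Expected wait at the signal for an arrival uniformly spread over the cycle."""
    red_waits = range(1, red + 1)   # one wait value per red second: 1, 2, ..., 80
    total_wait = sum(red_waits)     # green seconds add nothing to the total
    return total_wait / (red + green)


halt = expected_halt()
print(halt)  # 32.4
```

Since 32.4 sec is less than the 36 sec saved by the shorter distance, road A still comes out ahead under this assumption.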

Background : Till this point, the solution will look good in books. Let’s spice the problem up with ground realities. If I reach the signal at R1, I will be in the front rows to be released once the signal turns green. Whereas, if I reach the signal at R80, I might have to wait for some time even after the signal turns green, because the vehicles in the front rows will block me for some seconds before I can start. Let’s take some realistic guesses for the wait time after the signal turns green.

R1-R10 : 0 sec , R11-R20 : 3 sec , R21-R60 : 10 sec , R61-R80 : 15 sec , G1-G15 : 5 sec , G16-G20 : 0 sec

Question : Does it still make sense to take road A, or should I switch to road B, given that the average speed on road A is still the same apart from the halt at the signal?

Solution :

[stextbox id=”grey”]

E(halt time) = {(1 + 2 + 3 + … + 80) + 3*10 + 10*40 + 15*20 + 5*15}/(80 + 20) = (3240 + 805)/100 = 40.45 seconds.

This time the game changes: as 40.45 sec > 36 sec, I will prefer road B over road A.

The interviewer tests how swiftly you can change some of the assumptions while keeping the added calculations to a minimum.

[/stextbox]
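A small sketch makes the bookkeeping of the queue delays explicit. It evaluates the expression term by term and arrives at 40.45 sec. The table name `QUEUE_DELAY` is illustrative, and the last band is taken as the five green seconds after G15, which is an assumption matching the `5*15` term in the expression.

```python
# (band, number of seconds in the band, extra delay after the light turns green)
QUEUE_DELAY = [
    ("R1-R10", 10, 0),
    ("R11-R20", 10, 3),
    ("R21-R60", 40, 10),
    ("R61-R80", 20, 15),
    ("G1-G15", 15, 5),
    ("G16-G20", 5, 0),  # assumption: the remaining five green seconds carry no delay
]

CYCLE_SECONDS = sum(count for _, count, _ in QUEUE_DELAY)  # 100

red_wait = sum(range(1, 81))                                    # 1 + 2 + ... + 80
queue_wait = sum(count * delay for _, count, delay in QUEUE_DELAY)

halt_with_queue = (red_wait + queue_wait) / CYCLE_SECONDS
print(red_wait, queue_wait, halt_with_queue)  # 3240 805 40.45
```

Keeping the delays in a table means a different set of guesses only requires editing the rows, not redoing the sum by hand.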

Background : Even after making such logical calculations, I noted that on 30 different occasions, I was commuting more than 25 sec faster on road A compared to road B every single time. I did not change my average velocity on either of the roads. It would have been acceptable had I found x occasions where A wins and 30 – x where B wins. But A winning every single time was fishy. I struggled for the last 10 days to figure out a valid cause. It struck me today, and the following is what I figured out:

The signal Z (before the two roads split), which I initially thought had nothing to do with the calculation, was actually the game changer. Here is how it played a role. This signal had the exact same cycle as the signal at point X, i.e. 80 sec red and 20 sec green. Whenever the two lights have the same cycle, my arrival time at signal X is no longer random.

Question : Does it still make sense to take road A, or should I switch to road B, given that the average speed on road A is still the same apart from the halt at the signal?

Solution : 

[stextbox id=”grey”]

Say my average speed on road A varies from 25 km/hr to 30 km/hr. The signal X is offset from signal Z by 25 seconds. Hence, when it turns green at Z, it is R55 at signal X.

Case 1 : (Light traffic) Time taken to cover road A = 2 mins = 120 sec

Reading at X when I reach the signal = R55 + 120 = R75 (120 sec is one full 100-sec cycle plus 20 sec, and R55 + 20 = R75).

Case 2 : (Heavy traffic) Time taken to cover road A = 1/25 * 60 min = 2 mins 24 sec = 144 sec

Reading at X when I reach the signal = R55 + 144 = G19 (one full cycle plus 44 sec; 25 sec take R55 to R80 and the remaining 19 sec fall in green).

Hence, the probability of R1-R74 is zero, and the revised equation for the expected halt time is :

E(halt time) = (5 + 4 + 3 + 2 + 1 + 15*5 + 5*15)/25 = 6.6 sec

Therefore, as 6.6 sec < 36 sec, road A always wins over road B.

Thus, the assumption of random events is not always true. Try to identify every factor that can influence the occurrence of an event before assuming it is random.

The interviewer tests your out-of-the-box thinking, your ability to question your assumptions and your interpretation of results at this step.

[/stextbox]
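The phase arithmetic behind the two cases can be sketched as follows. The helper names are illustrative, positions in the cycle are counted from 1 (1 to 80 are R1 to R80, 81 to 100 are G1 to G20), and the final expression is copied from the solution as written rather than derived independently.

```python
CYCLE = 100  # 80 sec red + 20 sec green
RED = 80


def reading(position: int) -> str:
    """Label a 1-based second of the cycle: 1..80 -> R1..R80, 81..100 -> G1..G20."""
    return f"R{position}" if position <= RED else f"G{position - RED}"


def reading_at_x(start: int, travel_seconds: int) -> str:
    """Reading at X after travel_seconds, given X shows second `start` when I leave Z."""
    position = (start + travel_seconds - 1) % CYCLE + 1  # wrap around the cycle
    return reading(position)


light = reading_at_x(55, 120)  # 30 km/hr over 1 km = 120 sec
heavy = reading_at_x(55, 144)  # 25 km/hr over 1 km = 144 sec

# Expected halt over the reachable window, using the expression from the solution.
expected_halt_synced = (5 + 4 + 3 + 2 + 1 + 15 * 5 + 5 * 15) / 25

print(light, heavy, expected_halt_synced)  # R75 G19 6.6
```

The modulo step is what turns "R55 + 120" into R75: only the remainder after whole cycles matters for the reading at X.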

[stextbox id=”section”]End Notes [/stextbox]

Did you find the article useful? Share with us any other problem statements you can think of. Do let us know your thoughts about this article in the box below.

In one of the upcoming articles, we will share how an interviewer judges an analyst during a case study.

 

Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industry including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.

Responses From Readers

Rajesh 06 Feb, 2014

Hi, For the first solution, should it be Time taken on road B = 1.5/30 * 60 min = 3 minutes rather than 1.3/30, Pls clarify. Thanks.

Tavish Srivastava 06 Feb, 2014

Rajesh, Thanks for pointing this out. There was a typo in the question. We have rectified the same. Tavish

Prateek 06 Feb, 2014

In Solution 1;should't time taken by B should be equal to 1.5/30=3 mins?

Tavish Srivastava 06 Feb, 2014

Prateek, This has been changed. Thanks for pointing it out. Tavish

Avinash 08 Jul, 2014

"probability of getting to the signal at R1 R2 R3 …or G1 G2 G3 are all equal" how can this be true when it shows red for 80 seconds and green for 20 seconds ??????can u please explain this?????? U can say either red or green thats why the probabilities are equal. but I feel when there is time factor attached to it , i feel probabilities might not be the same...... can u please throw some light on this?????

dewanshee 16 Sep, 2014

hey could you explain the light traffic case in the last case.. how did the reading at X come R75 by R55+ 120? and the similar equation in the case of heavy traffic

Vinay 08 Dec, 2014

Tavish For question 3, I got an alternate solution. Correct me if I am wrong. Since the wait time is given in the question, that is (R1 – R 10 : 0 sec , R11-R20 : 3 sec , R21 – R60 : 10 sec, R61 – R80 : 15 sec, G1-G15 : 5 sec, G15-G20 : 0 sec), we can take the worst case among this. I mean, imagine that the driver comes between R61 - R80( waiting time is 15 sec). As a result, the total time taken would be 2min and an additional 15 sec( 2min 15 sec) , which is less than 2min 36 sec. So route A would be optimal. Thanks and Regards, Vinay

Karishma 27 Aug, 2015

Hello, I would like to know if there are any books for practicing case studies for analytics interview. Similar to the case you have described above.

kirankuma 23 May, 2017

i wnat this answer? Sixty per cent of students applying for admissions at NGASCE are female. 30 applications were received on a particular day. What is the probability that exactly 15 of the applications will be from females? What is the probability that fewer than 10 of the applications will be from females? Also, calculate the expected number and variance of the number of applications from females?

bhanu 09 Jul, 2017

Since the wait time is given in the question, that is (R1 – R 10 : 0 sec , R11-R20 : 3 sec only below these 3 cases road B is better because 1.R1 – R 10 : 0 sec =80 sec(red time)=120+80=200sec 2. R11-R20: 3sec=65 sec(red time)+3 sec=120+65+3=188 sec 3. R21 – R60 : 10 sec=40 sec(red time)+10sec=120+40+10=160 4. R61 – R80 : 15 sec=10 sec(red time)+15=120+10+15=145 sec

Debarpita Das Pal 27 Jul, 2017

Q 4 : Does it still makes sense to take road A, or to switch to road B provided the average speed on the road A is still the same except the halt at signal? The calculated value is coming to 43.15 sec. Or am I doing something wrong?

Jeremy Boardman 22 May, 2018

On the time it takes to reach the traffic light on road A, it seems the calculation is not accounting for the fact that the traffic light is only 6/10's of the way down road A and not at the end of road A. Under heavy traffic, you would reach the intersection X after 86 seconds (.6km /25 km/hr * 60 mins/hr) not the 144 seconds as stated above. Of course, the wrinkle in complexity increases in that you would have the same reduced departure rate at the first light as you would under the second light. At this point, I think it is better to model each scenario and then calculate total travel time and then determine the average of those actual values across all scenarios (I did this quickly in excel). You could also break this down into a decision tree, which is how I think about traffic under these circumstances. Basically, you are given information about road conditions based on the number of cars stopped in front of you on at traffic light Z. If most go on road A, then you can expect a similar wait at the next light (assuming you have to stop). Essentially, you could delay your decision based on what you find at point Z. This should improve your travel time. My last consideration however, is that we are not really talking about much difference in time. We would also want to factor in the affect on mood from idle waiting. People probably prefer to drive further and longer if they are moving. There is something psychological about how we perceive time as wasted when we aren't moving. I'd be happier to have a predictable (less variance) but slightly longer commute rather than one that varies depending on traffic conditions and the added stress of timing lights. Probably my last consideration would be why on earth is this town favoring traffic moving across Road A 4 to 1 over the traffic flowing along Road A. The crossing road (of A) does not seem to go anywhere.. and if it did, wouldn't that traffic also naturally extend to interfere with Route B? 
I would petition the town to alter the timing of the light to be green 80% of the time along Road A and red only 20%. This would reduce the variance in my outcomes along A and make me happier!
