After a long three-hour wait, it was my turn to enter the interview room. The interviewer’s first question was, “Can you estimate the total number of cigarettes consumed per month in India?” Having worked on a project for ITC in one of my core courses, I was able to crack the problem with relative ease. I started with the total number of ITC factories in India. From there, I calculated the number of cigarettes ITC manufactures in a year with the help of the average turnover time. I then made reasonable guesses for the percentage of cigarettes exported and ITC’s market share in India. Finally, I arrived at the number of cigarettes consumed per month in India, which convinced the panel.

Questions like these are very common in analytics and management consulting interviews. If you wish to interview with companies of this kind, you should be able to solve guess estimates (or guesstimates, as we’ll call them from here on) in double-quick time. That is where this article should be useful. I was fortunate to get this particular puzzle. What if I had had no clue about the number of ITC factories producing cigarettes?

After this interview, I tried solving many such puzzles to get comfortable with them. In this article, I will walk through some techniques I now use to crack such puzzles.


## What does an interviewer evaluate using a guesstimate case study?

Clients often expect analysts and consultants to produce a quick initial sizing of potential projects. That is why such questions are so common in interviews for these roles. The interviewer is looking for four key traits:

- How structured is your approach?
- How comfortable are you with numbers?
- Are you able to make quick checks on the efficiency of different methods?
- Can you do quick mental calculations and validate the magnitude of numbers?

## Framework to solve a guesstimate problem

Knowing a few techniques for such guess estimates helps keep your approach structured in the interview. Let’s address the cigarette estimation problem from the demand side (without using the number of ITC factories) while discussing the key techniques. The following four techniques will help you in such case interviews:

- **Find the right proxy**: This is by far the most important technique. A proxy is a parameter that behaves in a similar manner to the quantity you are estimating. In the cigarette estimation problem, the population of India is a good proxy for the number of cigarettes consumed monthly in India. If the population of India increases, cigarette consumption can reasonably be expected to increase roughly in proportion. Other commonly used proxies are the growth in population, the growth in demand for a newly introduced technology, the average number of planes parked at major airports, etc.
- **Segment till you can find differentiated clusters**: Estimating parameters at the segment level is far more accurate than making guesses about the overall population. In the cigarette estimation problem, the population below 16 years can safely be ignored for cigarette consumption, and the female population is expected to have a lower average cigarette consumption than the male population. This is how segmentation helps in making accurate assumptions.
- **Do smart calculations and number round off**: Speed is critical in such problems, and you need to balance accuracy against time. Say you need to find 2999/3. It is much easier to calculate 3000/3 than 2999/3. In such cases, write the answer as 1000 (-). This indicates that the number is slightly less than 1000, which can be compensated for in further calculations.
- **Validate number magnitude**: It is always a good idea to keep validating intermediate numbers using your experience and sense checks.

## Some ground rules to be followed while doing a guesstimate

Keep the following in mind while solving a guess estimate problem:

- Analyze all possible uses of the subject. For example, while estimating the number of tennis balls in India, consider balls used in tennis, cricket and all other sports that are potential users of tennis balls.
- Keep the population of your country, state and city at your fingertips. Population is the most common proxy in case studies, so these numbers give a good starting point.
- Have a look at some key parameters of airline operations. Many guess estimate problems are related to airlines, so it helps to have a sense of the number of flights normally parked at major airports, the time lag between take-offs, etc.
- Draw neat diagrams to show the segmentation. This not only helps you calculate quickly but also makes it easier to redo the calculations at the segment level if required.
- Don’t round off every number in the same direction. Doing so magnifies the error term. Putting a sign next to each rounded-off number helps you keep track.
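The round-off rule is easy to check with a little arithmetic. The sketch below is purely illustrative: the three factors are made-up numbers (only 2999 echoes the earlier 2999/3 example), and `relative_error` and `sign_note` are hypothetical helper names, not part of any library.

```python
from math import prod


def relative_error(exact, approx):
    """Signed relative error of an approximation (positive = overestimate)."""
    return (approx - exact) / exact


def sign_note(exact, rounded):
    """Annotate a rounded number with (-) if the true value is smaller, (+) if larger."""
    if exact < rounded:
        mark = "(-)"
    elif exact > rounded:
        mark = "(+)"
    else:
        mark = ""
    return f"{rounded} {mark}".strip()


# Made-up factors for illustration only; they do not come from any case study.
exact_factors = [2999, 48, 0.26]

# Every factor rounded upward: the errors all push in the same direction.
all_up = [3000, 50, 0.30]

# Mixed directions: two factors rounded up, one rounded down, so errors partly cancel.
mixed = [3000, 50, 0.25]

exact = prod(exact_factors)
err_all_up = relative_error(exact, prod(all_up))
err_mixed = relative_error(exact, prod(mixed))

print(f"all rounded up : {err_all_up:+.1%}")
print(f"mixed rounding : {err_mixed:+.1%}")
print(sign_note(2999 / 3, 1000))
```

With these particular numbers, rounding everything upward overshoots the product by roughly a fifth, while mixed rounding stays within about one percent. The sign annotation is what tells you which way to round the next factor.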

## Step-by-step approach for solving a guesstimate problem

### Case 1: Estimate the number of cigarettes consumed monthly in India

**Solution:** A good proxy in such a problem is the population of India, i.e., 1.2 billion. Following is an effective way to segment this population:

Following were the key considerations in building the segmentation and the intermediate guesses:

- The rural population consumes far fewer cigarettes than the urban population because of the difference in purchasing power.
- Men consume more cigarettes than women in both urban and rural populations.
- Children below 16 years consume a negligible number of cigarettes.
- The male-to-female ratio in urban areas is closer to 1 than in rural areas.
- The male-to-female ratio in younger generations is closer to 1 than in older ones. This is because of the increase in awareness level.
- The bulk of the population starts smoking after getting a job, and hence the average number of cigarettes is higher in older groups.
- The total number of cigarettes from the supply side also comes to around 10 billion, which gives a good sense check on the final number.
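The segmentation logic listed above can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: only the 1.2 billion population and the 30 cigarettes/month average for the top urban bracket come from this article and its discussion. Every share and every other per-segment average is a placeholder invented for illustration, so the result should not be read as the article's own table.

```python
from itertools import product

POPULATION = 1_200_000_000  # proxy stated in the article

# ASSUMED population shares (placeholders, not taken from the article).
AREA_SHARE = {"urban": 0.30, "rural": 0.70}
GENDER_SHARE = {"male": 0.50, "female": 0.50}
AGE_SHARE = {"<16": 0.30, "16-22": 0.15, ">22": 0.55}

# ASSUMED average cigarettes per person per month, smokers and non-smokers
# combined. Only the 30/month figure for the top urban bracket is from the
# article's discussion; the rest are placeholders that respect the stated
# considerations (urban > rural, male > female, older > younger).
AVG_PER_MONTH = {
    ("urban", "male", "16-22"): 15,
    ("urban", "male", ">22"): 30,
    ("urban", "female", "16-22"): 8,
    ("urban", "female", ">22"): 10,
    ("rural", "male", "16-22"): 5,
    ("rural", "male", ">22"): 12,
    ("rural", "female", "16-22"): 1,
    ("rural", "female", ">22"): 2,
}


def estimate_monthly_cigarettes():
    """Sum (people in segment) x (average monthly consumption) over all segments."""
    total = 0.0
    for area, gender, age in product(AREA_SHARE, GENDER_SHARE, AGE_SHARE):
        people = (POPULATION * AREA_SHARE[area]
                  * GENDER_SHARE[gender] * AGE_SHARE[age])
        # Segments without an entry (children below 16) are treated as zero.
        total += people * AVG_PER_MONTH.get((area, gender, age), 0)
    return total


total = estimate_monthly_cigarettes()

# Magnitude check: nobody averages more than the top bracket, so the total
# cannot exceed population x 30.
upper_bound = POPULATION * 30

print(f"Estimated cigarettes per month: {total / 1e9:.1f} billion")
print(f"Upper bound for sanity check : {upper_bound / 1e9:.0f} billion")
```

The upper-bound line is the "validate number magnitude" technique in action: with 1.2 billion people and at most 30 cigarettes per person per month, any answer above 36 billion signals an arithmetic slip.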

### Case 2: Estimate the number of WhatsApp Android applications installed

**Solution:** A good proxy in this problem is the world population, i.e., ~7.2 Billion. Following is a possible approach to this problem:

The actual number of WhatsApp installations on Android phones is slightly more than 100 million. As this example shows, guess estimates can be fairly accurate if we choose good segments and approximations.

### Case 3: Estimate the number of tennis balls bought in India per month

**Solution:** A good proxy in this problem is the number of cities in India, i.e., ~1700. The catch in this problem is to identify all the places where tennis balls are used. Once we have the number of tennis balls used monthly, we can easily find the number of tennis balls bought in a month using the lifetime of a tennis ball.

Following is an effective way to segment this population:

Following were the key considerations in building the segmentation and the intermediate guesses:

- Rural areas have a negligible number of tennis courts.
- Metro cities have the highest number of sectors.
- For each sector in metro cities, the number of grounds for both tennis and cricket is higher. This is because of both the bigger area and the higher buying capacity in metros.
- The number of balls consumed per ground is higher in metros because of the higher engagement there.
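One way to turn these considerations into a number is sketched below. It assumes a steady state in which every ball in use is replaced once per lifetime, so monthly purchases equal balls in use divided by the lifetime in months. Apart from the ~1700 cities proxy, every figure here (the number of metros, sectors per city, grounds per sector, balls per ground, and the two-month lifetime) is a placeholder invented for illustration, and `CityTier` is a hypothetical helper class.

```python
from dataclasses import dataclass


@dataclass
class CityTier:
    """One segment of cities; all field values used below are assumptions."""
    cities: int
    sectors_per_city: int
    grounds_per_sector: int
    balls_in_use_per_ground: int

    def balls_in_use(self):
        return (self.cities * self.sectors_per_city
                * self.grounds_per_sector * self.balls_in_use_per_ground)


TOTAL_CITIES = 1700      # proxy stated in the article
ASSUMED_METROS = 10      # placeholder
BALL_LIFETIME_MONTHS = 2  # placeholder

# Metros get more sectors, more grounds per sector and more balls per ground,
# mirroring the considerations listed above. Rural areas are ignored.
tiers = {
    "metro": CityTier(ASSUMED_METROS, 100, 2, 20),
    "other": CityTier(TOTAL_CITIES - ASSUMED_METROS, 20, 1, 10),
}

balls_in_use = sum(tier.balls_in_use() for tier in tiers.values())

# Steady-state assumption: each ball is replaced once per lifetime.
balls_bought_per_month = balls_in_use / BALL_LIFETIME_MONTHS

print(f"Balls in use          : {balls_in_use:,}")
print(f"Balls bought per month: {balls_bought_per_month:,.0f}")
```

Keeping each tier as its own record makes it easy to redo one segment (say, after the interviewer challenges the balls-per-ground guess) without touching the rest of the calculation.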

## A challenge for the reader

Here is a practical example you can try. Imagine you are sitting in an interview and the interviewer asks, “Estimate the number of aircraft in the air across the globe at this moment in time.” How will you answer this question? Write down your approach in the comment box below to get opinions from experts.

## End Notes

Guess estimates are one of the most common case studies asked in data science interviews. With the right tools and techniques, this case study becomes a cakewalk.

Did you find the article useful? Share with us any other techniques you incorporate while solving a guess estimate problem. Do let us know your thoughts about this article in the comment section below.

This one is really helpful, as these type of questions are very frequent in Analyst interviews.

Great work!!!

Hi Tavish,

I liked the way you have presented your points in the article. It’s very well organized and makes some interesting observations (especially about the closest proxy).

However, I must say that I disagree with you on the relevance of the guesstimate problems for an analytics professional. The main objective of the entire analytics endeavor is to take “guessing” out of the game. I am not sure what questions are being asked in Analytics Consulting interviews in India, but I am sure there are better ways to evaluate an individual’s structured approach to problem solving, his/her comfort with numbers (in a statistical sense as opposed to raw arithmetic) and his/her ability to evaluate the efficiency of different algorithms based on the problem at hand.

IMHO, Tim Peters said it best when he wrote one of the tenets in the Zen of Python:

“In the face of ambiguity, refuse the temptation to guess.”

Guesstimate questions have traditionally been used by Management Consulting companies to test the candidate’s ability to think on their feet. In contrast, all the Data Scientists/Analysts that I have met are way more critical and deliberate in their thinking. I think that is one of the most valuable traits of an analytics professional. But by all means, this is my personal opinion.

Best,

Ayush

Ayush,

Thank you for your elaborate comment. I agree with you that Guess estimates were traditionally used by Management Consulting companies. They still are equally important for them. For analytics companies in India they have become quite popular in recent past. I say this based on my experience and the conversation I had with people recruiting day in and day out in analytics.

The reason they are so popular in Indian analytics companies is that analytics is still in its nascent stage. New hires have to make their own path and influence people in industry who are still hesitant to implement strategies driven on numbers. To access such capability we need people with skills very close to a management consultant, where business problems are not very well defined and client is not very keen on accepting fact based strategy changes. Also expertise in such problems gives candidate a comfort level with segmentation, which is the heart and soul of analytics industry.

This is my perception of the Indian analytics industry. I am still open to discussion on the relevance of such case studies in analytics interviews. Talking from my personal experience, I have been asked such questions in every interview I have appeared for to date.

Tavish

That would make sense. If the customers are not already sold on an Analytics project, then you would need to convince them and make them understand the relevance of the project in a language that they can understand. And I agree, this is a typical management consultant type role.

I am curious though, is there demand for more serious skills in the Indian Analytics space – something on the lines of Advanced Statistics / Machine Learning / Operations Research etc?

Also, Analytics seems to have become synonymous with “Marketing Analytics”. I am curious if there are Indian companies that have moved past this stereotype and started looking into applications of analytics that are not purely focused on marketing.

Would like to hear your thoughts.

Analytics in India can be broadly divided into two classes. One class caters to foreign markets and the other to Indian markets. The former does have a huge demand for advanced analytics. The latter, on the other hand, needs more than simply analytics to make a productive strategy. Here you are almost an internal consultant and lead the project from finding a fact-based opportunity through to implementation. More than 25% of the time in such cases goes into implementing the analytics project.

Most of the analytics jobs in India are in marketing analytics. I myself have been working on the same throughout my career. But even in marketing analytics, we do use advanced statistics in specific projects.

Thank you for the elaborate reply Tavish. I appreciate it. I walk away wiser from this discussion 🙂

Best,

Ayush

Same here. We appreciate you following our blog.

Tavish

I guess the numbers in the cigarettes example are not what I get when I try it. I guess you meant billion (for the final number). And good job with the article.

Karthik,

You are absolutely correct, the total number of cigarettes is 8.1 billion. We will rectify the typo.

Regards,

Tavish

To solve the aircraft problem, I thought of the following: First we should find out the total number of cities in the world. This happens to be approximately 3700. On average, at least 80 percent of the cities have an airport, so the number of airports is 2960, or we can round it off to 3000. Assuming that every airport will have just one aircraft taking off at a time, we can say that there will be 3000 aircraft at any given moment. I don’t know if I’m correct. Please let me know.

Pranjali,

You chose a good proxy. The number of cities in the world is a stable and well-known number. However, the assumption that every airport has one aircraft taking off at a time is faulty. Big airports have as many as 30 runways and hence are capable of a higher number of take-offs at a given time. Also, the flights that have just taken off are only one component of the number of flights in the air. For example, a plane that took off 10 minutes earlier is still in the air.

Lastly, given that there are more than 1 lakh aircraft all over the world, fewer than 3% of planes operating at any given time looks suspicious. This gap will be filled once you take into account the 2 factors mentioned in this comment.

Hope this helps.

Tavish

Tavish,

Also, we must consider the journey time between the source and destination of a particular flight. For example, a flight that took off 2 hrs ago will still be in the air if its journey time is greater than 2 hrs and will have landed already if it is less than 2 hrs. But we would get about 15 cases if we consider this factor, which is difficult to explain in an interview, and yet such an important factor should be considered. Please let me know how to handle such a situation.

Thanks.

Akhil.

Akhil,

I am not very sure which 15 cases you are talking about. But you made a good point. Obviously, you will have to account for the time in air. You should segment the flights by travel time and then make a good guess at the average time in air for each of these segments. As you said you got 15 such segments, try clubbing them so that you don’t compromise much on separating power but make your calculation easier. If you tell me your detailed solution, we can help you simplify it and share our thoughts on the accuracy of the answer.

Tavish

Hello Tavish,

It means that i should have little perspective about taking off details of air crafts.If anybody ask me estimate the no. of restaurants in the world, no. of doctors in the world.

Then my approximation will always go wrong bcz i do not know how many doctors work in hospitals on an average same analogy for restaurants.

then i what i supposed to do?

Does it wrong to take assumption if i say suppose 1000 doctors work in A class hospitals?

What would be the safe side in these types of questions when i do not have any idea?

yeah , when i considered the scenario using no of countries in world and breaking it down to developed and underdeveloped economies , my estimate came out to be near around 2900 flights at any given moment . pretty right 🙂

indeed. I had completely missed out on that aspect of airplanes that may already be in air and that some cities have more than one runway. Thanks a ton Tavish. The egs that u have put forth are superb and they do help exercise ones grey cells. Looking forward to reading many such blogs from you.

Thanks.

Tavish,

I have been following the blog since long time and really appreciate you guys for the efforts that have been put in to create the content.

Coming to the point, I was going through the guess estimate for #cigarettes example.

I think the basic assumption has been that “everybody who is in the age bracket of > 22 and those between 16-22 smokes”, which is fairly wrong to assume. Taking non smokers in the same bracket could drastically reduce the number.

However, for simplicity sake, we can for sure assume that there are no smokers in the age bracket <16.

What is your take on this ?

Regards

Saurabh Kapoor

Saurabh,

Thanks for following our articles. To your question, all the numbers in the table are average cigarettes per MONTH. The bracket with the highest average in the table is the age >22 urban population, i.e. 30/month. For smokers this number will be much more than 30/month. In one of the surveys conducted on smokers while I was working for a life insurance company, we found that the 25th and 75th percentiles in the male urban population were about 20 and 70 respectively. On average, we saw that an urban male smokes 42 cigarettes a month (both smokers and non-smokers). But given that my sample was from a metro city, this average is a slight overestimate, hence the adjustment. This was the basis of my number estimation. If you are using some other proxy, you might get a different number, and neither answer is wrong. The most important part in such problems is your approach and not the answer.

Hope this helps.

Tavish

In the aircraft problem, let us look it this way. There are 3700 cities in world and approx 80% have airports i.e. 3000 approx.

Now considering 30% (900 airports) of these to be big to support more than one take off at a time and rest single runway airports(2100 airports).

Now. in big airport let us average the number of simultaneous take offs to 8 takeoffs every 10 mins and in small airport let it be 1 take off every 5 mins.

For big airports, this means 900 * 8 (7200) take-offs every 10 mins, and for small ones it is 2100 take-offs every 5 mins, or 4200 every 10 mins. That is a total of 11400 take-offs every 10 mins, or 68400 (68000 approx) take-offs every hour around the world.

But we need to find number of flights in air at a moment so let us take average flight time around the globe be 4 hours.

Hence there are approximately, 3 lac aircraft in air.

Rhythem,

Looks like a fair approach. A step further to this, I will change the time in air for different types of airplane/airport to be different. This might bring more accuracy to your solution.

Tavish

Hello Tanvish,

That was a good article and explanation of how to approach these kinds of case studies. Is it possible for you to share more links or case studies where we can get more examples of this kind?

Thank you.

I agree with @Rhythem…I reached almost the same approach!!!

AIRCRAFT PROBLEM:

Estimating there are 3000 major cities that have an airport facility. 30% are major airports and 70% are minor airports. Major airports will have on average one take-off per minute. Minor airports will have on average one take-off per 10 minutes. Hence the total number of aircraft taking off is 1110 per minute. Considering that each aircraft will stay in the air for at least four hours, the total number of aircraft in the air is around 2,50,000.

how would you guess the average distance that a player run in a cricket match?

Hello,

This post was published in Jan 2014, and in December 2013 there were more than 400 million WhatsApp users worldwide (according to Statista.com). So, where should we make the needed adjustments?

For avg distance run by a player in a cricket match. my approach would be first take a team with 4 batsman, 4 bowlers(fast and spin), 2 all rounders and a wicket keeper. consider overall 4 batsman make 200 runs in between stumps and they also cover some distance while fielding( each fielder runs 2km on average while fielding) s, for 4 batsman total distance would be (200*44 yards + 4*2km). for 4 bowlers consider they take a avg bowling length of 15 yds, each bowler bowls 10 overs and as each player covers 2km while fielding, total distance bowlers cover would be (10*4*15 yards + 4*2km ). For 2 all-rounders, consider both make a 80 run and bowl a 5 overs each. total distance would be (80*44 yards + 5*2*15 yards + 2*2km). most probably a wicketkeeper will be a batsman and consider he makes a 30 run and while keeping he just runs 100 mts, (30*44 yds + 0.1km). so if we add all four above equations we get 33.3km and avg for a player is 3km.

Hi,

My approach for the aircraft problem is as follows:

Firstly i segmented the approx 200 countries in the world into developed,developing and underdeveloped countries. I took the respective %s to be 20%,60% and 20%, to get 40,120 and 40. Next i segmented each of these categories further into 3 segments based on the area of these countries (my logic being that an airport requires a large enough area to be built, so higher the area, higher the number of airports in a country).. So i segmented each of the 3 categories into large, mid and small countries. Now, in case of large developed countries, both the number of airports and the number of planes taking off would be higher as compared to large developing and large underdeveloped countries. Also, the proportion of large countries in both developed and underdeveloped will be significantly smaller as compared to mid and small sized countries whereas in case of developing large countries would be probably around 30-35% of the total. Plugging in values for the second segment for number of airports per country and number of flights taking off, final answer can be obtained. Another alternative second level segmentation could have been population of a country, as that would determine the domestic traffic and would significantly contribute to the final estimate as domestic flights are usually short duration flights.

I didn’t understand Whatsapp example. If percentage of population under 10 with android phone is 0%, how did you get 10,800 as the number??

Farhan,

It’s just a matter of the number of significant digits displayed. The percentage of the population under 10 with an Android phone is less than 1%, which is shown here as 0. Otherwise, I hope you understood the intent of the article.

Tavish

This post is really very helpful, and approach defined can be understood easily. It will be great help if you can provide me with some links where i can practice guess estimation questions.

Thanks you.

Hi Tanvish,

I am going to apply for analytics interviews. But this guess estimate part is not very class. Can you give me some more examples?? Do I have to be hands on with the data always?

I am sorry..a little mistake in my previous post. Guess Estimate part is not very clear.

Hi,

Thank you for the insightful article. It seems a level of general knowledge is required for answering such questions. The more your knowledge the more accurate the proxy.

Regarding the number of planes in the air, one needs to take into account the time difference between countries. Planes are not flying 24/7. If we choose an average time period of 5am till 8pm, that’s 15 hrs of flight time per day. Then we take the different GMT time zones into account and work out the average number of people who fly every hour and the average number of people on board. Early morning flights would be packed. We then determine, based on the time, the number of flights in the air.

Thank you,

Jerome

Hi,

It was helpful but what if the minimum assumption you make and the interviewer doubts on the same,

as Per month a urban men consumes 30 smokes, Justify.

I am asking this because i faced the same and i was unable to justify showing the facts as he said google can say everything.

Regards,

Kartik

I doubt whether your calculation in counting cigarettes case are correct. I mean your approach is right but I am not sure about how did you calculate all these numbers. Your 5th row, i think it should be corrected.

Thank you

hi Tavish, plz help me in more guesstimates. like number of trees in a locality?