SQL Query: Coding Question Asked by Microsoft and Facebook
This article was published as a part of the Data Science Blogathon.
Introduction
SQL proficiency is crucial in data science. In this article, we’ll work through two SQL questions that product companies use to screen applicants for data scientist roles. Both questions come from the StrataScratch website.
StrataScratch is an excellent tool for anyone wishing to get started in data science and improve their SQL and Python skills. The platform offers coding questions as well as non-coding topics related to data science, such as statistics and probability. I strongly advise you to create an account on the StrataScratch website and practice the questions alongside this article. I will use a PostgreSQL database to solve both problems.
If you know SQL well, you stand a better chance of clearing data science interviews and of handling day-to-day tasks efficiently. This article focuses on the approach to each problem, so that after reading it you will better understand how to work toward a solution for a given problem. Let’s get to the questions.

Part 1: Premium vs. Freemium Asked by Microsoft
Find the total number of downloads for paying and non-paying users by date. Include only records where non-paying customers have more downloads than paying customers. The output should be sorted by earliest date first and contain three columns: date, non-paying downloads, and paying downloads.
Interview Question Date: November 2020, Company: Microsoft, Difficulty-Level: Medium, Interview QuestionsID: 10300, Tables: ms_user_dimension(fields: user_id(int), acc_id(int)), ms_acc_dimension(fields: acc_id(int), paying_customer(varchar)), ms_download_facts(fields: date(datetime), user_id(int), downloads(int))
Preview of table ms_user_dimension:

Preview of table ms_acc_dimension:

Preview of table ms_download_facts:

Approach:
Three tables are provided here. To solve the problem, we must determine the number of daily downloads made by paying and non-paying customers. The problem can be divided into three steps. In the first step, we join all three tables. In the second step, we compute the number of paid and non-paid downloads for each date. Finally, we keep only the dates with more non-paid downloads than paid downloads and sort them by date.
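Step 1: Join the Tables
The user table links each download record to an account, and the account table tells us whether that account is a paying customer.
SELECT date, downloads, paying_customer
FROM ms_user_dimension
INNER JOIN ms_acc_dimension
    ON ms_user_dimension.acc_id = ms_acc_dimension.acc_id
INNER JOIN ms_download_facts
    ON ms_user_dimension.user_id = ms_download_facts.user_id;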
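To see what the joined result looks like, here is a minimal sketch that runs the same join with Python’s built-in sqlite3 module. SQLite stands in for PostgreSQL here, the sample rows are hypothetical (they are not the StrataScratch data), and the table aliases u, a, and f are added only for readability.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ms_user_dimension (user_id INTEGER, acc_id INTEGER);
    CREATE TABLE ms_acc_dimension (acc_id INTEGER, paying_customer TEXT);
    CREATE TABLE ms_download_facts (date TEXT, user_id INTEGER, downloads INTEGER);

    -- Hypothetical sample rows, not the real StrataScratch data.
    INSERT INTO ms_user_dimension VALUES (1, 10), (2, 20);
    INSERT INTO ms_acc_dimension VALUES (10, 'yes'), (20, 'no');
    INSERT INTO ms_download_facts VALUES
        ('2020-08-15', 1, 5),
        ('2020-08-15', 2, 7),
        ('2020-08-16', 2, 3);
""")

# Same two inner joins as in the article; ORDER BY only makes the output stable.
joined_rows = conn.execute("""
    SELECT f.date, f.downloads, a.paying_customer
    FROM ms_user_dimension AS u
    INNER JOIN ms_acc_dimension AS a ON u.acc_id = a.acc_id
    INNER JOIN ms_download_facts AS f ON u.user_id = f.user_id
    ORDER BY f.date, u.user_id
""").fetchall()

for row in joined_rows:
    print(row)
```

Each download record now carries the paying_customer flag of the account it belongs to, which is what the next step aggregates on.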
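Step 2: Calculate Paid and Non-Paid Downloads for Each Date
A case expression inside sum adds up only the downloads that belong to one customer type, so a single group by date yields both totals side by side.
SELECT date,
       SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS paid_downloads,
       SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS non_paid_downloads
FROM ms_user_dimension
INNER JOIN ms_acc_dimension
    ON ms_user_dimension.acc_id = ms_acc_dimension.acc_id
INNER JOIN ms_download_facts
    ON ms_user_dimension.user_id = ms_download_facts.user_id
GROUP BY date;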
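The conditional aggregation can be tried in isolation. The sketch below assumes a hypothetical flat table named joined that stands in for the result of the three-table join, filled with made-up rows, and again uses SQLite through Python’s sqlite3 module rather than PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "joined" is a hypothetical stand-in for the output of the three-table join.
conn.execute(
    "CREATE TABLE joined (date TEXT, downloads INTEGER, paying_customer TEXT)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?)",
    [
        ("2020-08-15", 5, "yes"),
        ("2020-08-15", 7, "no"),
        ("2020-08-15", 2, "no"),
        ("2020-08-16", 4, "yes"),
        ("2020-08-16", 3, "no"),
    ],
)

# Each SUM only sees the rows whose CASE branch matches; other rows yield NULL
# and are ignored by SUM.
daily_totals = conn.execute("""
    SELECT date,
           SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS paid_downloads,
           SUM(CASE WHEN paying_customer = 'no'  THEN downloads END) AS non_paid_downloads
    FROM joined
    GROUP BY date
    ORDER BY date
""").fetchall()

for row in daily_totals:
    print(row)
# ('2020-08-15', 5, 9)
# ('2020-08-16', 4, 3)
```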
Step 3: Filter the Dates and Sort the Result
The having clause keeps only the dates on which non-paid downloads exceed paid downloads, and order by date puts the earliest date first.
SELECT date,
       SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS non_paid_downloads,
       SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS paid_downloads
FROM ms_user_dimension
INNER JOIN ms_acc_dimension
    ON ms_user_dimension.acc_id = ms_acc_dimension.acc_id
INNER JOIN ms_download_facts
    ON ms_user_dimension.user_id = ms_download_facts.user_id
GROUP BY date
HAVING SUM(CASE WHEN paying_customer = 'no' THEN downloads END) >
       SUM(CASE WHEN paying_customer = 'yes' THEN downloads END)
ORDER BY date;
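One detail of this query is worth checking on a small example: a case expression without an else branch returns NULL when nothing matches, so on a date with no paying downloads at all the paid total is NULL, the comparison in having is NULL, and the date is dropped. Whether such dates should appear in the output is a matter of interpretation that the question does not settle. The sketch below compares the query as written with a variant that adds else 0. It assumes a hypothetical flat table named joined with made-up rows and runs on SQLite via Python’s sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "joined" is a hypothetical stand-in for the output of the three-table join.
conn.execute(
    "CREATE TABLE joined (date TEXT, downloads INTEGER, paying_customer TEXT)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?)",
    [
        ("2020-08-15", 5, "yes"), ("2020-08-15", 9, "no"),
        ("2020-08-16", 4, "yes"), ("2020-08-16", 3, "no"),
        ("2020-08-17", 6, "no"),  # no paying downloads on this date
    ],
)

# {else_clause} is filled with either nothing or "ELSE 0".
QUERY = """
    SELECT date,
           SUM(CASE WHEN paying_customer = 'no'  THEN downloads {else_clause} END) AS non_paid_downloads,
           SUM(CASE WHEN paying_customer = 'yes' THEN downloads {else_clause} END) AS paid_downloads
    FROM joined
    GROUP BY date
    HAVING SUM(CASE WHEN paying_customer = 'no'  THEN downloads {else_clause} END)
         > SUM(CASE WHEN paying_customer = 'yes' THEN downloads {else_clause} END)
    ORDER BY date
"""

without_else = conn.execute(QUERY.format(else_clause="")).fetchall()
with_else = conn.execute(QUERY.format(else_clause="ELSE 0")).fetchall()

print(without_else)  # [('2020-08-15', 9, 5)]
print(with_else)     # [('2020-08-15', 9, 5), ('2020-08-17', 6, 0)]
```

With the article’s form, 2020-08-17 is left out; with else 0 it is kept and its paid total shows as 0.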

Part 2: Highest Energy Consumption Asked by Facebook
This medium-difficulty question was asked by Facebook/Meta in one of its interviews. You can find it on the StrataScratch website under the name Highest Energy Consumption. The details of the question are given below:
Highest Energy Consumption
Find the date with the highest total energy consumption from the Meta/Facebook data centers. Output the date along with the total energy consumption across all data centers.
Interview Question Date: March 2020, Company: Meta/Facebook, Difficulty-Level: Medium, Interview QuestionsID: 10064, Tables: fb_eu_energy(fields: date(datetime), consumption(int)), fb_asia_energy(fields: date(datetime), consumption(int)), fb_na_energy(fields: date(datetime), consumption(int))
Preview of table fb_eu_energy:

Preview of table fb_asia_energy:

Preview of table fb_na_energy:

Approach: The problem will be divided into three sections. We will combine the records from the tables in the first section. The total energy consumed each day will be determined in the second part. Finally, we must determine the date on which the most energy was consumed and return the result.
Step 1: Combine the Tables
Because the data is spread across three tables, we must first combine the records from all of them. We can’t use union for this, because union removes duplicate rows and the same row can appear in more than one table. For instance, the record (2020-01-01, 400) appears in both fb_eu_energy and fb_na_energy. With union, one of those two rows would be dropped and the total for that date would come out too low. We therefore use union all, which keeps duplicate rows.
SELECT date, consumption FROM fb_eu_energy
UNION ALL
SELECT date, consumption FROM fb_asia_energy
UNION ALL
SELECT date, consumption FROM fb_na_energy;
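The difference between union and union all is easy to check on a few rows. In this minimal sketch, only the duplicated record (2020-01-01, 400) comes from the article; every other row is hypothetical, and SQLite (through Python’s sqlite3 module) stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fb_eu_energy (date TEXT, consumption INTEGER);
    CREATE TABLE fb_asia_energy (date TEXT, consumption INTEGER);
    CREATE TABLE fb_na_energy (date TEXT, consumption INTEGER);

    -- ('2020-01-01', 400) appears in both the EU and NA tables;
    -- the remaining rows are hypothetical.
    INSERT INTO fb_eu_energy VALUES ('2020-01-01', 400), ('2020-01-02', 350);
    INSERT INTO fb_asia_energy VALUES ('2020-01-01', 300);
    INSERT INTO fb_na_energy VALUES ('2020-01-01', 400), ('2020-01-02', 125);
""")

# {op} is replaced by either UNION ALL or UNION.
TEMPLATE = """
    SELECT date, consumption FROM fb_eu_energy
    {op}
    SELECT date, consumption FROM fb_asia_energy
    {op}
    SELECT date, consumption FROM fb_na_energy
"""

union_all_rows = conn.execute(TEMPLATE.format(op="UNION ALL")).fetchall()
union_rows = conn.execute(TEMPLATE.format(op="UNION")).fetchall()

print(len(union_all_rows))  # 5: every row from every table
print(len(union_rows))      # 4: the duplicated (2020-01-01, 400) is collapsed
```

With union, the total for 2020-01-01 would be 700 instead of 1100 on this sample, because one of the two 400 readings disappears before the sum is taken.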

Step 2: Calculate the Total Amount of Energy Consumed for Each Day
After combining all records, we compute the total energy consumption for each day. To do this, we group the combined rows by date and take the sum of consumption within each group.
SELECT date, SUM(consumption) AS total_consumption
FROM (
    SELECT date, consumption FROM fb_eu_energy
    UNION ALL
    SELECT date, consumption FROM fb_asia_energy
    UNION ALL
    SELECT date, consumption FROM fb_na_energy
) E
GROUP BY date;

Step 3: Filter the Records and Format the Result as Specified
Now we must shape the query result as the question asks: output the date with the highest total energy consumption across all data centers, along with that total. To arrange the rows in descending order of total consumption, we use the order by clause on the summed consumption. The first row then holds the date with the highest energy consumption, so we use limit 1 to output only that row.
SELECT date, SUM(consumption) AS total_consumption
FROM (
    SELECT date, consumption FROM fb_eu_energy
    UNION ALL
    SELECT date, consumption FROM fb_asia_energy
    UNION ALL
    SELECT date, consumption FROM fb_na_energy
) E
GROUP BY date
ORDER BY SUM(consumption) DESC
LIMIT 1;
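Note that limit 1 returns a single row even if two dates share the highest total, and which of the tied dates comes back is not guaranteed. If ties matter, one possible alternative (not part of the article’s solution) is to compare each daily total against the maximum. The sketch below illustrates this on hypothetical rows built so that two dates tie, using a common table expression named daily (a name chosen here for illustration) and SQLite via Python’s sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fb_eu_energy (date TEXT, consumption INTEGER);
    CREATE TABLE fb_asia_energy (date TEXT, consumption INTEGER);
    CREATE TABLE fb_na_energy (date TEXT, consumption INTEGER);

    -- Hypothetical rows: 2020-01-06 and 2020-01-07 both total 1250.
    INSERT INTO fb_eu_energy VALUES
        ('2020-01-05', 100), ('2020-01-06', 500), ('2020-01-07', 600);
    INSERT INTO fb_asia_energy VALUES ('2020-01-06', 450), ('2020-01-07', 350);
    INSERT INTO fb_na_energy VALUES ('2020-01-06', 300), ('2020-01-07', 300);
""")

# Daily totals over all three tables, shared by both queries below.
DAILY = """
    SELECT date, SUM(consumption) AS total_consumption
    FROM (
        SELECT date, consumption FROM fb_eu_energy
        UNION ALL
        SELECT date, consumption FROM fb_asia_energy
        UNION ALL
        SELECT date, consumption FROM fb_na_energy
    ) AS e
    GROUP BY date
"""

# The article's approach: sort descending and keep the first row.
limit_one = conn.execute(
    DAILY + " ORDER BY total_consumption DESC LIMIT 1"
).fetchall()

# Tie-aware variant: keep every date whose total equals the maximum.
tied = conn.execute(f"""
    WITH daily AS ({DAILY})
    SELECT date, total_consumption
    FROM daily
    WHERE total_consumption = (SELECT MAX(total_consumption) FROM daily)
    ORDER BY date
""").fetchall()

print(len(limit_one))  # 1
print(tied)            # [('2020-01-06', 1250), ('2020-01-07', 1250)]
```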

Conclusion
In this article, we looked at two SQL questions and how to solve them step by step. Along the way we used inner joins, union all, group by, the having clause, case expressions, order by, and limit. When attempting to solve any complex problem, keep the following points in mind:
- It is necessary to divide the problem into smaller problems. After reading the problem twice, decide which columns to use to calculate the desired result.
- Next, determine the functions you’ll need to calculate each sub-problem, and then try to connect the solutions of the sub-problems.
- With this method, you can work through most problems systematically. Studying how interview questions from different product companies are solved will improve your ability to come up with new approaches to unfamiliar problems.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.