Understanding Subqueries in SQL
Overview
-
Be familiar with the SELECT Statement
-
Have a basic understanding of WHERE HAVING and FROM clauses
Introduction
In this article, we will deal with Subqueries in SQL. Like any structure has a subpart or a subset similarly our Query in SQL or the SELECT query in SQL also has a concept of Subquery associated with it.
We will be learning about subqueries from the scratch covering all your questions like What a Subquery Is? What are the different types of Subqueries? Why are they even required? Where can sub queries be used or with which clauses/statements we can use them etc. etc.
Don’t worry! Everything will be explained with the help of easy-to-understand examples.
We will be using MS SQL Server as our Database Server!
Table of Contents
-
Understanding the Dataset
-
What is a Subquery?
-
What is a Subquery used for?
-
Types of Subqueries in SQL
-
Ordinary Subquery
1. Subqueries with SELECT Statement
2. Subqueries with WHERE Clause
3. Subqueries with HAVING Clause
-
Inline View
-
Correlated Subquery
Understanding the Dataset
This dataset will be used for all the examples that we will be taking in this article.
We have a table named ORDERS that contains the records of the sales made in a particular period of time for various products under multiple categories.
It contains columns like: Order_Id, Category , Product, Sales
Let’s begin our learning with a situation!
We have the Table: ORDERS as –
Suppose that I want to see only those entries where the sales made is more than the average sales of all the entries received till now.
So my question to you is: Can you show the records where the sales are more than the overall average sales.
We can try something like :
SELECT * FROM ORDERS WHERE Sales > AVG(Sales)
But we get an ERROR!
“An aggregate may not appear in the WHERE clause unless it is in a subquery contained in an outer reference”
Hold on a second, What does the word subquery mean here?
It is a query within a query
We can perform this task easily using something known as a ‘SUBQUERY’
How?
We will write the query as:
SELECT * FROM ORDERS WHERE Sales > (SELECT AVG(Sales) FROM ORDERS)
SELECT AVG(Sales) FROM ORDERS is our Subquery here(also known as the inner query)
SELECT AVG(Sales) FROM ORDERS
It outputs :
This output is then used as an input for our main query(also known as the outer query)
SELECT * FROM ORDERS WHERE Sales > (SELECT AVG(Sales) FROM ORDERS)
We get the final output as :
Let’s officially define what a Subquery is :
What is a Subquery?
A Subquery is a query within another SQL query.
Note that it must always be enclosed within parentheses.
When is a Subquery used?
When we want to do some operations or perform some calculations at a different level as we did in the inner query explained above (We calculated the overall average sales)
and other operations at a different level like we did in the outer query explained above (We compared the individual sales with the overall sales for each record).
In a nutshell, When we want to perform operations at different levels then we use a Subquery.
Types of Subqueries in SQL
There are 3 types of Subqueries in SQL. They are :
-
Ordinary Subquery
-
Inline View
-
Correlated Subquery
Let’s learn each of them one by one :
Ordinary Subquery
The query that we learned at the beginning is a perfect example of an Ordinary Subquery.
Let’s take some more examples :
We have the Table: ORDERS as –
Subqueries with SELECT Statement
Question: Show the category-wise percentage of Sales done.
We will write the query as :
SELECT Category , SUM(Sales)/(SELECT SUM(Sales) FROM ORDERS) AS PERC_SALES FROM ORDERS GROUP BY Category
How does the query work?
First, the inner query gets executed.
SELECT SUM(Sales) FROM ORDERS
Then using this value the outer query executes.
FINAL OUTPUT :
Subqueries with WHERE Clause
Question: Show the records where the sales are more than the overall average sales.
We will write the query as:
SELECT * FROM ORDERS WHERE Sales > (SELECT AVG(Sales) FROM ORDERS)
How does the query work?
First, the inner query gets executed.
SELECT AVG(Sales) FROM ORDERS
It outputs :
Then using this value the outer query executes.
FINAL OUTPUT :
Subqueries with HAVING Clause
Question: Show the items where average sales for each item is more than the overall average.
We will write the query as:
SELECT Product , AVG(Sales) FROM ORDERS GROUP BY Product HAVING AVG(Sales) > (SELECT AVG(Sales) FROM ORDERS)
How does the query work?
By the time you must be knowing that: First the inner query gets executed.
SELECT AVG(Sales) FROM ORDERS
It outputs :
Then using this value the outer query executes.
FINAL OUTPUT :
Important features of ORDINARY SUBQUERIES :
-
Used with SELECT Statement, WHERE Clause, HAVING Clause, CASE WHEN Statement
-
Here first the subquery(inner query) is executed and then the main query(outer query) gets executed taking the output of the subquery.
-
The inner query works independently and can query a different table as well.
-
It must return only one column in the output (at the maximum).
Inline View
Let’s take a simple example :
To get all the records in the table we can do,
SELECT * FROM ORDERS
We can replace the table(ORDERS) here with a subquery as well (SELECT * FROM ORDERS)
This is known as an inline view.
SELECT * FROM ( SELECT * FROM ORDERS) TT
OUTPUT :
Let’s move on to a more complex question.
Question: Display the record with the 2nd lowest sales.
We will write the query as:
SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY Sales) AS row_num FROM ORDERS) AS T WHERE row_num=2
How does the query work?
By the time you must be knowing that: First, the inner query gets executed.
SELECT *, ROW_NUMBER() OVER(ORDER BY Sales) AS row_num FROM ORDERS
It outputs :
Then using this value the outer query executes. The OUTPUT table above is used as a table to retrieve the second lowest sales record.
FINAL OUTPUT :
Important features of INLINE VIEW :
-
Used with FROM Clause i.e when we write a query in the FROM Clause.
-
It is a subquery in the FROM Clause of a SELECT Statement
-
This is used when we have written some query (known as the inner query) and we want to use the output of that inner query as a table for the outer query.
-
Remember that, It is mandatory to write the table alias at the end of the FROM Clause.
-
It looks like nesting SELECT Statements.
But why do we actually need to put a SELECT Statement in the FROM Clause?
Here we are creating a column named row_num in the inner query and we want to filter it which is done with the help of the outer query.
Correlated Subquery
Let us learn this with the help of an example :
Question: Display all the products whose sales are more than the average sales in their category.
We will write the query as:
SELECT Order_Id , Product, Category FROM ORDERS ord WHERE Sales >= (SELECT AVG(Sales) FROM ORDERS WHERE Category = ord.Category)
How does the query work?
The subquery is executed once for every row of the outer query.
FINAL OUTPUT :
Important features of CORRELATED SUBQUERIES:
-
Used for the row-by-row processing
-
Used when we want the subquery to return a varied result(s) for each row considered by the main query(Outer query)
-
Operators like ANY and ALL can be used in a correlated query.
A Subquery can be used with the SELECT, INSERT, UPDATE, and DELETE statements.
Conclusion
So this is how Subqueries work and help us achieve results when we want them at different levels of calculations.
If you still have any questions on subqueries in SQL, do let me know in the comments below.
We will get on a quick chat there. To clear your SQL concepts further go follow this article.
You can connect with me on LinkedIn: https://www.linkedin.com/in/ayushi-gupta25/